All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pranjal Shrivastava <praan@google.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Mostafa Saleh <smostafa@google.com>,
	Daniel Mentz <danielmentz@google.com>,
	iommu@lists.linux.dev
Subject: Re: [RFC PATCH v3 5/8] pm: runtime: Introduce pm_runtime_get_if_not_suspended()
Date: Wed, 9 Jul 2025 17:06:46 +0000	[thread overview]
Message-ID: <aG6hpoMx1QwwP195@google.com> (raw)
In-Reply-To: <CAJZ5v0gGjt+pRbq2k7qL6E+TGhm8PYHm_VVrwg2ZEjSWk+5FRg@mail.gmail.com>

On Wed, Jul 09, 2025 at 06:35:41PM +0200, Rafael J. Wysocki wrote:
> On Wed, Jul 9, 2025 at 5:51 PM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > On Wed, Jul 09, 2025 at 08:44:06AM +0200, Rafael J. Wysocki wrote:
> > > On Mon, Jun 16, 2025 at 10:32 PM Pranjal Shrivastava <praan@google.com> wrote:
> > > >
> > > > The existing opportunistic helpers like pm_runtime_get_if_active() and
> > > > pm_runtime_get_if_in_use(), are too strict for certain use cases. They
> > > > fail if the device is in a transient state like RPM_SUSPENDING, which
> > > > can lead to drivers making incorrect assumptions about the dev's state.
> > > >
> > > > These helpers don't suffice for cases where one wishes to elide HW clean
> > > > up like queue flushes or TLB invalidations if the device is powered off.
> > > > It is wasteful to wake up the device in cases where the resume callback
> > > > resets the device well OR if the HW resets in a clean state. Thus, if a
> > >ppy to hear your feedback or any alternative ideas you might have > device is powered off, it is preferred to elide any clean-up HW ops like
> > > > queue flushes / TLB invalidations when the device will resume afresh.
> > > >
> > > > Consider the following sequence of operations:
> > > >
> > > > 1. The device is in `RPM_SUSPENDING` state
> > > > 2. The driver calls pm_runtime_get_if_active/in_use
> > > > 3. Depending on these API, the driver elides a HW clean-up op like:
> > > >
> > > > if (pm_runtime_get_if_in_active(dev))
> > > >         invalidate_tlb(dev);
> > > > else
> > > >         // Skip flush, assuming device will fully suspend
> > > >
> > > > 4. Now, another rpm dev-linked device wakes up, causing the device's
> > > >    state to bounce from RPM_SUSPENDING to RPM_ACTIVE without invoking
> > > >    any rpm callbacks, preventing them from resetting the dev correctly.
> > >
> > > This never happens.
> > >
> > > If the status is RPM_SUSPENDING, the runtime suspend callback will be invoked.
> >
> > Ack, I believe we're talking about the check[1] in rpm_resume here:
> 
> No, I'm talking about the fact that the status is changed to
> RPM_SUSPENDING right before invoking the callback.
> 

Right, but if the callback returns -EAGAIN or something, in
rpm_suspend() we go ahead and set the status to RPM_ACTIVE again
on jumping to the `fail` label[1].

> >         if (dev->power.runtime_status == RPM_RESUMING ||
> >             dev->power.runtime_status == RPM_SUSPENDING) {
> >
> >             [...]
> >             /* Wait for the operation carried out in parallel with us. */
> >             [...]
> >             finish_wait(&dev->power.wait_queue, &wait);
> >             goto repeat;
> >
> > However, my worry is about the following situation:
> >
> >   rpm_suspend                      rpm_resume
> >
> >   rpm_status = RPM_SUSPENDING
> >                                    if (RPM_SUSPENDING)
> >                                    prepare_to wait(...)
> >
> >   // suspend fails
> >   retval = rpm_callback(...);
> 
> You're talking about the callback returning an error and your
> changelog is talking about a different situation.
> 

Apologies, I should've been more clear. What I meant was for a situation
where due to *any* reason (except disabling runtime PM) we bounce back
from RPM_SUSPENDING to RPM_ACTIVE *without* invoking the resume callback

> >   goto fail;
> >   [...]
> > fail:
> >   rpm_status = RPM_ACTIVE;
> >                                    finish_wait(...);
> >                                    goto repeat;
> >                                    repeat:
> >                                    if (rpm_status == RPM_ACTIVE) {
> >                                         retval = 1;
> >                                         goto out;
> >                                    }
> >                                 out:
> >                                    [ ... ] // put_parent if one
> >
> >                                    // we return without resume cb();
> >                                    return retval;
> >
> > Now, if we rely on APIs based on pm_runtime_get_conditional(), which
> > might return 0 if (dev->power.runtime_status != RPM_ACTIVE), we might
> > end up eliding some TLB invalidations in the period where the status was
> > RPM_SUSPENDING till it's back to RPM_ACTIVE due to a failed suspend.
> 
> It actually depends on the reason why the callback returned an error.
> 

I'm not sure if I follow.. as per rpm_suspend[1] I see that upon failing
(which also happens on getting a non-zero retval from the suspend_cb) we
set the runtime status to RPM_ACTIVE.

> > This can cause some severe address-aliasing/ghost hit issues since the
> > TLB still has some stale entries..
> 
> I'm not quite sure what you mean.
> 

I meant if the TLB invalidations were elided (i.e. TLB still had stale
entries) in hope that the IOMMU would be suspended, but due to some
reason the suspend failed and the status gets back to RPM_ACTIVE while
exiting the rpm_suspend call and a client wakes up and performs a DMA
(IOMMU transaction), the TLB entry might hit for an address which was
supposed to be invalidated by now.

> > Instead, we'd like to ensure that the status IS suspended
> 
> But you can't.
> 
> That's the difference between RPM_ACTIVE and RPM_SUSPENDED.  If you
> bump up the runtime PM usage counter while the device is RPM_ACTIVE
> (and while holding its power.spinlock), it will not suspend.  There's
> no way to prevent a suspended device from resuming (other than
> disabling runtime PM for it).
> 

The problem is avoiding bumping up the count while the device is in a
transient state like RPM_SUSPENDING and then bouncing back to the
RPM_ACTIVE state without invoking RPM resume callback, which seems to be
possible as per the rpm_suspend implementation [1].

> > to make a decision to elide the TLBI or not.
> >
> > What are your thoughts on this? I'm open to approaching it differently.
> 
> IMV this is all misguided, sorry.
> 

Ack. But I'd want to understand better here. Are you saying that it
isn't at all possible that RPM_SUSPENDING bounces back to RPM_ACTIVE
without invoking the rpm_resume ever? Because per the following snippet,
it seems likely:

	__update_runtime_status(dev, RPM_SUSPENDING);

	callback = RPM_GET_CALLBACK(dev, runtime_suspend);

	dev_pm_enable_wake_irq_check(dev, true);
	retval = rpm_callback(callback, dev);
	if (retval)
		goto fail;

	[...]

fail:
	dev_pm_disable_wake_irq_check(dev, true);
	__update_runtime_status(dev, RPM_ACTIVE);
	dev->power.deferred_resume = false;
	wake_up_all(&dev->power.wait_queue);
	[...]

Sorry if I'm missing something here, but it does look like we can bounce
back to RPM_ACTIVE without invoking the resume callback, which is a
behaviour we'd like to avoid.

> > Happy to hear your feedback or any alternative ideas you might have
> 
> Disable runtime PM for the device and check its runtime PM status,
> which cannot be transient at that point.  If it is RPM_ACTIVE, you
> need to flush the TLB.
> 

I see... let me see how or if we can safely do that.. we might be in
interrupt context while flushing the TLBs..

Thanks,
Praan

[1] https://elixir.bootlin.com/linux/v6.16-rc5/source/drivers/base/power/runtime.c#L678 

  reply	other threads:[~2025-07-09 17:06 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-16 20:31 [RFC PATCH v3 0/8] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
2025-06-16 20:31 ` [RFC PATCH v3 1/8] iommu/arm-smmu-v3: Refactor arm_smmu_setup_irqs Pranjal Shrivastava
2025-07-08 15:15   ` Mostafa Saleh
2025-06-16 20:31 ` [RFC PATCH v3 2/8] iommu/arm-smmu-v3: Add a helper to drain cmd queues Pranjal Shrivastava
2025-07-08 15:32   ` Mostafa Saleh
2025-07-14  9:24     ` Pranjal Shrivastava
2025-06-16 20:31 ` [RFC PATCH v3 3/8] iommu/tegra241-cmdqv: Add a helper to drain VCMDQs Pranjal Shrivastava
2025-06-16 20:31 ` [RFC PATCH v3 4/8] iommu/arm-smmu-v3: Cache and restore MSI config Pranjal Shrivastava
2025-07-08 15:34   ` Mostafa Saleh
2025-07-14  9:01     ` Pranjal Shrivastava
2025-06-16 20:31 ` [RFC PATCH v3 5/8] pm: runtime: Introduce pm_runtime_get_if_not_suspended() Pranjal Shrivastava
2025-06-17  9:19   ` kernel test robot
2025-07-08 22:00   ` Nicolin Chen
2025-07-09 15:51     ` Pranjal Shrivastava
2025-07-09  6:44   ` Rafael J. Wysocki
2025-07-09 15:51     ` Pranjal Shrivastava
2025-07-09 16:35       ` Rafael J. Wysocki
2025-07-09 17:06         ` Pranjal Shrivastava [this message]
2025-07-09 19:37           ` Rafael J. Wysocki
2025-07-10  8:59             ` Pranjal Shrivastava
2025-07-10 10:29               ` Rafael J. Wysocki
2025-07-11 10:20                 ` Pranjal Shrivastava
2025-07-15 23:52                   ` Daniel Mentz
2025-07-16 12:53                     ` Rafael J. Wysocki
2025-07-21 12:44                       ` Will Deacon
2025-06-16 20:31 ` [RFC PATCH v3 6/8] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops Pranjal Shrivastava
2025-06-16 20:31 ` [RFC PATCH v3 7/8] iommu/arm-smmu-v3: Enable pm_runtime and setup devlinks Pranjal Shrivastava
2025-06-16 20:31 ` [RFC PATCH v3 8/8] iommu/arm-smmu-v3: Invoke pm_runtime before hw access Pranjal Shrivastava

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aG6hpoMx1QwwP195@google.com \
    --to=praan@google.com \
    --cc=danielmentz@google.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=joro@8bytes.org \
    --cc=nicolinc@nvidia.com \
    --cc=rafael@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=smostafa@google.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.