From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pl1-f179.google.com (mail-pl1-f179.google.com [209.85.214.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E52C6221287 for ; Wed, 9 Jul 2025 17:06:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.179 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752080816; cv=none; b=tTdrPIts8dkhXzFisZeJBtu1RbfH5ybNll7gf7mBFhmzKwNNBVrJ2lOXMOLYX5rlWbssrVWmHenWdtPp40K21tq458TH7aQu52uHSkBjvFTDaWajGrIisxywsdHHOI8hVcURGB70e9HMBB5YqWO46Rgofk6pr0bv+q41uo0iSeA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752080816; c=relaxed/simple; bh=txOfErDUFYiZR8QxFVt9GIyV76r4ZHg8iY5adfo6AaA=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=BRXVGUQvW5zr4Ec0YaVmoxqq7qryLyNe4kgPM2qHTBX/3QB45iDNWMDdS+vOjYqp56Ox3+dSSYHxuTHkSKbXyZtviIva9kagrrs45bVjJq1/Xt5ClZ6iUCZPebQk8OwHziSP36LgT8gwoRjXrE/B7W367hqmFAW8/gTkZTfMtQo= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=vlHrB5D0; arc=none smtp.client-ip=209.85.214.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="vlHrB5D0" Received: by mail-pl1-f179.google.com with SMTP id d9443c01a7336-237f18108d2so10185ad.0 for ; Wed, 09 Jul 2025 10:06:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1752080814; x=1752685614; darn=lists.linux.dev; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=wNQL2qfZFq/pC9E2XsaOz0UaH/2iNhGanc2vyyVvb8k=; b=vlHrB5D0YjzXwKNTBONQVtiTZP1NtMoXUXuIAc7bXqufluCyhBnNk33Zg42dwY7Dez 5xVc6u3VdttH4b256I8AAQZ0UNeV2qc7F0drSanzfrD+JiNehj9sWVywqG+nToelNgHf AYRBaPdowjliGL8IA2saZeym3f1iPsVz8cwcL/sFq2Czi+vKAP4XyYR8wcxG/vyucBIq pqD2e0zMNhaSUQl/ULl7YV1Qi8SSzFlczb0701JNslEJsqMfbWORYx5s/ckHHXyC865g f8zfJoBTQ+MVTZ0ML3R8iSMfew6qnHF16yyC5MDvH2MXcV5cveQkqJN9cP5tVZx3uUda 3xWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752080814; x=1752685614; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=wNQL2qfZFq/pC9E2XsaOz0UaH/2iNhGanc2vyyVvb8k=; b=Wtlqv2O8BZzfoEJUxfTmshsPJljfe5brodv0bRWqMi8r4Asli2PIXIWFR37KxhIiw5 Yl+wegqHP7ZWeMizL78iizd/aicLRedvIiavNUL1uSqgN5jX0KOt3jwhJiyXr05OUC0t b/dehlJcTiFoWFBTv5LTv1lRH/IU2DEgXrN7mKGjqBHDmDLx8b6op35wR24rqltb5T2E HVsfa9X/EEFwXWlypqcOuSuyYOWFeLZMCrq54GF/+vSJOvWlGa6cY6uSps/7xFVdDTL9 CWna5Cjiv59/73emiQ3qrbDYcnZBhf0aTxyGa+unK6eMIkweDEnH7oWTiqkJWroNMkqq KLZw== X-Forwarded-Encrypted: i=1; AJvYcCVSVyWQVPKZubsW0KTSs4qjR4aNLXPU8H5OGHxixPdbr0ebQto9FxEb4a/VyTOs1Fgu46fO7w==@lists.linux.dev X-Gm-Message-State: AOJu0YzrMxABMEHFFDr/GsiAcpH3Xo/RSTB66FcE+uSES8vXBWDXGLt9 r3VBbCUfgUBF+r2Yd4b07guep2ZN2NfnrYq+5X5OdLD5zbIi5zP73Iy+vL6PHgpohQ== X-Gm-Gg: ASbGncsfO7Fi8qPqPFwb1Me2d/+uefUEXUsHTK2PHnddG+Q9V28ttM4hTl19O7vcAPy bnBhx1WOeD4oZgN/TAwAsHfljIVKKoVlzJw/8M7eQRcIqLfHk1lJTT5QoQAhLgN/LoOR5jrYobo 8qCn/UMU/RpfP9faVj/QasveDff4DBN3XIMTbg5FlqmGtXHsYOY8MYvGxBd14Sl/BUrlS1snw3o 7EHY71WDbfDKfGlntPY1pq7rbUjTuJcB3BsAmDi/SoSxsjfnTVUeA9fUu6nah4WhptiACFpfXLb TMIbEWl7TXDy0IyIB99mcennOwyF1aAtj49IJ2PO6WnO1Mey2LD7KQJMR6trR8B+rrqt34aG2t6 fYzXscH5ddDwvnd/j+8Fv5Zb0p66YuaU= X-Google-Smtp-Source: AGHT+IFYqxm0okir5DcYIiv9kjCiMptlDQEWMDkrgKJbHiHwbvGeKM8qt2/tzx12KiP/i6eDpeHbYQ== X-Received: by 2002:a17:902:ce0e:b0:23c:8f12:7c4b with SMTP id d9443c01a7336-23dda3a6da1mr3743945ad.10.1752080812473; Wed, 09 Jul 2025 10:06:52 -0700 (PDT) Received: from google.com (232.98.126.34.bc.googleusercontent.com. [34.126.98.232]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-23c84592ef4sm152601735ad.191.2025.07.09.10.06.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Jul 2025 10:06:52 -0700 (PDT) Date: Wed, 9 Jul 2025 17:06:46 +0000 From: Pranjal Shrivastava To: "Rafael J. Wysocki" Cc: Joerg Roedel , Will Deacon , Robin Murphy , Jason Gunthorpe , Nicolin Chen , Mostafa Saleh , Daniel Mentz , iommu@lists.linux.dev Subject: Re: [RFC PATCH v3 5/8] pm: runtime: Introduce pm_runtime_get_if_not_suspended() Message-ID: References: <20250616203149.2649118-1-praan@google.com> <20250616203149.2649118-6-praan@google.com> Precedence: bulk X-Mailing-List: iommu@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Wed, Jul 09, 2025 at 06:35:41PM +0200, Rafael J. Wysocki wrote: > On Wed, Jul 9, 2025 at 5:51 PM Pranjal Shrivastava wrote: > > > > On Wed, Jul 09, 2025 at 08:44:06AM +0200, Rafael J. Wysocki wrote: > > > On Mon, Jun 16, 2025 at 10:32 PM Pranjal Shrivastava wrote: > > > > > > > > The existing opportunistic helpers like pm_runtime_get_if_active() and > > > > pm_runtime_get_if_in_use(), are too strict for certain use cases. They > > > > fail if the device is in a transient state like RPM_SUSPENDING, which > > > > can lead to drivers making incorrect assumptions about the dev's state. > > > > > > > > These helpers don't suffice for cases where one wishes to elide HW clean > > > > up like queue flushes or TLB invalidations if the device is powered off. > > > > It is wasteful to wake up the device in cases where the resume callback > > > > resets the device well OR if the HW resets in a clean state. Thus, if a > > >ppy to hear your feedback or any alternative ideas you might have > device is powered off, it is preferred to elide any clean-up HW ops like > > > > queue flushes / TLB invalidations when the device will resume afresh. > > > > > > > > Consider the following sequence of operations: > > > > > > > > 1. The device is in `RPM_SUSPENDING` state > > > > 2. The driver calls pm_runtime_get_if_active/in_use > > > > 3. Depending on these API, the driver elides a HW clean-up op like: > > > > > > > > if (pm_runtime_get_if_in_active(dev)) > > > > invalidate_tlb(dev); > > > > else > > > > // Skip flush, assuming device will fully suspend > > > > > > > > 4. Now, another rpm dev-linked device wakes up, causing the device's > > > > state to bounce from RPM_SUSPENDING to RPM_ACTIVE without invoking > > > > any rpm callbacks, preventing them from resetting the dev correctly. > > > > > > This never happens. > > > > > > If the status is RPM_SUSPENDING, the runtime suspend callback will be invoked. > > > > Ack, I believe we're talking about the check[1] in rpm_resume here: > > No, I'm talking about the fact that the status is changed to > RPM_SUSPENDING right before invoking the callback. > Right, but if the callback returns -EAGAIN or something, in rpm_suspend() we go ahead and set the status to RPM_ACTIVE again on jumping to the `fail` label[1]. > > if (dev->power.runtime_status == RPM_RESUMING || > > dev->power.runtime_status == RPM_SUSPENDING) { > > > > [...] > > /* Wait for the operation carried out in parallel with us. */ > > [...] > > finish_wait(&dev->power.wait_queue, &wait); > > goto repeat; > > > > However, my worry is about the following situation: > > > > rpm_suspend rpm_resume > > > > rpm_status = RPM_SUSPENDING > > if (RPM_SUSPENDING) > > prepare_to wait(...) > > > > // suspend fails > > retval = rpm_callback(...); > > You're talking about the callback returning an error and your > changelog is talking about a different situation. > Apologies, I should've been more clear. What I meant was for a situation where due to *any* reason (except disabling runtime PM) we bounce back from RPM_SUSPENDING to RPM_ACTIVE *without* invoking the resume callback > > goto fail; > > [...] > > fail: > > rpm_status = RPM_ACTIVE; > > finish_wait(...); > > goto repeat; > > repeat: > > if (rpm_status == RPM_ACTIVE) { > > retval = 1; > > goto out; > > } > > out: > > [ ... ] // put_parent if one > > > > // we return without resume cb(); > > return retval; > > > > Now, if we rely on APIs based on pm_runtime_get_conditional(), which > > might return 0 if (dev->power.runtime_status != RPM_ACTIVE), we might > > end up eliding some TLB invalidations in the period where the status was > > RPM_SUSPENDING till it's back to RPM_ACTIVE due to a failed suspend. > > It actually depends on the reason why the callback returned an error. > I'm not sure if I follow.. as per rpm_suspend[1] I see that upon failing (which also happens on getting a non-zero retval from the suspend_cb) we set the runtime status to RPM_ACTIVE. > > This can cause some severe address-aliasing/ghost hit issues since the > > TLB still has some stale entries.. > > I'm not quite sure what you mean. > I meant if the TLB invalidations were elided (i.e. TLB still had stale entries) in hope that the IOMMU would be suspended, but due to some reason the suspend failed and the status gets back to RPM_ACTIVE while exiting the rpm_suspend call and a client wakes up and performs a DMA (IOMMU transaction), the TLB entry might hit for an address which was supposed to be invalidated by now. > > Instead, we'd like to ensure that the status IS suspended > > But you can't. > > That's the difference between RPM_ACTIVE and RPM_SUSPENDED. If you > bump up the runtime PM usage counter while the device is RPM_ACTIVE > (and while holding its power.spinlock), it will not suspend. There's > no way to prevent a suspended device from resuming (other than > disabling runtime PM for it). > The problem is avoiding bumping up the count while the device is in a transient state like RPM_SUSPENDING and then bouncing back to the RPM_ACTIVE state without invoking RPM resume callback, which seems to be possible as per the rpm_suspend implementation [1]. > > to make a decision to elide the TLBI or not. > > > > What are your thoughts on this? I'm open to approaching it differently. > > IMV this is all misguided, sorry. > Ack. But I'd want to understand better here. Are you saying that it isn't at all possible that RPM_SUSPENDING bounces back to RPM_ACTIVE without invoking the rpm_resume ever? Because per the following snippet, it seems likely: __update_runtime_status(dev, RPM_SUSPENDING); callback = RPM_GET_CALLBACK(dev, runtime_suspend); dev_pm_enable_wake_irq_check(dev, true); retval = rpm_callback(callback, dev); if (retval) goto fail; [...] fail: dev_pm_disable_wake_irq_check(dev, true); __update_runtime_status(dev, RPM_ACTIVE); dev->power.deferred_resume = false; wake_up_all(&dev->power.wait_queue); [...] Sorry if I'm missing something here, but it does look like we can bounce back to RPM_ACTIVE without invoking the resume callback, which is a behaviour we'd like to avoid. > > Happy to hear your feedback or any alternative ideas you might have > > Disable runtime PM for the device and check its runtime PM status, > which cannot be transient at that point. If it is RPM_ACTIVE, you > need to flush the TLB. > I see... let me see how or if we can safely do that.. we might be in interrupt context while flushing the TLBs.. Thanks, Praan [1] https://elixir.bootlin.com/linux/v6.16-rc5/source/drivers/base/power/runtime.c#L678