From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pj1-f42.google.com (mail-pj1-f42.google.com [209.85.216.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5D882C2C9 for ; Mon, 17 Feb 2025 03:49:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.42 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739764196; cv=none; b=AEJG7OMctCRwqNmCD5rb9WFrsC7lW0DD9XmSdFqNFSTqMUgjNEVs2wuzMicyUJXqLkL7Jonm7q8koenXdpb7rweJv5t6SUOvW3oGjCtz5rpLz/y7iBWNHv4k7e/pDicAih1IVUaJDHWjrm2RQNo1dn0tUKo2V4sg/w5/D5l6ou8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739764196; c=relaxed/simple; bh=AEctecPW644bInprG1Q0+EICJctiBz+Em9TO21wIBsM=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=QLaphU43CTHPJ0gUB3aiP/bipCsptvdvhUkJK4OOT3ytCh35hqts3ls5UlWsZtLg7gWI6cfV8zkIWUT6W0v4AkVbR36o54CmxLFYZik4baW2KU9I+kbbPOJ40Poefn5iKTabZtxhJTJumwG3bUKRkp/+dhkT3ywC4GKkO8VM/lU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=xeyKevpr; arc=none smtp.client-ip=209.85.216.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="xeyKevpr" Received: by mail-pj1-f42.google.com with SMTP id 98e67ed59e1d1-2fc3a14f6fbso2653705a91.3 for ; Sun, 16 Feb 2025 19:49:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1739764195; x=1740368995; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=q6qV2LKgDU3t0RtwY5zJqsVUWjzCYVJ/54cGupGGOAA=; b=xeyKevprCrOYvYbdurqZtnbhcZINaGfp+vCLLYEJtPSeqEFj7E8iEMcskh9WcoROAy YNw9/d3YHEqoBG5t+DU4BB1e+Se29yYWVloLKZiTBFYURbtyv87qV9UyabfHfzuJxylv H4L5zOZrnHTF3vKjFo3ZVWNMyYKc6GzrcYBZL4tLblYJR1/dKbCoqoHwLdkf8P+qqqj3 V5t6amp7YrmuVj5GbqUrN9X5AZttyguELNaGx0lS6h6chEaaf8gnz/Aom+J/WdQGEMwQ FLT9oa5ZE6JAgbwW2nyhtxg3ApRGsfJ4Sj90WomhxwjaIlmVfLW+P60Nq5hl06QzeKBY jl7w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739764195; x=1740368995; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=q6qV2LKgDU3t0RtwY5zJqsVUWjzCYVJ/54cGupGGOAA=; b=ngfuTtt1CPytafGYsWZRIYYD0mDinxtL1I4+EjwL6+umYd+AU8k2eOU+Yzcv9F5wKU PBv8FVdG//LOkRAhQFutjmGvIJCSFsCidiTMSu42r6rJslH5fl9H0CYsn15V2l1cSJLp z/GzbEd7k/LwR6bsjBAIhszUBTRe2zRf7PBZMY3eWCggVotLqCI4KBmsN3slaA9oixbL csGFugasXYo7RcBbQO2fI3XQTjwhKJ1gZYxK2EoXov+yS8mC/ZEmGr1jAFl8YDXIjaVf 0gYtt5T6iZQvXvmw206TbG2wozwcWClnTFsIMxALKLv+Ldbm+UI6Fs6wFKaaLGCH7j0Y 6G8Q== X-Forwarded-Encrypted: i=1; AJvYcCVRagE2r7SpiqU9KllK5dzZO2mTwXSJDa4ZhHH+FACWxz1Unav2OAgJBq3kfMsTbtal16yjDuP61g==@vger.kernel.org X-Gm-Message-State: AOJu0YwhdRmxfFagnCSuIoFjRSExSQJyvqMWraFV0EzbxNC+ODauKc4Q HNjgzvTyttCTqCO3K73XHbFoABZIgOjVMnyVBAZ8lQNM8hw89o4x1yV+IquDBw== X-Gm-Gg: ASbGncu72KA2uF0KSO9qpRPjSFThUOl3G+OKQtHy7o+ydGGeUnT2WKMOmhXfj2S+Yjb EeWlblE8eXvprAO94aIMyUr034cjW4ONpq5pmIXxRJVFYo7grp3YNHlQXbnSU+pyCzGwGg2/tGF JBClfdAagJVwEXkURgcQvOyUaqK/Mrjd1wpYL4ZKxcJ3U440oaZ9FqTWo07WruElC64Fa27Vfn3 qqloYXM5A21aKax5jVY1uTm502DZTmZ0GM8s9QoXta0f00+AcjdE6Xk4pFaKjVWVcwI3ldqZEbS adTF+O4i9O/pY0kfa6hxWj+iV+zpFwU5lnOHBoO1OpRvhHhkuZuiP4kZBw== X-Google-Smtp-Source: AGHT+IG+mp2u7hYvW8OafGPbN4o2uVFyOs0S++K2BLbGBPWA98vSvCccRtLwR4LjGhALrh85g6TMxQ== X-Received: by 2002:a17:90b:35c9:b0:2f6:d266:f462 with SMTP id 98e67ed59e1d1-2fc411540camr12191613a91.35.1739764194422; Sun, 16 Feb 2025 19:49:54 -0800 (PST) Received: from google.com (49.156.143.34.bc.googleusercontent.com. [34.143.156.49]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2fc13ac0a5bsm6833135a91.18.2025.02.16.19.49.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 16 Feb 2025 19:49:53 -0800 (PST) Date: Mon, 17 Feb 2025 09:19:45 +0530 From: Ajay Agarwal To: "Rafael J. Wysocki" Cc: Brian Norris , Oliver Neukum , "Rafael J. Wysocki" , Vincent Whitchurch , "jic23@kernel.org" , "linux-pm@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "linux-iio@vger.kernel.org" , Brian Norris , Joy Chakraborty , Vamshi Gajjela , Manu Gautam Subject: Re: PM runtime_error handling missing in many drivers? Message-ID: References: <5caa944f-c841-6f74-8e43-a278b2b93b06@suse.com> <20220708110325.GA5307@axis.com> <4ca77763-53d0-965a-889e-be2eafadfd2f@intel.com> <1937b65c-36c0-5475-c745-d7285d1a6e25@suse.com> <5c37ee19-fe2c-fb22-63a2-638e3dab8f7a@suse.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: On Wed, Feb 12, 2025 at 08:29:34PM +0100, Rafael J. Wysocki wrote: > On Tue, Feb 11, 2025 at 11:21 PM Brian Norris wrote: > > > > Hi Ajay, > > > > On Mon, Feb 10, 2025 at 09:02:41AM +0530, Ajay Agarwal wrote: > > > On Wed, Jul 27, 2022 at 06:31:48PM +0200, Rafael J. Wysocki wrote: > > > > On Wed, Jul 27, 2022 at 10:08 AM Oliver Neukum wrote: > > > > > On 26.07.22 17:41, Rafael J. Wysocki wrote: > > > > > > Well, in general suspending or resuming a device is a collaborative > > > > > > effort and if one of the pieces falls over, making it work again > > > > > > involves fixing up the failing piece and notifying the others that it > > > > > > is ready again. However, that part isn't covered and I'm not sure if > > > > > > it can be covered in a sufficiently generic way. > > > > > > > > > > True. But that still cannot solve the question what is to be done > > > > > if error handling fails. Hence my proposal: > > > > > - record all failures > > > > > - heed the record only when suspending > > > > > > > > I guess that would boil down to moving the power.runtime_error update > > > > from rpm_callback() to rpm_suspend()? > > > Resuming this discussion. One of the ways the device drivers are > > > clearing the runtime_error flag is by calling pm_runtime_set_suspended > > > [1]. > > I personally think that jumping on a 2.5 years old thread is not a > good idea. It would be better to restate the problem statement and > provide the link to the previous discussion. > > > > To me, it feels weird that a device driver calls pm_runtime_set_suspended > > > if the runtime_resume() has failed. It should be implied that the device > > > is in suspended state if the resume failed. > > > > > > So how really should the runtime_error flag be cleared? Should there be > > > a new API exposed to device drivers for this? Or should we plan for it > > > in the framework itself? > > > > While the API naming is unclear, that's exactly what > > pm_runtime_set_suspended() is about. Personally, I find it nice when a > > driver adds the comment "clear runtime_error flag", because otherwise > > it's not really obvious why a driver has to take care of "suspending" > > after a failed resume. But that's not the biggest question here, IMO. > > > > The real reson I pointed you at this thread was because I think it's > > useful to pursue the proposal above: to avoid setting a persistent > > "runtime_error" for resume failures. This seems to just create a pitfall > > for clients, as asked by Vincent and Oliver upthread. > > > > And along this line, there are relatively few drivers that actually > > bother to reset this error flag ever (e.g., commit f2bc2afe34c1 > > ("accel/ivpu: Clear runtime_error after pm_runtime_resume_and_get() > > fails")). > > > > So to me, we should simply answer Rafael's question: > > > > (repeated:) > > > > I guess that would boil down to moving the power.runtime_error update > > > > from rpm_callback() to rpm_suspend()? > > > > Yes, I think so. (Although I'm not sure if this leaves undesirable spam > > where persistent .runtime_resume() failures occur.) > > > > ...and then write/test/submit such a patch, provided it achieves the > > desired results. > > > > Unless of course one of the thread participants here has some other > > update in the intervening 2.5 years, or if Rafael was simply asking the > > above rhetorically, and wasn't actually interested in fielding such a > > change. > > The reason why runtime_error is there is to prevent runtime PM > callbacks from being run until something is done about the error, > under the assumption that running them in that case may make the > problem worse. > > I'm not sure if I see a substantial difference between suspend and > resume in that respect: If any of them fails, the state of the device > is kind of unstable. In particular, if resume fails and the device > doesn't actually resume, something needs to be done about it or it > just becomes unusable. > > Now, the way of clearing the error may not be super-convenient, which > was a bit hard to figure out upfront, so I'm not against making any > changes as long as there are sufficient reasons for making them. I am thinking if we can start with a change to not check runtime_error in rpm_resume, and let it go through even if the previous rpm_resume attempt failed. Something like this: ``` static int rpm_resume(struct device *dev, int rpmflags) trace_rpm_resume(dev, rpmflags); repeat: - if (dev->power.runtime_error) { - retval = -EINVAL; - } else if (dev->power.disable_depth > 0) { + if (dev->power.disable_depth > 0) { if (dev->power.runtime_status == RPM_ACTIVE && dev->power.last_status == RPM_ACTIVE) retval = 1; ``` I think setting the runtime_error in rpm_callback, i.e. for both resume and suspend is still a good idea for book-keeping purposes, e.g. the user reading the runtime_status of the device from sysfs.