Linux Hardware Monitor development
 help / color / mirror / Atom feed
From: Zev Weiss <zev@bewilderbeest.net>
To: Guenter Roeck <linux@roeck-us.net>
Cc: "Armin Wolf" <W_Armin@gmx.de>,
	"Zoltán Kővágó" <dirty.ice.hu@gmail.com>,
	linux-hwmon@vger.kernel.org
Subject: Re: PROBLEM: NCT6775: suspend doesn't work after updating to Linux 5.19
Date: Tue, 9 Aug 2022 23:03:15 -0700	[thread overview]
Message-ID: <YvNKI1ADmFYEsurd@hatter.bewilderbeest.net> (raw)
In-Reply-To: <c53405c8-5a1c-9a68-2135-b8460b915091@roeck-us.net>

On Tue, Aug 09, 2022 at 08:36:24PM PDT, Guenter Roeck wrote:
>On 8/9/22 20:00, Armin Wolf wrote:
>>Am 10.08.22 um 03:39 schrieb Zev Weiss:
>>
>>>On Tue, Aug 09, 2022 at 04:50:20PM PDT, Zoltán Kővágó wrote:
>>>>On 2022-08-10 00:34, Zev Weiss wrote:
>>>>>On Tue, Aug 09, 2022 at 02:28:20PM PDT, Zoltán Kővágó wrote:
>>>>>>On 2022-08-09 22:56, Zev Weiss wrote:
>>>>>>>On Tue, Aug 09, 2022 at 01:27:48PM PDT, Zoltán Kővágó wrote:
>>>>>>>>Hi,
>>>>>>>>
>>>>>>>>[1.] One line summary of the problem:
>>>>>>>>NCT6775: suspend doesn't work after updating to Linux 5.19
>>>>>>>>
>>>>>>>>[2.] Full description of the problem/report:
>>>>>>>>After updating my kernel from 5.18.11 to 5.19, I've noticed that
>>>>>>>>resuming after suspend no longer works: fans start up, then about
>>>>>>>>a second later, the computer just shuts down, leaving the USB
>>>>>>>>ports powered up (normally it turns them off on shutdown). The
>>>>>>>>screens don't turn on during this timeframe, so I can't see any
>>>>>>>>useful log messages.
>>>>>>>>
>>>>>>>>Bisecting between 5.18 (where it still worked) and 5.19 lead me
>>>>>>>>to commit c3963bc0a0cf9ecb205a9d4976eb92b6df2fa3fd "hwmon:
>>>>>>>>(nct6775) Split core and platform driver" which looks like a
>>>>>>>>refactor commit, but apparently it broke something.
>>>>>>>>
>>>>>>>
>>>>>>>Hi Zoltán,
>>>>>>>
>>>>>>>Thanks for the thorough bug report.  You're right that that commit
>>>>>>>was essentially just a refactor, though there was one slight
>>>>>>>change to the nct6775_suspend() function introduced during the
>>>>>>>review process that may perhaps have had some subtle unintended
>>>>>>>side-effects.
>>>>>>>
>>>>>>>Can you test the following patch and report if it resolves the
>>>>>>>problem?
>>>>>>>
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Zev
>>>>>>
>>>>>>Hi Zev,
>>>>>>
>>>>>>Thanks for the quick reply. Yes, it looks like your patch does
>>>>>>solve the problem (I've applied it on top of 5.19 (after fighting
>>>>>>with my mail client for a while) and suspended a few times, it's
>>>>>>working so far).
>>>>>>
>>>>>>Regards,
>>>>>>Zoltan
>>>>>
>>>>>Great, thanks.
>>>>>
>>>>>Guenter, it looks like nct6775_suspend() really does in fact need to
>>>>>use nct6775_update_device() instead of dev_get_drvdata(), though
>>>>>it's not immediately obvious to me why.  Though given that the bulk
>>>>>of of the body of nct6775_update_device() is inside an 'if' block
>>>>>that might not necessarily execute every time, I also wonder if it
>>>>>might be vulnerable to exhibiting the same problem depending on timing.
>>>>>
>>>>>Zoltán, if you could try another experiment to try to gather some
>>>>>data on that -- with the patch from my previous email still applied,
>>>>>could you try suspending via:
>>>>>
>>>>>    $ cat
>>>>>/sys/bus/platform/drivers/nct6775/nct6775.*/hwmon/hwmon*/*_input &&
>>>>>echo mem > /sys/power/state
>>>>
>>>>Tried it, three times in fact, and it worked fine every time. Looking
>>>>at the dmesg, though, it looks like it needs a bit more than 1.5 sec
>>>>to suspend. Where's that 1.5 sec limit defined? I will try to
>>>>increase it tomorrow.
>>>>
>>>
>>>The 1.5 second duration comes from this line in nct6775_update_device():
>>>
>>>  if (time_after(jiffies, data->last_updated + HZ + HZ / 2)
>>>
>>>Each 'HZ' represents one second, so HZ + HZ / 2 = 1.5 sec; if you want
>>>to lengthen it you could do e.g. 10 * HZ or something instead.
>>>
>>>Though as Guenter noted, one other possibility is that with the
>>>previous (buggy) version nct6775_update_device() might never have
>>>gotten called at all -- do you know if that might be the case on your
>>>system?  (i.e.  do you have any userspace monitoring program or the
>>>like that would have been reading from the nct6775 device's sensors?)
>>>If something like that never ran between the driver getting loaded and
>>>the system suspending, that might also (partially) explain things; to
>>>test that out you could revert to the old buggy code and see if the
>>>suspend problem still occurs if you explicitly run
>>>
>>>  $ cat /sys/bus/platform/drivers/nct6775/nct6775.*/hwmon/hwmon*/*_input
>>>
>>>(or just 'sensors' if you've got the lm-sensors package installed).
>>>That will
>>>ensure that nct6775_update_device() gets called before the suspend
>>>operation, which could help narrow things down further.
>>>
>>In case you are running a Debian-based distribution, please note that Debian unconditionally executes sensors on startup, something
>>i discovered when the dell-smm-hwmon driver was causing issues on my machine.
>>
>>So you may need to temporarily disable lm-sensors.service and/or /etc/init.d/lm-sensors for testing.
>>

I'm not sure offhand what it does in that area, but Zoltán's initial bug 
report indicated that the problem occurred on Gentoo, FWIW.

>
>The assumption was that the sensors command was _not_ executed prior
>to suspend and that this triggers the problem. If it was executed at least
>once and the crash on resume still happens, something else must be going
>on. Either case, I think it is best to apply the patch suggested by Zev.
>We can try to figure out the exact cause of the problem later. Zev, can
>you send me a formal patch ? I would like to apply and send it to Linus
>as soon as possible.
>

Ack, patch sent.

 From what we've seen so far though, it seems likely (if for no other 
reason than that if this _isn't_ the cause there must be something truly 
weird going on) that the problem stems from nct6775_update_device() 
simply having never been called at all.  If we can confirm that that's 
the case, should we perhaps just add an unconditional call to 
nct6775_update_device() at the end of nct6775_probe(), and revert the 
patch I just sent so as to avoid the single-consumer symbol export?


Zev


  reply	other threads:[~2022-08-10  6:04 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <f5990ef1-4efe-d2b1-8e50-c6890526c054@gmail.com>
2022-08-09 20:56 ` PROBLEM: NCT6775: suspend doesn't work after updating to Linux 5.19 Zev Weiss
2022-08-09 21:28   ` Zoltán Kővágó
2022-08-09 22:34     ` Zev Weiss
2022-08-09 23:50       ` Zoltán Kővágó
2022-08-10  1:39         ` Zev Weiss
2022-08-10  3:00           ` Armin Wolf
2022-08-10  3:36             ` Guenter Roeck
2022-08-10  6:03               ` Zev Weiss [this message]
2022-08-10  7:33                 ` Guenter Roeck
2022-08-10 22:56                   ` Zev Weiss
2022-08-11  0:20                     ` Guenter Roeck
2022-08-10  8:22           ` Zoltán Kővágó
2022-08-10 22:59             ` Zev Weiss
2022-08-10  0:03       ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YvNKI1ADmFYEsurd@hatter.bewilderbeest.net \
    --to=zev@bewilderbeest.net \
    --cc=W_Armin@gmx.de \
    --cc=dirty.ice.hu@gmail.com \
    --cc=linux-hwmon@vger.kernel.org \
    --cc=linux@roeck-us.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox