From: Guenter Roeck <linux@roeck-us.net>
To: Zev Weiss <zev@bewilderbeest.net>
Cc: "Armin Wolf" <W_Armin@gmx.de>,
"Zoltán Kővágó" <dirty.ice.hu@gmail.com>,
linux-hwmon@vger.kernel.org
Subject: Re: PROBLEM: NCT6775: suspend doesn't work after updating to Linux 5.19
Date: Wed, 10 Aug 2022 00:33:43 -0700 [thread overview]
Message-ID: <e1086de8-e58c-0aa2-cc42-6ea8958c795e@roeck-us.net> (raw)
In-Reply-To: <YvNKI1ADmFYEsurd@hatter.bewilderbeest.net>
On 8/9/22 23:03, Zev Weiss wrote:
> On Tue, Aug 09, 2022 at 08:36:24PM PDT, Guenter Roeck wrote:
>> On 8/9/22 20:00, Armin Wolf wrote:
>>> Am 10.08.22 um 03:39 schrieb Zev Weiss:
>>>
>>>> On Tue, Aug 09, 2022 at 04:50:20PM PDT, Zoltán Kővágó wrote:
>>>>> On 2022-08-10 00:34, Zev Weiss wrote:
>>>>>> On Tue, Aug 09, 2022 at 02:28:20PM PDT, Zoltán Kővágó wrote:
>>>>>>> On 2022-08-09 22:56, Zev Weiss wrote:
>>>>>>>> On Tue, Aug 09, 2022 at 01:27:48PM PDT, Zoltán Kővágó wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> [1.] One line summary of the problem:
>>>>>>>>> NCT6775: suspend doesn't work after updating to Linux 5.19
>>>>>>>>>
>>>>>>>>> [2.] Full description of the problem/report:
>>>>>>>>> After updating my kernel from 5.18.11 to 5.19, I've noticed that
>>>>>>>>> resuming after suspend no longer works: fans start up, then about
>>>>>>>>> a second later, the computer just shuts down, leaving the USB
>>>>>>>>> ports powered up (normally it turns them off on shutdown). The
>>>>>>>>> screens don't turn on during this timeframe, so I can't see any
>>>>>>>>> useful log messages.
>>>>>>>>>
>>>>>>>>> Bisecting between 5.18 (where it still worked) and 5.19 lead me
>>>>>>>>> to commit c3963bc0a0cf9ecb205a9d4976eb92b6df2fa3fd "hwmon:
>>>>>>>>> (nct6775) Split core and platform driver" which looks like a
>>>>>>>>> refactor commit, but apparently it broke something.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Zoltán,
>>>>>>>>
>>>>>>>> Thanks for the thorough bug report. You're right that that commit
>>>>>>>> was essentially just a refactor, though there was one slight
>>>>>>>> change to the nct6775_suspend() function introduced during the
>>>>>>>> review process that may perhaps have had some subtle unintended
>>>>>>>> side-effects.
>>>>>>>>
>>>>>>>> Can you test the following patch and report if it resolves the
>>>>>>>> problem?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Zev
>>>>>>>
>>>>>>> Hi Zev,
>>>>>>>
>>>>>>> Thanks for the quick reply. Yes, it looks like your patch does
>>>>>>> solve the problem (I've applied it on top of 5.19 (after fighting
>>>>>>> with my mail client for a while) and suspended a few times, it's
>>>>>>> working so far).
>>>>>>>
>>>>>>> Regards,
>>>>>>> Zoltan
>>>>>>
>>>>>> Great, thanks.
>>>>>>
>>>>>> Guenter, it looks like nct6775_suspend() really does in fact need to
>>>>>> use nct6775_update_device() instead of dev_get_drvdata(), though
>>>>>> it's not immediately obvious to me why. Though given that the bulk
>>>>>> of of the body of nct6775_update_device() is inside an 'if' block
>>>>>> that might not necessarily execute every time, I also wonder if it
>>>>>> might be vulnerable to exhibiting the same problem depending on timing.
>>>>>>
>>>>>> Zoltán, if you could try another experiment to try to gather some
>>>>>> data on that -- with the patch from my previous email still applied,
>>>>>> could you try suspending via:
>>>>>>
>>>>>> $ cat
>>>>>> /sys/bus/platform/drivers/nct6775/nct6775.*/hwmon/hwmon*/*_input &&
>>>>>> echo mem > /sys/power/state
>>>>>
>>>>> Tried it, three times in fact, and it worked fine every time. Looking
>>>>> at the dmesg, though, it looks like it needs a bit more than 1.5 sec
>>>>> to suspend. Where's that 1.5 sec limit defined? I will try to
>>>>> increase it tomorrow.
>>>>>
>>>>
>>>> The 1.5 second duration comes from this line in nct6775_update_device():
>>>>
>>>> if (time_after(jiffies, data->last_updated + HZ + HZ / 2)
>>>>
>>>> Each 'HZ' represents one second, so HZ + HZ / 2 = 1.5 sec; if you want
>>>> to lengthen it you could do e.g. 10 * HZ or something instead.
>>>>
>>>> Though as Guenter noted, one other possibility is that with the
>>>> previous (buggy) version nct6775_update_device() might never have
>>>> gotten called at all -- do you know if that might be the case on your
>>>> system? (i.e. do you have any userspace monitoring program or the
>>>> like that would have been reading from the nct6775 device's sensors?)
>>>> If something like that never ran between the driver getting loaded and
>>>> the system suspending, that might also (partially) explain things; to
>>>> test that out you could revert to the old buggy code and see if the
>>>> suspend problem still occurs if you explicitly run
>>>>
>>>> $ cat /sys/bus/platform/drivers/nct6775/nct6775.*/hwmon/hwmon*/*_input
>>>>
>>>> (or just 'sensors' if you've got the lm-sensors package installed).
>>>> That will
>>>> ensure that nct6775_update_device() gets called before the suspend
>>>> operation, which could help narrow things down further.
>>>>
>>> In case you are running a Debian-based distribution, please note that Debian unconditionally executes sensors on startup, something
>>> i discovered when the dell-smm-hwmon driver was causing issues on my machine.
>>>
>>> So you may need to temporarily disable lm-sensors.service and/or /etc/init.d/lm-sensors for testing.
>>>
>
> I'm not sure offhand what it does in that area, but Zoltán's initial bug report indicated that the problem occurred on Gentoo, FWIW.
>
>>
>> The assumption was that the sensors command was _not_ executed prior
>> to suspend and that this triggers the problem. If it was executed at least
>> once and the crash on resume still happens, something else must be going
>> on. Either case, I think it is best to apply the patch suggested by Zev.
>> We can try to figure out the exact cause of the problem later. Zev, can
>> you send me a formal patch ? I would like to apply and send it to Linus
>> as soon as possible.
>>
>
> Ack, patch sent.
>
> From what we've seen so far though, it seems likely (if for no other reason than that if this _isn't_ the cause there must be something truly weird going on) that the problem stems from nct6775_update_device() simply having never been called at all. If we can confirm that that's the case, should we perhaps just add an unconditional call to nct6775_update_device() at the end of nct6775_probe(), and revert the patch I just sent so as to avoid the single-consumer symbol export?
>
I thought about calling it in probe, but I don't really like it because
it would slow down boot. Let's introduce a namespace (say, NCT6775) instead.
Change all EXPORT_SYMBOL_GPL(x) to EXPORT_SYMBOL_NS_GPL(x, NCT6775) and
add MODULE_IMPORT_NS(NCT6775) to nct6775-platform.c and nct6775-i2c.c.
This will ensure that the exported symbols can only be called from the
nct6775 code.
Thanks,
Guenter
next prev parent reply other threads:[~2022-08-10 7:33 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <f5990ef1-4efe-d2b1-8e50-c6890526c054@gmail.com>
2022-08-09 20:56 ` PROBLEM: NCT6775: suspend doesn't work after updating to Linux 5.19 Zev Weiss
2022-08-09 21:28 ` Zoltán Kővágó
2022-08-09 22:34 ` Zev Weiss
2022-08-09 23:50 ` Zoltán Kővágó
2022-08-10 1:39 ` Zev Weiss
2022-08-10 3:00 ` Armin Wolf
2022-08-10 3:36 ` Guenter Roeck
2022-08-10 6:03 ` Zev Weiss
2022-08-10 7:33 ` Guenter Roeck [this message]
2022-08-10 22:56 ` Zev Weiss
2022-08-11 0:20 ` Guenter Roeck
2022-08-10 8:22 ` Zoltán Kővágó
2022-08-10 22:59 ` Zev Weiss
2022-08-10 0:03 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e1086de8-e58c-0aa2-cc42-6ea8958c795e@roeck-us.net \
--to=linux@roeck-us.net \
--cc=W_Armin@gmx.de \
--cc=dirty.ice.hu@gmail.com \
--cc=linux-hwmon@vger.kernel.org \
--cc=zev@bewilderbeest.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox