From: Stefan Lippers-Hollmann <s.l-h@gmx.de>
To: Eric Biggers <ebiggers@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux PM <linux-pm@vger.kernel.org>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Lukasz Luba <lukasz.luba@arm.com>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Zhang Rui <rui.zhang@intel.com>,
	Neil Armstrong <neil.armstrong@linaro.org>,
	"Linux regression tracking (Thorsten Leemhuis)" 
	<regressions@leemhuis.info>
Subject: Re: [PATCH v3] thermal: core: Call monitor_thermal_zone() if zone temperature is invalid
Date: Mon, 15 Jul 2024 11:06:59 +0200
Message-ID: <20240715110659.51b441e2@mir>
In-Reply-To: <20240715044527.GA1544@sol.localdomain>

Hi

On 2024-07-14, Eric Biggers wrote:
> On Thu, Jul 04, 2024 at 01:46:26PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Commit 202aa0d4bb53 ("thermal: core: Do not call handle_thermal_trip()
> > if zone temperature is invalid") caused __thermal_zone_device_update()
> > to return early if the current thermal zone temperature was invalid.
> >
> > This was done to avoid running handle_thermal_trip() and governor
> > callbacks in that case which led to confusion.  However, it went too
> > far because monitor_thermal_zone() still needs to be called even when
> > the zone temperature is invalid to ensure that it will be updated
> > eventually in case thermal polling is enabled and the driver has no
> > other means to notify the core of zone temperature changes (for example,
> > it does not register an interrupt handler or ACPI notifier).
> >
> > Also if the .set_trips() zone callback is expected to set up monitoring
> > interrupts for a thermal zone, it needs to be provided with valid
> > boundaries and that can only be done if the zone temperature is known.
> >
> > Accordingly, to ensure that __thermal_zone_device_update() will
> > run again after a failing zone temperature check, make it call
> > monitor_thermal_zone() regardless of whether or not the zone
> > temperature is valid and make the latter schedule a thermal zone
> > temperature update if the zone temperature is invalid even if
> > polling is not enabled for the thermal zone (however, if this
> > continues to fail, give up after some time).
> >
> > Fixes: 202aa0d4bb53 ("thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid")
> > Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > Link: https://lore.kernel.org/linux-pm/dc1e6cba-352b-4c78-93b5-94dd033fca16@linaro.org
> > Link: https://lore.kernel.org/linux-pm/2764814.mvXUDI8C0e@rjwysocki.net
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> On v6.10 I'm seeing the following messages spammed to the kernel log endlessly,
> and reverting this commit fixes it.
>
>     [  156.410567] thermal thermal_zone0: failed to read out thermal zone (-61)
[...]
>     [  158.458697] thermal thermal_zone0: failed to read out thermal zone (-61)
>
> /sys/class/thermal/thermal_zone0/type contains "iwlwifi_1".

I am observing the same issue on v6.10 with an Intel AX200 WLAN card
in a Kaby Lake (i5-7400) system with a Fujitsu D3400-B22 mainboard
running the newest BIOS (V5.0.0.12 R1.29.0):

$ dmesg | grep -i -e iwlwifi -e thermal_zone2
[    3.692433] iwlwifi 0000:04:00.0: enabling device (0140 -> 0142)
[    3.698547] iwlwifi 0000:04:00.0: Detected crf-id 0x3617, cnv-id 0x100530 wfpm id 0x80000000
[    3.698556] iwlwifi 0000:04:00.0: PCI dev 2723/0084, rev=0x340, rfid=0x10a100
[    3.703292] iwlwifi 0000:04:00.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37
[    3.797296] iwlwifi 0000:04:00.0: loaded firmware version 77.a20fb07d.0 cc-a0-77.ucode op_mode iwlmvm
[    4.090341] iwlwifi 0000:04:00.0: Detected Intel(R) Wi-Fi 6 AX200 160MHz, REV=0x340
[    4.090524] thermal thermal_zone2: failed to read out thermal zone (-61)
[    4.218496] iwlwifi 0000:04:00.0: Detected RF HR B3, rfid=0x10a100
[    4.285399] iwlwifi 0000:04:00.0: base HW address: 94:e6:f7:XX:XX:XX
[    4.341754] iwlwifi 0000:04:00.0 wlp4s0: renamed from wlan0
[    4.345445] thermal thermal_zone2: failed to read out thermal zone (-61)
[    4.601400] thermal thermal_zone2: failed to read out thermal zone (-61)
[    4.857372] thermal thermal_zone2: failed to read out thermal zone (-61)
[    5.114387] thermal thermal_zone2: failed to read out thermal zone (-61)
[...]
[  143.643801] thermal thermal_zone2: failed to read out thermal zone (-61)
[  143.899818] thermal thermal_zone2: failed to read out thermal zone (-61)
[  144.155813] thermal thermal_zone2: failed to read out thermal zone (-61)
[  144.411815] thermal thermal_zone2: failed to read out thermal zone (-61)
[  144.667828] thermal thermal_zone2: failed to read out thermal zone (-61)
[  144.923801] thermal thermal_zone2: failed to read out thermal zone (-61)
[  145.179822] thermal thermal_zone2: failed to read out thermal zone (-61)
[...]
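
To put a number on how often the message repeats, the timestamps in the log above can be diffed. The following is only a minimal sketch: the sample lines are copied from the dmesg output above, while the function name `failure_intervals` and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Sample lines copied from the dmesg output quoted above.
SAMPLE = """\
[    4.090524] thermal thermal_zone2: failed to read out thermal zone (-61)
[    4.345445] thermal thermal_zone2: failed to read out thermal zone (-61)
[    4.601400] thermal thermal_zone2: failed to read out thermal zone (-61)
[    4.857372] thermal thermal_zone2: failed to read out thermal zone (-61)
[    5.114387] thermal thermal_zone2: failed to read out thermal zone (-61)
"""

# Hypothetical pattern: timestamp, zone name, and error code of each message.
LINE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] thermal (thermal_zone\d+): "
    r"failed to read out thermal zone \((-?\d+)\)"
)


def failure_intervals(log_text):
    """Return the gaps in seconds between consecutive read-failure messages."""
    stamps = [float(m.group(1)) for m in LINE_RE.finditer(log_text)]
    return [round(b - a, 6) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    gaps = failure_intervals(SAMPLE)
    print("gaps (s):", gaps)
    print("mean gap (s):", round(sum(gaps) / len(gaps), 3))
```

On the sample lines the gaps come out at roughly a quarter of a second each, which matches the spacing still visible around the 143 s mark.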

$ cat  /sys/class/thermal/thermal_zone2/type
iwlwifi_1
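
For anyone wanting to check which zone is affected on their own machine, a rough sketch along these lines may help. It walks the thermal zones in sysfs and tries to read each temperature. It assumes that the sysfs `temp` attribute reports millidegrees Celsius and that a failing read surfaces the same error as the kernel log (-61 is ENODATA on Linux); the function name `scan_thermal_zones` and its `base` parameter are hypothetical.

```python
import errno
import glob
import os


def scan_thermal_zones(base="/sys/class/thermal"):
    """Return (zone, type, reading) tuples for every thermal zone under base."""
    results = []
    for zone_dir in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        zone = os.path.basename(zone_dir)
        try:
            with open(os.path.join(zone_dir, "type")) as f:
                zone_type = f.read().strip()
        except OSError:
            zone_type = "?"
        try:
            with open(os.path.join(zone_dir, "temp")) as f:
                # Assumption: the value is in millidegrees Celsius.
                reading = f"{int(f.read().strip()) / 1000:.1f} C"
        except OSError as exc:
            # For example errno 61 (ENODATA on Linux) if the driver has no data.
            name = errno.errorcode.get(exc.errno, "?")
            reading = f"read failed: -{exc.errno} ({name})"
        results.append((zone, zone_type, reading))
    return results


if __name__ == "__main__":
    for zone, zone_type, reading in scan_thermal_zones():
        print(f"{zone:16} {zone_type:16} {reading}")
```

The `base` parameter exists only so the function can be pointed at a test directory instead of the real sysfs tree.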

38cba05a86d157685d930a4400022eb4  /lib/firmware/iwlwifi-cc-a0-77.ucode
ce9c6e3bda22003f9a9b97cbca94b8215911b7a146c0f4f017963dbb1a233351  /lib/firmware/iwlwifi-cc-a0-77.ucode

git bisect led me to this commit as part of kernel v6.10:

$ LANG= git bisect log
git bisect start
# status: waiting for both good and bad commits
# bad: [0c3836482481200ead7b416ca80c68a29cfdaabd] Linux 6.10
git bisect bad 0c3836482481200ead7b416ca80c68a29cfdaabd
# status: waiting for good commit(s), bad commit known
# good: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
git bisect good a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
# good: [33e02dc69afbd8f1b85a51d74d72f139ba4ca623] Merge tag 'sound-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 33e02dc69afbd8f1b85a51d74d72f139ba4ca623
# good: [29c73fc794c83505066ee6db893b2a83ac5fac63] Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
git bisect good 29c73fc794c83505066ee6db893b2a83ac5fac63
# good: [e159d63e6940a2a16bb73616d8c528e93b84a6bb] Merge tag 'kvm-riscv-fixes-6.10-2' of https://github.com/kvm-riscv/linux into HEAD
git bisect good e159d63e6940a2a16bb73616d8c528e93b84a6bb
# good: [d1505b5cd0426bbddbbc99f10e3ae0b52aaa1d1f] Merge tag 'powerpc-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good d1505b5cd0426bbddbbc99f10e3ae0b52aaa1d1f
# good: [4a0929b0062a6b04207a414be9be97eb22965bc1] Merge tag 'media/v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect good 4a0929b0062a6b04207a414be9be97eb22965bc1
# bad: [ef2b7eb55e10294f4f384f21506ef20a6184128c] Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect bad ef2b7eb55e10294f4f384f21506ef20a6184128c
# good: [968460731f95be9977bc59a513acbc5afc71117d] Merge tag 'gpio-fixes-for-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
git bisect good 968460731f95be9977bc59a513acbc5afc71117d
# good: [5a4bd506ddad75f1f2711cfbcf7551a5504e3f1e] Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect good 5a4bd506ddad75f1f2711cfbcf7551a5504e3f1e
# bad: [a19ea421490dcc45c9f78145bb2703ac5d373b28] Merge tag 'platform-drivers-x86-v6.10-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
git bisect bad a19ea421490dcc45c9f78145bb2703ac5d373b28
# good: [34afb82a3c67f869267a26f593b6f8fc6bf35905] Merge tag '6.10-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd
git bisect good 34afb82a3c67f869267a26f593b6f8fc6bf35905
# bad: [d045c46c52740b0d5e92d376f0b7843b0c0d935a] Merge tag 'thermal-6.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad d045c46c52740b0d5e92d376f0b7843b0c0d935a
# bad: [94eacc1c583dd2ba51a2158fb13285f5dc42714b] thermal: core: Fix list sorting in __thermal_zone_device_update()
git bisect bad 94eacc1c583dd2ba51a2158fb13285f5dc42714b
# bad: [a8a261774466d8691e555ea674c193bb1b09edab] thermal: core: Call monitor_thermal_zone() if zone temperature is invalid
git bisect bad a8a261774466d8691e555ea674c193bb1b09edab
# good: [aaa18ff54b97706b84306b6613630262706b1f6b] thermal: gov_power_allocator: Return early in manage if trip_max is NULL
git bisect good aaa18ff54b97706b84306b6613630262706b1f6b
# first bad commit: [a8a261774466d8691e555ea674c193bb1b09edab] thermal: core: Call monitor_thermal_zone() if zone temperature is invalid

Reverting 202aa0d4bb532338cd27bcc64c60abc2987a2be7 on top of v6.10 avoids
the issue for me.

$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:591f] (rev 05)
00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 630 [8086:5912] (rev 04)
00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
00:14.2 Signal processing controller [1180]: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131] (rev 31)
00:16.0 Communication controller [0780]: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a] (rev 31)
00:17.0 SATA controller [0106]: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [8086:a102] (rev 31)
00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 [8086:a114] (rev f1)
00:1c.6 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #7 [8086:a116] (rev f1)
00:1c.7 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #8 [8086:a117] (rev f1)
00:1f.0 ISA bridge [0601]: Intel Corporation H110 Chipset LPC/eSPI Controller [8086:a143] (rev 31)
00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
00:1f.3 Audio device [0403]: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170] (rev 31)
00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123] (rev 31)
01:00.0 Non-Volatile memory controller [0108]: SK hynix BC901 NVMe Solid State Drive (DRAM-less) [1c5c:1d59] (rev 03)
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c)
04:00.0 Network controller [0280]: Intel Corporation Wi-Fi 6 AX200 [8086:2723] (rev 1a)

04:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
        Subsystem: Intel Corporation Wi-Fi 6 AX200NGW
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 19
        IOMMU group: 12
        Region 0: Memory at efb00000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [c8] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [40] Express (v2) Endpoint, IntMsgNum 0
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W TEE-IO-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L1, Exit Latency L1 <8us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp- 10BitTagReq- OBFF Via WAKE#, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis-
                         AtomicOpsCtl: ReqEn-
                         IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
                         10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
                LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [80] MSI-X: Enable+ Count=16 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
                        ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
                        ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
                        ECRC- UnsupReq- ACSViol- UncorrIntErr+ BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
                        PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [14c v1] Latency Tolerance Reporting
                Max snoop latency: 3145728ns
                Max no snoop latency: 3145728ns
        Capabilities: [154 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=30us PortTPowerOnTime=18us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=44us
        Kernel driver in use: iwlwifi
        Kernel modules: iwlwifi

Regards
	Stefan Lippers-Hollmann
