* ACPI IRQ storm with 6.10
@ 2024-08-14 5:22 Jiri Slaby
2024-08-14 6:47 ` Jiri Slaby
0 siblings, 1 reply; 19+ messages in thread
From: Jiri Slaby @ 2024-08-14 5:22 UTC (permalink / raw)
To: Rafael J. Wysocki, Len Brown, linux-acpi@vger.kernel.org
Cc: Linux kernel mailing list, Linux regressions mailing list
Hi,
one openSUSE's user reported that with 6.10, he sees one CPU under an
IRQ storm from ACPI (sci_interrupt):
9: 20220768 ... IR-IO-APIC 9-fasteoi acpi
At:
https://bugzilla.suse.com/show_bug.cgi?id=1229085
6.9 was OK.
With acpi.debug_level=0x08000000 acpi.debug_layer=0xffffffff, there is a
repeated load of:
> event-0188 ev_fixed_event_detect : Fixed Event Block: Enable 00000020 Status 00000000
> evgpe-0673 ev_detect_gpe : Read registers for GPE 00: Status=00, Enable=00, RunEnable=02, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 01: Status=00, Enable=02, RunEnable=02, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 02: Status=00, Enable=00, RunEnable=02, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 03: Status=00, Enable=00, RunEnable=02, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 04: Status=00, Enable=00, RunEnable=02, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 05: Status=00, Enable=00, RunEnable=02, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 06: Status=00, Enable=00, RunEnable=02, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 07: Status=00, Enable=00, RunEnable=02, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 08-0F: RunEnable=00, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 10: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 11: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 12: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 13: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 14: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 15: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 16: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 17: Status=00, Enable=80, RunEnable=80, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 18-1F: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 20-27: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 28-2F: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 30-37: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 38-3F: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 40-47: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 48-4F: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 50-57: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 58-5F: RunEnable=00, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 60: Status=00, Enable=00, RunEnable=06, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 61: Status=00, Enable=02, RunEnable=06, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 62: Status=00, Enable=04, RunEnable=06, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 63: Status=00, Enable=00, RunEnable=06, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 64: Status=00, Enable=00, RunEnable=06, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 65: Status=00, Enable=00, RunEnable=06, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 66: Status=00, Enable=00, RunEnable=06, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 67: Status=00, Enable=00, RunEnable=06, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 68: Status=00, Enable=00, RunEnable=4A, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 69: Status=00, Enable=02, RunEnable=4A, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 6A: Status=00, Enable=00, RunEnable=4A, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 6B: Status=00, Enable=08, RunEnable=4A, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 6C: Status=00, Enable=00, RunEnable=4A, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 6D: Status=20, Enable=00, RunEnable=4A, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 6E: Status=40, Enable=40, RunEnable=4A, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 6F: Status=00, Enable=00, RunEnable=4A, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 70: Status=00, Enable=00, RunEnable=0E, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 71: Status=00, Enable=02, RunEnable=0E, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 72: Status=00, Enable=04, RunEnable=0E, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 73: Status=00, Enable=08, RunEnable=0E, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 74: Status=00, Enable=00, RunEnable=0E, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 75: Status=00, Enable=00, RunEnable=0E, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 76: Status=00, Enable=00, RunEnable=0E, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 77: Status=00, Enable=00, RunEnable=0E, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 78-7F: RunEnable=00, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 80: Status=00, Enable=01, RunEnable=01, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 81: Status=00, Enable=00, RunEnable=01, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 82: Status=00, Enable=00, RunEnable=01, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 83: Status=00, Enable=00, RunEnable=01, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 84: Status=00, Enable=00, RunEnable=01, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 85: Status=00, Enable=00, RunEnable=01, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 86: Status=40, Enable=00, RunEnable=01, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 87: Status=00, Enable=00, RunEnable=01, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 88: Status=00, Enable=00, RunEnable=04, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 89: Status=00, Enable=00, RunEnable=04, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 8A: Status=00, Enable=04, RunEnable=04, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 8B: Status=00, Enable=00, RunEnable=04, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 8C: Status=00, Enable=00, RunEnable=04, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 8D: Status=00, Enable=00, RunEnable=04, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 8E: Status=00, Enable=00, RunEnable=04, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE 8F: Status=00, Enable=00, RunEnable=04, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 90-97: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE 98-9F: RunEnable=00, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE A0: Status=00, Enable=01, RunEnable=87, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE A1: Status=00, Enable=02, RunEnable=87, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE A2: Status=00, Enable=04, RunEnable=87, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE A3: Status=00, Enable=00, RunEnable=87, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE A4: Status=00, Enable=00, RunEnable=87, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE A5: Status=00, Enable=00, RunEnable=87, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE A6: Status=00, Enable=00, RunEnable=87, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE A7: Status=00, Enable=80, RunEnable=87, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE A8-AF: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE B0-B7: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE B8-BF: RunEnable=00, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE C0: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE C1: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE C2: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE C3: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE C4: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE C5: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE C6: Status=00, Enable=00, RunEnable=80, WakeEnable=00
> evgpe-0673 ev_detect_gpe : Read registers for GPE C7: Status=00, Enable=80, RunEnable=80, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE C8-CF: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE D0-D7: RunEnable=00, WakeEnable=00
> evgpe-0396 ev_gpe_detect : Ignore disabled registers for GPE D8-DF: RunEnable=00, WakeEnable=00
Longer (yet still trimmed due to pace of the messages) log:
https://bugzilla.suse.com/attachment.cgi?id=876664
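To read the dump above, it helps to map register masks back to GPE numbers. The sketch below assumes what the values themselves suggest: each register covers eight GPEs, Status and Enable are printed already masked to the bit of the GPE being checked, and RunEnable is the mask of the whole register (for example GPE 61 shows Enable=02, GPE 62 shows Enable=04, and both show RunEnable=06). The helper names are hypothetical, not ACPICA or kernel functions.

```python
# Sketch only: helper names are hypothetical, not ACPICA or kernel APIs.
GPES_PER_REGISTER = 8  # assumption: each register in the log covers eight GPEs


def gpe_bit(gpe):
    """Bit mask of one GPE within its 8-bit register."""
    return 1 << (gpe % GPES_PER_REGISTER)


def gpes_in_mask(base, mask):
    """GPE numbers of the register starting at `base` whose bit is set in `mask`."""
    return [base + bit for bit in range(GPES_PER_REGISTER) if mask & (1 << bit)]


# Values copied from the log above for the register covering GPE 68-6F.
run_enabled = gpes_in_mask(0x68, 0x4A)
print([f"{gpe:02X}" for gpe in run_enabled])

# Bit that GPE 6D occupies in that register; compare with its Status=20 line.
print(f"{gpe_bit(0x6D):02X}")

# True if GPE 6D is part of the RunEnable mask printed for its register.
print(bool(0x4A & gpe_bit(0x6D)))
```

Under these assumptions, the RunEnable=4A mask corresponds to GPEs 69, 6B and 6E, while GPE 6D has its status bit (0x20) set without being part of that mask in this snapshot.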
Any ideas?
thanks,
--
js
suse labs
* Re: ACPI IRQ storm with 6.10
2024-08-14 5:22 ACPI IRQ storm with 6.10 Jiri Slaby
@ 2024-08-14 6:47 ` Jiri Slaby
2024-08-16 18:29 ` Rafael J. Wysocki
0 siblings, 1 reply; 19+ messages in thread
From: Jiri Slaby @ 2024-08-14 6:47 UTC (permalink / raw)
To: Rafael J. Wysocki, Len Brown, linux-acpi@vger.kernel.org
Cc: Linux kernel mailing list, Linux regressions mailing list
On 14. 08. 24, 7:22, Jiri Slaby wrote:
> Hi,
>
> one openSUSE's user reported that with 6.10, he sees one CPU under an
> IRQ storm from ACPI (sci_interrupt):
> 9: 20220768 ... IR-IO-APIC 9-fasteoi acpi
>
> At:
> https://bugzilla.suse.com/show_bug.cgi?id=1229085
>
> 6.9 was OK.
>
> With acpi.debug_level=0x08000000 acpi.debug_layer=0xffffffff, there is a
> repeated load of:
>> evgpe-0673 ev_detect_gpe : Read registers for GPE 6D:
>> Status=20, Enable=00, RunEnable=4A, WakeEnable=00
0x6d seems to count excessively (10 snapshots every 1 second):
> /sys/firmware/acpi/interrupts/gpe6D: 82066 EN STS enabled unmasked
> /sys/firmware/acpi/interrupts/gpe6D: 86536 EN STS enabled unmasked
> /sys/firmware/acpi/interrupts/gpe6D: 90990 STS enabled unmasked
> /sys/firmware/acpi/interrupts/gpe6D: 95468 EN STS enabled unmasked
> /sys/firmware/acpi/interrupts/gpe6D: 100282 EN STS enabled unmasked
> /sys/firmware/acpi/interrupts/gpe6D: 105187 STS enabled unmasked
> /sys/firmware/acpi/interrupts/gpe6D: 110014 STS enabled unmasked
> /sys/firmware/acpi/interrupts/gpe6D: 114852 STS enabled unmasked
> /sys/firmware/acpi/interrupts/gpe6D: 119682 STS enabled unmasked
> /sys/firmware/acpi/interrupts/gpe6D: 124194 STS enabled unmasked
> /sys/firmware/acpi/interrupts/gpe6D: 128641 EN STS enabled unmasked
acpidump:
https://bugzilla.suse.com/attachment.cgi?id=876677
DSDT:
https://bugzilla.suse.com/attachment.cgi?id=876678
> Any ideas?
>
> thanks,
--
js
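The gpe6D snapshots quoted above can be turned into a rough event rate. This is a minimal sketch, assuming the samples were taken one second apart as the message states; the function names are hypothetical and only the counter values come from the thread.

```python
import re

# Snapshot lines as quoted in the message: "<sysfs path>: <count> <flags...>"
SNAPSHOTS = """\
/sys/firmware/acpi/interrupts/gpe6D: 82066 EN STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D: 86536 EN STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D: 90990 STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D: 95468 EN STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D: 100282 EN STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D: 105187 STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D: 110014 STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D: 114852 STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D: 119682 STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D: 124194 STS enabled unmasked
/sys/firmware/acpi/interrupts/gpe6D: 128641 EN STS enabled unmasked
""".splitlines()

LINE_RE = re.compile(r"^\S+:\s+(\d+)\s")


def parse_count(line):
    """Extract the event counter from one snapshot line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized snapshot line: {line!r}")
    return int(match.group(1))


def deltas(lines):
    """Counter increase between each pair of consecutive snapshots."""
    counts = [parse_count(line) for line in lines]
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


def mean_rate(lines, interval_s=1.0):
    """Average events per second, assuming a fixed sampling interval."""
    steps = deltas(lines)
    return sum(steps) / (len(steps) * interval_s)


print(deltas(SNAPSHOTS))
print(mean_rate(SNAPSHOTS))
```

With a one-second interval this works out to several thousand GPE 6D events per second, which matches the "counts excessively" observation.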
* Re: ACPI IRQ storm with 6.10
2024-08-14 6:47 ` Jiri Slaby
@ 2024-08-16 18:29 ` Rafael J. Wysocki
2024-08-16 21:36 ` Petr Valenta
2024-08-17 17:57 ` Petr Valenta
0 siblings, 2 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2024-08-16 18:29 UTC (permalink / raw)
To: Jiri Slaby
Cc: Rafael J. Wysocki, Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list
On Wed, Aug 14, 2024 at 8:48 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>
> On 14. 08. 24, 7:22, Jiri Slaby wrote:
> > Hi,
> >
> > one openSUSE's user reported that with 6.10, he sees one CPU under an
> > IRQ storm from ACPI (sci_interrupt):
> > 9: 20220768 ... IR-IO-APIC 9-fasteoi acpi
> >
> > At:
> > https://bugzilla.suse.com/show_bug.cgi?id=1229085
> >
> > 6.9 was OK.
> >
> > With acpi.debug_level=0x08000000 acpi.debug_layer=0xffffffff, there is a
> > repeated load of:
> >> evgpe-0673 ev_detect_gpe : Read registers for GPE 6D:
> >> Status=20, Enable=00, RunEnable=4A, WakeEnable=00
>
> 0x6d seems to count excessively (10 snapshots every 1 second):
> > /sys/firmware/acpi/interrupts/gpe6D: 82066 EN STS enabled unmasked
> > /sys/firmware/acpi/interrupts/gpe6D: 86536 EN STS enabled unmasked
> > /sys/firmware/acpi/interrupts/gpe6D: 90990 STS enabled unmasked
> > /sys/firmware/acpi/interrupts/gpe6D: 95468 EN STS enabled unmasked
> > /sys/firmware/acpi/interrupts/gpe6D: 100282 EN STS enabled unmasked
> > /sys/firmware/acpi/interrupts/gpe6D: 105187 STS enabled unmasked
> > /sys/firmware/acpi/interrupts/gpe6D: 110014 STS enabled unmasked
> > /sys/firmware/acpi/interrupts/gpe6D: 114852 STS enabled unmasked
> > /sys/firmware/acpi/interrupts/gpe6D: 119682 STS enabled unmasked
> > /sys/firmware/acpi/interrupts/gpe6D: 124194 STS enabled unmasked
> > /sys/firmware/acpi/interrupts/gpe6D: 128641 EN STS enabled unmasked
>
> acpidump:
> https://bugzilla.suse.com/attachment.cgi?id=876677
>
> DSDT:
> https://bugzilla.suse.com/attachment.cgi?id=876678
>
> > Any ideas?
GPE 6D is listed in _PRW for some devices, so maybe one of them
continues to trigger wakeup events?
You can ask the reporter to mask that GPE via "echo mask >
/sys/firmware/acpi/interrupts/gpe6D" and see if the storm goes away
then.
The only ACPI core issue introduced between 6.9 and 6.10 I'm aware of
is the one addressed by this series
https://lore.kernel.org/linux-acpi/22385894.EfDdHjke4D@rjwysocki.net/
but this is about the EC and the problem here doesn't appear to be
EC-related. It may be worth trying anyway, though.
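The masking test suggested above can also be scripted. The following is a rough sketch, not a recommended tool: it writes the same "mask" keyword as the echo command to the same sysfs file, and assumes that "unmask" is the matching keyword to undo it. The function names and the `root` parameter (present only so the helper can be exercised against a scratch directory) are hypothetical; on a real system the write needs root privileges.

```python
from pathlib import Path

# Directory used by the echo command in the message above.
ACPI_INTERRUPTS = Path("/sys/firmware/acpi/interrupts")


def gpe_path(gpe, root=ACPI_INTERRUPTS):
    """Path of the sysfs file for one GPE, e.g. .../gpe6D for 0x6D."""
    return Path(root) / f"gpe{gpe:02X}"


def set_gpe_masked(gpe, masked, root=ACPI_INTERRUPTS):
    """Write "mask" or "unmask" (assumed counterpart) to the GPE's sysfs file."""
    path = gpe_path(gpe, root)
    path.write_text("mask\n" if masked else "unmask\n")
    return path


# Example (needs root on a real machine, so it is left commented out):
# set_gpe_masked(0x6D, True)    # storm should stop if GPE 6D is the source
# set_gpe_masked(0x6D, False)   # restore the previous behaviour
```

Masking only hides the symptom; it is useful here as a way to confirm which GPE is firing, not as a fix.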
* Re: ACPI IRQ storm with 6.10
2024-08-16 18:29 ` Rafael J. Wysocki
@ 2024-08-16 21:36 ` Petr Valenta
2024-08-17 17:57 ` Petr Valenta
1 sibling, 0 replies; 19+ messages in thread
From: Petr Valenta @ 2024-08-16 21:36 UTC (permalink / raw)
To: Rafael J. Wysocki, Jiri Slaby
Cc: Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list
On 16. 08. 24 at 20:29, Rafael J. Wysocki wrote:
> On Wed, Aug 14, 2024 at 8:48 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>>
>> On 14. 08. 24, 7:22, Jiri Slaby wrote:
>>> Hi,
>>>
>>> one openSUSE's user reported that with 6.10, he sees one CPU under an
>>> IRQ storm from ACPI (sci_interrupt):
>>> 9: 20220768 ... IR-IO-APIC 9-fasteoi acpi
>>>
>>> At:
>>> https://bugzilla.suse.com/show_bug.cgi?id=1229085
>>>
>>> 6.9 was OK.
>>>
>>> With acpi.debug_level=0x08000000 acpi.debug_layer=0xffffffff, there is a
>>> repeated load of:
>>>> evgpe-0673 ev_detect_gpe : Read registers for GPE 6D:
>>>> Status=20, Enable=00, RunEnable=4A, WakeEnable=00
>>
>> 0x6d seems to count excessively (10 snapshots every 1 second):
>>> /sys/firmware/acpi/interrupts/gpe6D: 82066 EN STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 86536 EN STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 90990 STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 95468 EN STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 100282 EN STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 105187 STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 110014 STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 114852 STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 119682 STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 124194 STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 128641 EN STS enabled unmasked
>>
>> acpidump:
>> https://bugzilla.suse.com/attachment.cgi?id=876677
>>
>> DSDT:
>> https://bugzilla.suse.com/attachment.cgi?id=876678
>>
>>> Any ideas?
>
> GPE 6D is listed in _PRW for some devices, so maybe one of them
> continues to trigger wakeup events?
>
> You can ask the reporter to mask that GPE via "echo mask >
> /sys/firmware/acpi/interrupts/gpe6D" and see if the storm goes away
> then.
>
It works, thank you. High CPU usage by irq/9-acpi is immediately gone.
> The only ACPI core issue introduced between 6.9 and 6.10 I'm aware of
> is the one addressed by this series
>
> https://lore.kernel.org/linux-acpi/22385894.EfDdHjke4D@rjwysocki.net/
>
> but this is about the EC and the problem here doesn't appear to be
> EC-related. It may be worth trying anyway, though.
>
* Re: ACPI IRQ storm with 6.10
2024-08-16 18:29 ` Rafael J. Wysocki
2024-08-16 21:36 ` Petr Valenta
@ 2024-08-17 17:57 ` Petr Valenta
2024-08-19 4:50 ` Jiri Slaby
1 sibling, 1 reply; 19+ messages in thread
From: Petr Valenta @ 2024-08-17 17:57 UTC (permalink / raw)
To: Rafael J. Wysocki, Jiri Slaby
Cc: Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list
On 16. 08. 24 at 20:29, Rafael J. Wysocki wrote:
> On Wed, Aug 14, 2024 at 8:48 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>>
>> On 14. 08. 24, 7:22, Jiri Slaby wrote:
>>> Hi,
>>>
>>> one openSUSE's user reported that with 6.10, he sees one CPU under an
>>> IRQ storm from ACPI (sci_interrupt):
>>> 9: 20220768 ... IR-IO-APIC 9-fasteoi acpi
>>>
>>> At:
>>> https://bugzilla.suse.com/show_bug.cgi?id=1229085
>>>
>>> 6.9 was OK.
>>>
>>> With acpi.debug_level=0x08000000 acpi.debug_layer=0xffffffff, there is a
>>> repeated load of:
>>>> evgpe-0673 ev_detect_gpe : Read registers for GPE 6D:
>>>> Status=20, Enable=00, RunEnable=4A, WakeEnable=00
>>
>> 0x6d seems to count excessively (10 snapshots every 1 second):
>>> /sys/firmware/acpi/interrupts/gpe6D: 82066 EN STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 86536 EN STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 90990 STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 95468 EN STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 100282 EN STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 105187 STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 110014 STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 114852 STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 119682 STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 124194 STS enabled unmasked
>>> /sys/firmware/acpi/interrupts/gpe6D: 128641 EN STS enabled unmasked
>>
>> acpidump:
>> https://bugzilla.suse.com/attachment.cgi?id=876677
>>
>> DSDT:
>> https://bugzilla.suse.com/attachment.cgi?id=876678
>>
>>> Any ideas?
>
> GPE 6D is listed in _PRW for some devices, so maybe one of them
> continues to trigger wakeup events?
>
Disabling powertop service (which calls /usr/sbin/powertop --auto-tune)
solves problem completely. After some search I have found this is the
cause:
# causes IRQ storm on 6.10.x
# kernel 6.9.9 is immune
echo 'auto' > /sys/bus/pci/devices/0000:00:1f.6/power/control
lspci | grep 1f.6
00:1f.6 Ethernet controller: Intel Corporation Device 550b (rev 20)
journalctl -b | grep 1f.6
srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: [8086:550b] type 00 class 0x020000 conventional PCI endpoint
srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: BAR 0 [mem 0x9c300000-0x9c31ffff]
srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: PME# supported from D0 D3hot D3cold
srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: Adding to iommu group 12
srp 17 19:44:19 e14 kernel: e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
srp 17 19:44:19 e14 kernel: e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) fc:5c:ee:b0:13:74
srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 eth0: MAC: 16, PHY: 12, PBA No: FFFFFF-0FF
srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 enp0s31f6: renamed from eth0
srp 17 19:44:24 e14 ModemManager[1434]: <info> [base-manager] couldn't check support for device '/sys/devices/pci0000:00/0000:00:1f.6': not supported by any plugin
> You can ask the reporter to mask that GPE via "echo mask >
> /sys/firmware/acpi/interrupts/gpe6D" and see if the storm goes away
> then.
>
> The only ACPI core issue introduced between 6.9 and 6.10 I'm aware of
> is the one addressed by this series
>
> https://lore.kernel.org/linux-acpi/22385894.EfDdHjke4D@rjwysocki.net/
>
> but this is about the EC and the problem here doesn't appear to be
> EC-related. It may be worth trying anyway, though.
>
* Re: ACPI IRQ storm with 6.10
2024-08-17 17:57 ` Petr Valenta
@ 2024-08-19 4:50 ` Jiri Slaby
2024-08-19 5:23 ` Jiri Slaby
0 siblings, 1 reply; 19+ messages in thread
From: Jiri Slaby @ 2024-08-19 4:50 UTC (permalink / raw)
To: Petr Valenta, Rafael J. Wysocki
Cc: Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, jesse.brandeburg
CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
On 17. 08. 24, 19:57, Petr Valenta wrote:
>
>
> On 16. 08. 24 at 20:29, Rafael J. Wysocki wrote:
>> On Wed, Aug 14, 2024 at 8:48 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>>>
>>> On 14. 08. 24, 7:22, Jiri Slaby wrote:
>>>> Hi,
>>>>
>>>> one openSUSE's user reported that with 6.10, he sees one CPU under an
>>>> IRQ storm from ACPI (sci_interrupt):
>>>> 9: 20220768 ... IR-IO-APIC 9-fasteoi acpi
>>>>
>>>> At:
>>>> https://bugzilla.suse.com/show_bug.cgi?id=1229085
>>>>
>>>> 6.9 was OK.
>>>>
>>>> With acpi.debug_level=0x08000000 acpi.debug_layer=0xffffffff, there
>>>> is a
>>>> repeated load of:
>>>>> evgpe-0673 ev_detect_gpe : Read registers for GPE 6D:
>>>>> Status=20, Enable=00, RunEnable=4A, WakeEnable=00
>>>
>>> 0x6d seems to count excessively (10 snapshots every 1 second):
>>>> /sys/firmware/acpi/interrupts/gpe6D: 82066 EN STS enabled
>>>> unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 86536 EN STS enabled
>>>> unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 90990 STS enabled
>>>> unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 95468 EN STS enabled
>>>> unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 100282 EN STS enabled
>>>> unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 105187 STS enabled
>>>> unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 110014 STS enabled
>>>> unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 114852 STS enabled
>>>> unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 119682 STS enabled
>>>> unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 124194 STS enabled
>>>> unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 128641 EN STS enabled
>>>> unmasked
>>>
>>> acpidump:
>>> https://bugzilla.suse.com/attachment.cgi?id=876677
>>>
>>> DSDT:
>>> https://bugzilla.suse.com/attachment.cgi?id=876678
>>>
>>>> Any ideas?
>>
>> GPE 6D is listed in _PRW for some devices, so maybe one of them
>> continues to trigger wakeup events?
>>
>
> Disabling powertop service (which calls /usr/sbin/powertop --auto-tune)
> solves problem completely. After some search I have found this is the
> cause:
>
> # causes IRQ storm on 6.10.x
> # kernel 6.9.9 is immune
> echo 'auto' > /sys/bus/pci/devices/0000:00:1f.6/power/control
$ git log --no-merges --oneline v6.9..v6.10 drivers/net/ethernet/intel/e1000e/
76a0a3f9cc2f e1000e: fix force smbus during suspend flow
c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems
bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function
6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu()
b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops
75a3f93b5383 net: intel: implement modern PM ops declarations
The last two play with PM ^^. I cannot immediately see whether the issue
can be caused by either of those, though. If there are no other ideas,
perhaps try reverting both?
> lspci | grep 1f.6
> 00:1f.6 Ethernet controller: Intel Corporation Device 550b (rev 20)
>
> journalctl -b | grep 1f.6
> srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: [8086:550b] type 00 class
> 0x020000 conventional PCI endpoint
> srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: BAR 0 [mem
> 0x9c300000-0x9c31ffff]
> srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: PME# supported from D0
> D3hot D3cold
> srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: Adding to iommu group 12
> srp 17 19:44:19 e14 kernel: e1000e 0000:00:1f.6: Interrupt Throttling
> Rate (ints/sec) set to dynamic conservative mode
> srp 17 19:44:19 e14 kernel: e1000e 0000:00:1f.6 0000:00:1f.6
> (uninitialized): registered PHC clock
> srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 eth0: (PCI
> Express:2.5GT/s:Width x1) fc:5c:ee:b0:13:74
> srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000
> Network Connection
> srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 eth0: MAC: 16, PHY: 12,
> PBA No: FFFFFF-0FF
> srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 enp0s31f6: renamed from
> eth0
> srp 17 19:44:24 e14 ModemManager[1434]: <info> [base-manager] couldn't
> check support for device '/sys/devices/pci0000:00/0000:00:1f.6': not
> supported by any plugin
>
>
>
>> You can ask the reporter to mask that GPE via "echo mask >
>> /sys/firmware/acpi/interrupts/gpe6D" and see if the storm goes away
>> then.
>>
>> The only ACPI core issue introduced between 6.9 and 6.10 I'm aware of
>> is the one addressed by this series
>>
>> https://lore.kernel.org/linux-acpi/22385894.EfDdHjke4D@rjwysocki.net/
>>
>> but this is about the EC and the problem here doesn't appear to be
>> EC-related. It may be worth trying anyway, though.
>>
--
js
suse labs
* Re: ACPI IRQ storm with 6.10
2024-08-19 4:50 ` Jiri Slaby
@ 2024-08-19 5:23 ` Jiri Slaby
2024-08-19 16:47 ` Bjorn Helgaas
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Jiri Slaby @ 2024-08-19 5:23 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, jesse.brandeburg, Rafael J. Wysocki, Petr Valenta
On 19. 08. 24, 6:50, Jiri Slaby wrote:
> CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
Bjorn,
I am confused by these changes:
==========================================
@@ -291,16 +288,13 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
 	 * duplex is forced.
 	 */
 	if (cmd->base.eth_tp_mdix_ctrl) {
-		if (hw->phy.media_type != e1000_media_type_copper) {
-			ret_val = -EOPNOTSUPP;
-			goto out;
-		}
+		if (hw->phy.media_type != e1000_media_type_copper)
+			return -EOPNOTSUPP;

 		if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) &&
 		    (cmd->base.autoneg != AUTONEG_ENABLE)) {
 			e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n");
-			ret_val = -EINVAL;
-			goto out;
+			return -EINVAL;
 		}
 	}

@@ -347,7 +341,6 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
 	}

 out:
-	pm_runtime_put_sync(netdev->dev.parent);
 	clear_bit(__E1000_RESETTING, &adapter->state);
 	return ret_val;
 }
==========================================
So no more clear_bit(__E1000_RESETTING in the above fail paths. Is that
intentional?
thanks,
--
js
suse labs
* Re: ACPI IRQ storm with 6.10
2024-08-19 5:23 ` Jiri Slaby
@ 2024-08-19 16:47 ` Bjorn Helgaas
2024-08-20 18:09 ` Bjorn Helgaas
2024-08-20 18:44 ` Bjorn Helgaas
2 siblings, 0 replies; 19+ messages in thread
From: Bjorn Helgaas @ 2024-08-19 16:47 UTC (permalink / raw)
To: Jiri Slaby
Cc: Bjorn Helgaas, Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, jesse.brandeburg, Rafael J. Wysocki, Petr Valenta
On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote:
> On 19. 08. 24, 6:50, Jiri Slaby wrote:
> > CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
>
> Bjorn,
>
> I am confused by these changes:
> ==========================================
> @@ -291,16 +288,13 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>  	 * duplex is forced.
>  	 */
>  	if (cmd->base.eth_tp_mdix_ctrl) {
> -		if (hw->phy.media_type != e1000_media_type_copper) {
> -			ret_val = -EOPNOTSUPP;
> -			goto out;
> -		}
> +		if (hw->phy.media_type != e1000_media_type_copper)
> +			return -EOPNOTSUPP;
>
>  		if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) &&
>  		    (cmd->base.autoneg != AUTONEG_ENABLE)) {
>  			e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n");
> -			ret_val = -EINVAL;
> -			goto out;
> +			return -EINVAL;
>  		}
>  	}
>
> @@ -347,7 +341,6 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>  	}
>
>  out:
> -	pm_runtime_put_sync(netdev->dev.parent);
>  	clear_bit(__E1000_RESETTING, &adapter->state);
>  	return ret_val;
>  }
> ==========================================
>
> So no more clear_bit(__E1000_RESETTING in the above fail paths. Is that
> intentional?
No, not intentional, looks like I just blew it, sorry. Will post a
fix soon.
Thanks a lot for debugging this.
Bjorn
* Re: ACPI IRQ storm with 6.10
2024-08-19 5:23 ` Jiri Slaby
2024-08-19 16:47 ` Bjorn Helgaas
@ 2024-08-20 18:09 ` Bjorn Helgaas
2024-08-20 21:13 ` Petr Valenta
2024-08-20 18:44 ` Bjorn Helgaas
2 siblings, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2024-08-20 18:09 UTC (permalink / raw)
To: Jiri Slaby, Petr Valenta
Cc: Bjorn Helgaas, Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, Rafael J. Wysocki
[+to Petr, -cc Jesse, bouncing]
On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote:
> On 19. 08. 24, 6:50, Jiri Slaby wrote:
> > CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
>
> Bjorn,
>
> I am confused by these changes:
> ==========================================
> @@ -291,16 +288,13 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>  	 * duplex is forced.
>  	 */
>  	if (cmd->base.eth_tp_mdix_ctrl) {
> -		if (hw->phy.media_type != e1000_media_type_copper) {
> -			ret_val = -EOPNOTSUPP;
> -			goto out;
> -		}
> +		if (hw->phy.media_type != e1000_media_type_copper)
> +			return -EOPNOTSUPP;
>
>  		if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) &&
>  		    (cmd->base.autoneg != AUTONEG_ENABLE)) {
>  			e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n");
> -			ret_val = -EINVAL;
> -			goto out;
> +			return -EINVAL;
>  		}
>  	}
>
> @@ -347,7 +341,6 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>  	}
>
>  out:
> -	pm_runtime_put_sync(netdev->dev.parent);
>  	clear_bit(__E1000_RESETTING, &adapter->state);
>  	return ret_val;
>  }
> ==========================================
>
> So no more clear_bit(__E1000_RESETTING in the above fail paths. Is that
> intentional?
Not intentional. Petr, do you have the ability to test the patch
below? I'm not sure it's the correct fix, but it reverts the pieces
of b2c289415b2b that Jiri pointed out.
diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
index 9364bc2b4eb1..9db36ee71684 100644
--- a/drivers/net/ethernet/intel/e1000e/ethtool.c
+++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
@@ -280,7 +280,8 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
 	if (hw->phy.ops.check_reset_block &&
 	    hw->phy.ops.check_reset_block(hw)) {
 		e_err("Cannot change link characteristics when SoL/IDER is active.\n");
-		return -EINVAL;
+		ret_val = -EINVAL;
+		goto out;
 	}

 	/* MDI setting is only allowed when autoneg enabled because
@@ -288,13 +289,16 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
 	 * duplex is forced.
 	 */
 	if (cmd->base.eth_tp_mdix_ctrl) {
-		if (hw->phy.media_type != e1000_media_type_copper)
-			return -EOPNOTSUPP;
+		if (hw->phy.media_type != e1000_media_type_copper) {
+			ret_val = -EOPNOTSUPP;
+			goto out;
+		}

 		if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) &&
 		    (cmd->base.autoneg != AUTONEG_ENABLE)) {
 			e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n");
-			return -EINVAL;
+			ret_val = -EINVAL;
+			goto out;
 		}
 	}

* Re: ACPI IRQ storm with 6.10
2024-08-20 18:09 ` Bjorn Helgaas
@ 2024-08-20 21:13 ` Petr Valenta
2024-08-20 21:30 ` Bjorn Helgaas
0 siblings, 1 reply; 19+ messages in thread
From: Petr Valenta @ 2024-08-20 21:13 UTC (permalink / raw)
To: Bjorn Helgaas, Jiri Slaby
Cc: Bjorn Helgaas, Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, Rafael J. Wysocki

On 20. 08. 24 at 20:09, Bjorn Helgaas wrote:
> [+to Petr, -cc Jesse, bouncing]
>
> On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote:
>> On 19. 08. 24, 6:50, Jiri Slaby wrote:
>>> CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
>>
>> Bjorn,
>>
>> I am confused by these changes:
>> ==========================================
>> @@ -291,16 +288,13 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>>  	 * duplex is forced.
>>  	 */
>>  	if (cmd->base.eth_tp_mdix_ctrl) {
>> -		if (hw->phy.media_type != e1000_media_type_copper) {
>> -			ret_val = -EOPNOTSUPP;
>> -			goto out;
>> -		}
>> +		if (hw->phy.media_type != e1000_media_type_copper)
>> +			return -EOPNOTSUPP;
>>
>>  		if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) &&
>>  		    (cmd->base.autoneg != AUTONEG_ENABLE)) {
>>  			e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n");
>> -			ret_val = -EINVAL;
>> -			goto out;
>> +			return -EINVAL;
>>  		}
>>  	}
>>
>> @@ -347,7 +341,6 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>>  	}
>>
>>  out:
>> -	pm_runtime_put_sync(netdev->dev.parent);
>>  	clear_bit(__E1000_RESETTING, &adapter->state);
>>  	return ret_val;
>>  }
>> ==========================================
>>
>> So no more clear_bit(__E1000_RESETTING in the above fail paths. Is that
>> intentional?
>
> Not intentional. Petr, do you have the ability to test the patch
> below? I'm not sure it's the correct fix, but it reverts the pieces
> of b2c289415b2b that Jiri pointed out.
>

I tested the patch below but it didn't help. After the first boot with new
kernel it looked promising as the irq storm only appeared for a few seconds,
but with subsequent reboots it was the same as without the patch.

To be sure, I also send the md5sum of ethtool.c after applying the patch:

a25c003257538f16994b4d7c4260e894 ethtool.c

> diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
> index 9364bc2b4eb1..9db36ee71684 100644
> --- a/drivers/net/ethernet/intel/e1000e/ethtool.c
> +++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
> @@ -280,7 +280,8 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>  	if (hw->phy.ops.check_reset_block &&
>  	    hw->phy.ops.check_reset_block(hw)) {
>  		e_err("Cannot change link characteristics when SoL/IDER is active.\n");
> -		return -EINVAL;
> +		ret_val = -EINVAL;
> +		goto out;
>  	}
>
>  	/* MDI setting is only allowed when autoneg enabled because
> @@ -288,13 +289,16 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>  	 * duplex is forced.
>  	 */
>  	if (cmd->base.eth_tp_mdix_ctrl) {
> -		if (hw->phy.media_type != e1000_media_type_copper)
> -			return -EOPNOTSUPP;
> +		if (hw->phy.media_type != e1000_media_type_copper) {
> +			ret_val = -EOPNOTSUPP;
> +			goto out;
> +		}
>
>  		if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) &&
>  		    (cmd->base.autoneg != AUTONEG_ENABLE)) {
>  			e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n");
> -			return -EINVAL;
> +			ret_val = -EINVAL;
> +			goto out;
>  		}
>  	}
>

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: ACPI IRQ storm with 6.10
2024-08-20 21:13 ` Petr Valenta
@ 2024-08-20 21:30 ` Bjorn Helgaas
2024-08-21 5:09 ` Jiri Slaby
2024-08-21 11:39 ` Petr Valenta
0 siblings, 2 replies; 19+ messages in thread
From: Bjorn Helgaas @ 2024-08-20 21:30 UTC (permalink / raw)
To: Petr Valenta
Cc: Jiri Slaby, Bjorn Helgaas, Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, Rafael J. Wysocki

On Tue, Aug 20, 2024 at 11:13:54PM +0200, Petr Valenta wrote:
> On 20. 08. 24 at 20:09, Bjorn Helgaas wrote:
> > On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote:
> > > On 19. 08. 24, 6:50, Jiri Slaby wrote:
> > > > CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
> > >
> > > Bjorn,
> > >
> > > I am confused by these changes:
> > > ==========================================
> > > @@ -291,16 +288,13 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
> > >  	 * duplex is forced.
> > >  	 */
> > >  	if (cmd->base.eth_tp_mdix_ctrl) {
> > > -		if (hw->phy.media_type != e1000_media_type_copper) {
> > > -			ret_val = -EOPNOTSUPP;
> > > -			goto out;
> > > -		}
> > > +		if (hw->phy.media_type != e1000_media_type_copper)
> > > +			return -EOPNOTSUPP;
> > >
> > >  		if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) &&
> > >  		    (cmd->base.autoneg != AUTONEG_ENABLE)) {
> > >  			e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n");
> > > -			ret_val = -EINVAL;
> > > -			goto out;
> > > +			return -EINVAL;
> > >  		}
> > >  	}
> > >
> > > @@ -347,7 +341,6 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
> > >  	}
> > >
> > >  out:
> > > -	pm_runtime_put_sync(netdev->dev.parent);
> > >  	clear_bit(__E1000_RESETTING, &adapter->state);
> > >  	return ret_val;
> > >  }
> > > ==========================================
> > >
> > > So no more clear_bit(__E1000_RESETTING in the above fail paths. Is that
> > > intentional?
> >
> > Not intentional. Petr, do you have the ability to test the patch
> > below? I'm not sure it's the correct fix, but it reverts the pieces
> > of b2c289415b2b that Jiri pointed out.
>
> I tested the patch below but it didn't help. After the first boot with new
> kernel it looked promising as the irq storm only appeared for a few seconds,
> but with subsequent reboots it was the same as without the patch.

Thank you very much for testing that!

> To be sure, I also send the md5sum of ethtool.c after applying the patch:
>
> a25c003257538f16994b4d7c4260e894 ethtool.c

Thanks, that matches what I get when applying the patch on v6.10.

I'm at a loss. You could try reverting the entire b2c289415b2b commit
(patch for that is below).

If that doesn't help, I guess you could try reverting the other
commits Jiri mentioned:

  76a0a3f9cc2f e1000e: fix force smbus during suspend flow
  c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems
  bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function
  6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
  1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu()
  b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops
  75a3f93b5383 net: intel: implement modern PM ops declarations

If you do this, I would revert 76a0a3f9cc2f, test, then revert
c93a6f62cb1b in addition, test, then revert bfd546a552e1 in addition,
etc.

commit 5e92945ffe5c ("Revert "e1000e: Remove redundant runtime resume for ethtool_ops"")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Tue Aug 20 16:18:32 2024 -0500

    Revert "e1000e: Remove redundant runtime resume for ethtool_ops"

    This reverts commit b2c289415b2b2ef112b78d5e73b4acecf5db409e.

diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c index 9364bc2b4eb1..61fa2f6b3708 100644 --- a/drivers/net/ethernet/intel/e1000e/ethtool.c +++ b/drivers/net/ethernet/intel/e1000e/ethtool.c @@ -156,7 +156,7 @@ static int e1000_get_link_ksettings(struct net_device *netdev, speed = adapter->link_speed; cmd->base.duplex = adapter->link_duplex - 1; } - } else { + } else if (!pm_runtime_suspended(netdev->dev.parent)) { u32 status = er32(STATUS); if (status & E1000_STATUS_LU) { @@ -274,13 +274,16 @@ static int e1000_set_link_ksettings(struct net_device *netdev, ethtool_convert_link_mode_to_legacy_u32(&advertising, cmd->link_modes.advertising); + pm_runtime_get_sync(netdev->dev.parent); + /* When SoL/IDER sessions are active, autoneg/speed/duplex * cannot be changed */ if (hw->phy.ops.check_reset_block && hw->phy.ops.check_reset_block(hw)) { e_err("Cannot change link characteristics when SoL/IDER is active.\n"); - return -EINVAL; + ret_val = -EINVAL; + goto out; } /* MDI setting is only allowed when autoneg enabled because @@ -288,13 +291,16 @@ static int e1000_set_link_ksettings(struct net_device *netdev, * duplex is forced. 
*/ if (cmd->base.eth_tp_mdix_ctrl) { - if (hw->phy.media_type != e1000_media_type_copper) - return -EOPNOTSUPP; + if (hw->phy.media_type != e1000_media_type_copper) { + ret_val = -EOPNOTSUPP; + goto out; + } if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) && (cmd->base.autoneg != AUTONEG_ENABLE)) { e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n"); - return -EINVAL; + ret_val = -EINVAL; + goto out; } } @@ -341,6 +347,7 @@ static int e1000_set_link_ksettings(struct net_device *netdev, } out: + pm_runtime_put_sync(netdev->dev.parent); clear_bit(__E1000_RESETTING, &adapter->state); return ret_val; } @@ -376,6 +383,8 @@ static int e1000_set_pauseparam(struct net_device *netdev, while (test_and_set_bit(__E1000_RESETTING, &adapter->state)) usleep_range(1000, 2000); + pm_runtime_get_sync(netdev->dev.parent); + if (adapter->fc_autoneg == AUTONEG_ENABLE) { hw->fc.requested_mode = e1000_fc_default; if (netif_running(adapter->netdev)) { @@ -408,6 +417,7 @@ static int e1000_set_pauseparam(struct net_device *netdev, } out: + pm_runtime_put_sync(netdev->dev.parent); clear_bit(__E1000_RESETTING, &adapter->state); return retval; } @@ -438,6 +448,8 @@ static void e1000_get_regs(struct net_device *netdev, u32 *regs_buff = p; u16 phy_data; + pm_runtime_get_sync(netdev->dev.parent); + memset(p, 0, E1000_REGS_LEN * sizeof(u32)); regs->version = (1u << 24) | @@ -483,6 +495,8 @@ static void e1000_get_regs(struct net_device *netdev, e1e_rphy(hw, MII_STAT1000, &phy_data); regs_buff[24] = (u32)phy_data; /* phy local receiver status */ regs_buff[25] = regs_buff[24]; /* phy remote receiver status */ + + pm_runtime_put_sync(netdev->dev.parent); } static int e1000_get_eeprom_len(struct net_device *netdev) @@ -515,6 +529,8 @@ static int e1000_get_eeprom(struct net_device *netdev, if (!eeprom_buff) return -ENOMEM; + pm_runtime_get_sync(netdev->dev.parent); + if (hw->nvm.type == e1000_nvm_eeprom_spi) { ret_val = e1000_read_nvm(hw, first_word, 
last_word - first_word + 1, @@ -528,6 +544,8 @@ static int e1000_get_eeprom(struct net_device *netdev, } } + pm_runtime_put_sync(netdev->dev.parent); + if (ret_val) { /* a read error occurred, throw away the result */ memset(eeprom_buff, 0xff, sizeof(u16) * @@ -577,6 +595,8 @@ static int e1000_set_eeprom(struct net_device *netdev, ptr = (void *)eeprom_buff; + pm_runtime_get_sync(netdev->dev.parent); + if (eeprom->offset & 1) { /* need read/modify/write of first changed EEPROM word */ /* only the second byte of the word is being modified */ @@ -617,6 +637,7 @@ static int e1000_set_eeprom(struct net_device *netdev, ret_val = e1000e_update_nvm_checksum(hw); out: + pm_runtime_put_sync(netdev->dev.parent); kfree(eeprom_buff); return ret_val; } @@ -712,6 +733,8 @@ static int e1000_set_ringparam(struct net_device *netdev, } } + pm_runtime_get_sync(netdev->dev.parent); + e1000e_down(adapter, true); /* We can't just free everything and then setup again, because the @@ -750,6 +773,7 @@ static int e1000_set_ringparam(struct net_device *netdev, e1000e_free_tx_resources(temp_tx); err_setup: e1000e_up(adapter); + pm_runtime_put_sync(netdev->dev.parent); free_temp: vfree(temp_tx); vfree(temp_rx); @@ -1792,6 +1816,8 @@ static void e1000_diag_test(struct net_device *netdev, u8 autoneg; bool if_running = netif_running(netdev); + pm_runtime_get_sync(netdev->dev.parent); + set_bit(__E1000_TESTING, &adapter->state); if (!if_running) { @@ -1877,6 +1903,8 @@ static void e1000_diag_test(struct net_device *netdev, } msleep_interruptible(4 * 1000); + + pm_runtime_put_sync(netdev->dev.parent); } static void e1000_get_wol(struct net_device *netdev, @@ -2018,11 +2046,15 @@ static int e1000_set_coalesce(struct net_device *netdev, adapter->itr_setting = adapter->itr & ~3; } + pm_runtime_get_sync(netdev->dev.parent); + if (adapter->itr_setting != 0) e1000e_write_itr(adapter, adapter->itr); else e1000e_write_itr(adapter, 0); + pm_runtime_put_sync(netdev->dev.parent); + return 0; } @@ -2036,7 
+2068,9 @@ static int e1000_nway_reset(struct net_device *netdev) if (!adapter->hw.mac.autoneg) return -EINVAL; + pm_runtime_get_sync(netdev->dev.parent); e1000e_reinit_locked(adapter); + pm_runtime_put_sync(netdev->dev.parent); return 0; } @@ -2050,8 +2084,12 @@ static void e1000_get_ethtool_stats(struct net_device *netdev, int i; char *p = NULL; + pm_runtime_get_sync(netdev->dev.parent); + dev_get_stats(netdev, &net_stats); + pm_runtime_put_sync(netdev->dev.parent); + for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++) { switch (e1000_gstrings_stats[i].type) { case NETDEV_STATS: @@ -2108,7 +2146,9 @@ static int e1000_get_rxnfc(struct net_device *netdev, struct e1000_hw *hw = &adapter->hw; u32 mrqc; + pm_runtime_get_sync(netdev->dev.parent); mrqc = er32(MRQC); + pm_runtime_put_sync(netdev->dev.parent); if (!(mrqc & E1000_MRQC_RSS_FIELD_MASK)) return 0; @@ -2171,9 +2211,13 @@ static int e1000e_get_eee(struct net_device *netdev, struct ethtool_keee *edata) return -EOPNOTSUPP; } + pm_runtime_get_sync(netdev->dev.parent); + ret_val = hw->phy.ops.acquire(hw); - if (ret_val) + if (ret_val) { + pm_runtime_put_sync(netdev->dev.parent); return -EBUSY; + } /* EEE Capability */ ret_val = e1000_read_emi_reg_locked(hw, cap_addr, &phy_data); @@ -2213,6 +2257,8 @@ static int e1000e_get_eee(struct net_device *netdev, struct ethtool_keee *edata) if (ret_val) ret_val = -ENODATA; + pm_runtime_put_sync(netdev->dev.parent); + return ret_val; } @@ -2253,12 +2299,16 @@ static int e1000e_set_eee(struct net_device *netdev, struct ethtool_keee *edata) hw->dev_spec.ich8lan.eee_disable = !edata->eee_enabled; + pm_runtime_get_sync(netdev->dev.parent); + /* reset the link */ if (netif_running(netdev)) e1000e_reinit_locked(adapter); else e1000e_reset(adapter); + pm_runtime_put_sync(netdev->dev.parent); + return 0; } ^ permalink raw reply related [flat|nested] 19+ messages in thread
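The get/put pairing that this revert restores can be sketched in userspace as follows. The counter and the demo_* names are hypothetical stand-ins for a device's runtime-PM usage count, not the kernel API; the point is only that every exit path, including the -EBUSY one seen in the e1000e_get_eee() hunk, has to drop the reference it took.

```c
#include <errno.h>

/* Hypothetical stand-in for a device with a runtime-PM usage counter. */
struct demo_dev {
	int usage_count;
};

/* Take a reference: the device must stay resumed while it is held. */
static void demo_pm_get(struct demo_dev *d)
{
	d->usage_count++;
}

/* Drop a reference: at zero the device would be allowed to suspend. */
static void demo_pm_put(struct demo_dev *d)
{
	d->usage_count--;
}

/* Mirrors the shape of the e1000e_get_eee() hunk: the early failure
 * return drops the reference before leaving, and so does the normal exit.
 */
static int demo_get_eee(struct demo_dev *d, int acquire_fails)
{
	int ret_val = 0;

	demo_pm_get(d);

	if (acquire_fails) {
		demo_pm_put(d);
		return -EBUSY;
	}

	/* ... register reads would happen here ... */

	demo_pm_put(d);
	return ret_val;
}
```

Whichever path is taken, the counter ends at the value it started with, which is the property the added put calls on the error paths preserve.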
* Re: ACPI IRQ storm with 6.10
2024-08-20 21:30 ` Bjorn Helgaas
@ 2024-08-21 5:09 ` Jiri Slaby
2024-08-21 11:39 ` Petr Valenta
1 sibling, 0 replies; 19+ messages in thread
From: Jiri Slaby @ 2024-08-21 5:09 UTC (permalink / raw)
To: Bjorn Helgaas, Petr Valenta
Cc: Bjorn Helgaas, Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, Rafael J. Wysocki

On 20. 08. 24, 23:30, Bjorn Helgaas wrote:
> On Tue, Aug 20, 2024 at 11:13:54PM +0200, Petr Valenta wrote:
>> On 20. 08. 24 at 20:09, Bjorn Helgaas wrote:
>>> On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote:
>>>> On 19. 08. 24, 6:50, Jiri Slaby wrote:
>>>>> CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
>>>>
>>>> Bjorn,
>>>>
>>>> I am confused by these changes:
>>>> ==========================================
>>>> @@ -291,16 +288,13 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>>>>  	 * duplex is forced.
>>>>  	 */
>>>>  	if (cmd->base.eth_tp_mdix_ctrl) {
>>>> -		if (hw->phy.media_type != e1000_media_type_copper) {
>>>> -			ret_val = -EOPNOTSUPP;
>>>> -			goto out;
>>>> -		}
>>>> +		if (hw->phy.media_type != e1000_media_type_copper)
>>>> +			return -EOPNOTSUPP;
>>>>
>>>>  		if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) &&
>>>>  		    (cmd->base.autoneg != AUTONEG_ENABLE)) {
>>>>  			e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n");
>>>> -			ret_val = -EINVAL;
>>>> -			goto out;
>>>> +			return -EINVAL;
>>>>  		}
>>>>  	}
>>>>
>>>> @@ -347,7 +341,6 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>>>>  	}
>>>>
>>>>  out:
>>>> -	pm_runtime_put_sync(netdev->dev.parent);
>>>>  	clear_bit(__E1000_RESETTING, &adapter->state);
>>>>  	return ret_val;
>>>>  }
>>>> ==========================================
>>>>
>>>> So no more clear_bit(__E1000_RESETTING in the above fail paths. Is that
>>>> intentional?
>>>
>>> Not intentional. Petr, do you have the ability to test the patch
>>> below? I'm not sure it's the correct fix, but it reverts the pieces
>>> of b2c289415b2b that Jiri pointed out.
>>
>> I tested the patch below but it didn't help. After the first boot with new
>> kernel it looked promising as the irq storm only appeared for a few seconds,
>> but with subsequent reboots it was the same as without the patch.
>
> Thank you very much for testing that!
>> To be sure, I also send the md5sum of ethtool.c after applying the patch:
>>
>> a25c003257538f16994b4d7c4260e894 ethtool.c
>
> Thanks, that matches what I get when applying the patch on v6.10.
>
> I'm at a loss. You could try reverting the entire b2c289415b2b commit
> (patch for that is below).

FWIW he already tested with b2c289415b2b reverted (I provided him with a
built kernel). It behaves the same. So you are not the breaker.

> If that doesn't help, I guess you could try reverting the other
> commits Jiri mentioned:
>
> 76a0a3f9cc2f e1000e: fix force smbus during suspend flow
> c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems
> bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function
> 6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
> 1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu()
> b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops
> 75a3f93b5383 net: intel: implement modern PM ops declarations
>
> If you do this, I would revert 76a0a3f9cc2f, test, then revert
> c93a6f62cb1b in addition, test, then revert bfd546a552e1 in addition,
> etc.

Or perhaps easier to do:
  git bisect start v6.10 v6.9 -- drivers/net/ethernet/intel/e1000e/
directly. But that assumes one of the above commits broke it. If they did
not, as a last resort, you can still do a full bisect (without the
"-- drivers" part).

I would take SUSE's v6.10 config, boot 6.10, then do:
  lsmod > /tmp/lsmod
  make LSMOD=/tmp/lsmod localyesconfig
  make bzImage
and use that bzImage. Note that graphics, wireless and other stuff will be
defunct unless you build in firmware for them (EXTRA_FIRMWARE config).

Alternatively, use localmodconfig and also build and install the modules
(now limited to your machine).

thanks,
--
js
suse labs

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: ACPI IRQ storm with 6.10
2024-08-20 21:30 ` Bjorn Helgaas
2024-08-21 5:09 ` Jiri Slaby
@ 2024-08-21 11:39 ` Petr Valenta
2024-08-21 14:59 ` Bjorn Helgaas
1 sibling, 1 reply; 19+ messages in thread
From: Petr Valenta @ 2024-08-21 11:39 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Jiri Slaby, Bjorn Helgaas, Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, Rafael J. Wysocki

On 20. 08. 24 at 23:30, Bjorn Helgaas wrote:
> On Tue, Aug 20, 2024 at 11:13:54PM +0200, Petr Valenta wrote:
>> On 20. 08. 24 at 20:09, Bjorn Helgaas wrote:
>>> On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote:
>>>> On 19. 08. 24, 6:50, Jiri Slaby wrote:
>>>>> CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
>>>>
>>>> Bjorn,
>>>>
>>>> I am confused by these changes:
>>>> ==========================================
>>>> @@ -291,16 +288,13 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>>>>  	 * duplex is forced.
>>>>  	 */
>>>>  	if (cmd->base.eth_tp_mdix_ctrl) {
>>>> -		if (hw->phy.media_type != e1000_media_type_copper) {
>>>> -			ret_val = -EOPNOTSUPP;
>>>> -			goto out;
>>>> -		}
>>>> +		if (hw->phy.media_type != e1000_media_type_copper)
>>>> +			return -EOPNOTSUPP;
>>>>
>>>>  		if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) &&
>>>>  		    (cmd->base.autoneg != AUTONEG_ENABLE)) {
>>>>  			e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n");
>>>> -			ret_val = -EINVAL;
>>>> -			goto out;
>>>> +			return -EINVAL;
>>>>  		}
>>>>  	}
>>>>
>>>> @@ -347,7 +341,6 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>>>>  	}
>>>>
>>>>  out:
>>>> -	pm_runtime_put_sync(netdev->dev.parent);
>>>>  	clear_bit(__E1000_RESETTING, &adapter->state);
>>>>  	return ret_val;
>>>>  }
>>>> ==========================================
>>>>
>>>> So no more clear_bit(__E1000_RESETTING in the above fail paths. Is that
>>>> intentional?
>>>
>>> Not intentional. Petr, do you have the ability to test the patch
>>> below? I'm not sure it's the correct fix, but it reverts the pieces
>>> of b2c289415b2b that Jiri pointed out.
>>
>> I tested the patch below but it didn't help. After the first boot with new
>> kernel it looked promising as the irq storm only appeared for a few seconds,
>> but with subsequent reboots it was the same as without the patch.
>
> Thank you very much for testing that!
>
>> To be sure, I also send the md5sum of ethtool.c after applying the patch:
>>
>> a25c003257538f16994b4d7c4260e894 ethtool.c
>
> Thanks, that matches what I get when applying the patch on v6.10.
>
> I'm at a loss. You could try reverting the entire b2c289415b2b commit
> (patch for that is below).

This patch didn't help, so I reverted it back.

>
> If that doesn't help, I guess you could try reverting the other
> commits Jiri mentioned:
>
> 76a0a3f9cc2f e1000e: fix force smbus during suspend flow
> c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems
> bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function
> 6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
> 1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu()
> b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops
> 75a3f93b5383 net: intel: implement modern PM ops declarations
>
> If you do this, I would revert 76a0a3f9cc2f, test, then revert
> c93a6f62cb1b in addition, test, then revert bfd546a552e1 in addition,
> etc.

I have created revert patches like this:

git format-patch --stdout -1 76a0a3f9cc2f | interdiff -q /dev/stdin \
    /dev/null > revert_76a0a3f9cc2f.patch

I applied revert_76a0a3f9cc2f.patch (rebuilt and tested), then
revert_c93a6f62cb1b.patch in addition, and after also applying
revert_bfd546a552e1.patch the IRQ storm no longer appeared.

I have tested it with 3 subsequent reboots and it was OK in all of them.

> > commit 5e92945ffe5c ("Revert "e1000e: Remove redundant runtime resume for ethtool_ops"") > Author: Bjorn Helgaas <bhelgaas@google.com> > Date: Tue Aug 20 16:18:32 2024 -0500 > > Revert "e1000e: Remove redundant runtime resume for ethtool_ops" > > This reverts commit b2c289415b2b2ef112b78d5e73b4acecf5db409e. > > > diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c > index 9364bc2b4eb1..61fa2f6b3708 100644 > --- a/drivers/net/ethernet/intel/e1000e/ethtool.c > +++ b/drivers/net/ethernet/intel/e1000e/ethtool.c > @@ -156,7 +156,7 @@ static int e1000_get_link_ksettings(struct net_device *netdev, > speed = adapter->link_speed; > cmd->base.duplex = adapter->link_duplex - 1; > } > - } else { > + } else if (!pm_runtime_suspended(netdev->dev.parent)) { > u32 status = er32(STATUS); > > if (status & E1000_STATUS_LU) { > @@ -274,13 +274,16 @@ static int e1000_set_link_ksettings(struct net_device *netdev, > ethtool_convert_link_mode_to_legacy_u32(&advertising, > cmd->link_modes.advertising); > > + pm_runtime_get_sync(netdev->dev.parent); > + > /* When SoL/IDER sessions are active, autoneg/speed/duplex > * cannot be changed > */ > if (hw->phy.ops.check_reset_block && > hw->phy.ops.check_reset_block(hw)) { > e_err("Cannot change link characteristics when SoL/IDER is active.\n"); > - return -EINVAL; > + ret_val = -EINVAL; > + goto out; > } > > /* MDI setting is only allowed when autoneg enabled because > @@ -288,13 +291,16 @@ static int e1000_set_link_ksettings(struct net_device *netdev, > * duplex is forced. 
> */ > if (cmd->base.eth_tp_mdix_ctrl) { > - if (hw->phy.media_type != e1000_media_type_copper) > - return -EOPNOTSUPP; > + if (hw->phy.media_type != e1000_media_type_copper) { > + ret_val = -EOPNOTSUPP; > + goto out; > + } > > if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) && > (cmd->base.autoneg != AUTONEG_ENABLE)) { > e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n"); > - return -EINVAL; > + ret_val = -EINVAL; > + goto out; > } > } > > @@ -341,6 +347,7 @@ static int e1000_set_link_ksettings(struct net_device *netdev, > } > > out: > + pm_runtime_put_sync(netdev->dev.parent); > clear_bit(__E1000_RESETTING, &adapter->state); > return ret_val; > } > @@ -376,6 +383,8 @@ static int e1000_set_pauseparam(struct net_device *netdev, > while (test_and_set_bit(__E1000_RESETTING, &adapter->state)) > usleep_range(1000, 2000); > > + pm_runtime_get_sync(netdev->dev.parent); > + > if (adapter->fc_autoneg == AUTONEG_ENABLE) { > hw->fc.requested_mode = e1000_fc_default; > if (netif_running(adapter->netdev)) { > @@ -408,6 +417,7 @@ static int e1000_set_pauseparam(struct net_device *netdev, > } > > out: > + pm_runtime_put_sync(netdev->dev.parent); > clear_bit(__E1000_RESETTING, &adapter->state); > return retval; > } > @@ -438,6 +448,8 @@ static void e1000_get_regs(struct net_device *netdev, > u32 *regs_buff = p; > u16 phy_data; > > + pm_runtime_get_sync(netdev->dev.parent); > + > memset(p, 0, E1000_REGS_LEN * sizeof(u32)); > > regs->version = (1u << 24) | > @@ -483,6 +495,8 @@ static void e1000_get_regs(struct net_device *netdev, > e1e_rphy(hw, MII_STAT1000, &phy_data); > regs_buff[24] = (u32)phy_data; /* phy local receiver status */ > regs_buff[25] = regs_buff[24]; /* phy remote receiver status */ > + > + pm_runtime_put_sync(netdev->dev.parent); > } > > static int e1000_get_eeprom_len(struct net_device *netdev) > @@ -515,6 +529,8 @@ static int e1000_get_eeprom(struct net_device *netdev, > if (!eeprom_buff) > return -ENOMEM; > > + 
pm_runtime_get_sync(netdev->dev.parent); > + > if (hw->nvm.type == e1000_nvm_eeprom_spi) { > ret_val = e1000_read_nvm(hw, first_word, > last_word - first_word + 1, > @@ -528,6 +544,8 @@ static int e1000_get_eeprom(struct net_device *netdev, > } > } > > + pm_runtime_put_sync(netdev->dev.parent); > + > if (ret_val) { > /* a read error occurred, throw away the result */ > memset(eeprom_buff, 0xff, sizeof(u16) * > @@ -577,6 +595,8 @@ static int e1000_set_eeprom(struct net_device *netdev, > > ptr = (void *)eeprom_buff; > > + pm_runtime_get_sync(netdev->dev.parent); > + > if (eeprom->offset & 1) { > /* need read/modify/write of first changed EEPROM word */ > /* only the second byte of the word is being modified */ > @@ -617,6 +637,7 @@ static int e1000_set_eeprom(struct net_device *netdev, > ret_val = e1000e_update_nvm_checksum(hw); > > out: > + pm_runtime_put_sync(netdev->dev.parent); > kfree(eeprom_buff); > return ret_val; > } > @@ -712,6 +733,8 @@ static int e1000_set_ringparam(struct net_device *netdev, > } > } > > + pm_runtime_get_sync(netdev->dev.parent); > + > e1000e_down(adapter, true); > > /* We can't just free everything and then setup again, because the > @@ -750,6 +773,7 @@ static int e1000_set_ringparam(struct net_device *netdev, > e1000e_free_tx_resources(temp_tx); > err_setup: > e1000e_up(adapter); > + pm_runtime_put_sync(netdev->dev.parent); > free_temp: > vfree(temp_tx); > vfree(temp_rx); > @@ -1792,6 +1816,8 @@ static void e1000_diag_test(struct net_device *netdev, > u8 autoneg; > bool if_running = netif_running(netdev); > > + pm_runtime_get_sync(netdev->dev.parent); > + > set_bit(__E1000_TESTING, &adapter->state); > > if (!if_running) { > @@ -1877,6 +1903,8 @@ static void e1000_diag_test(struct net_device *netdev, > } > > msleep_interruptible(4 * 1000); > + > + pm_runtime_put_sync(netdev->dev.parent); > } > > static void e1000_get_wol(struct net_device *netdev, > @@ -2018,11 +2046,15 @@ static int e1000_set_coalesce(struct net_device *netdev, > 
adapter->itr_setting = adapter->itr & ~3; > } > > + pm_runtime_get_sync(netdev->dev.parent); > + > if (adapter->itr_setting != 0) > e1000e_write_itr(adapter, adapter->itr); > else > e1000e_write_itr(adapter, 0); > > + pm_runtime_put_sync(netdev->dev.parent); > + > return 0; > } > > @@ -2036,7 +2068,9 @@ static int e1000_nway_reset(struct net_device *netdev) > if (!adapter->hw.mac.autoneg) > return -EINVAL; > > + pm_runtime_get_sync(netdev->dev.parent); > e1000e_reinit_locked(adapter); > + pm_runtime_put_sync(netdev->dev.parent); > > return 0; > } > @@ -2050,8 +2084,12 @@ static void e1000_get_ethtool_stats(struct net_device *netdev, > int i; > char *p = NULL; > > + pm_runtime_get_sync(netdev->dev.parent); > + > dev_get_stats(netdev, &net_stats); > > + pm_runtime_put_sync(netdev->dev.parent); > + > for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++) { > switch (e1000_gstrings_stats[i].type) { > case NETDEV_STATS: > @@ -2108,7 +2146,9 @@ static int e1000_get_rxnfc(struct net_device *netdev, > struct e1000_hw *hw = &adapter->hw; > u32 mrqc; > > + pm_runtime_get_sync(netdev->dev.parent); > mrqc = er32(MRQC); > + pm_runtime_put_sync(netdev->dev.parent); > > if (!(mrqc & E1000_MRQC_RSS_FIELD_MASK)) > return 0; > @@ -2171,9 +2211,13 @@ static int e1000e_get_eee(struct net_device *netdev, struct ethtool_keee *edata) > return -EOPNOTSUPP; > } > > + pm_runtime_get_sync(netdev->dev.parent); > + > ret_val = hw->phy.ops.acquire(hw); > - if (ret_val) > + if (ret_val) { > + pm_runtime_put_sync(netdev->dev.parent); > return -EBUSY; > + } > > /* EEE Capability */ > ret_val = e1000_read_emi_reg_locked(hw, cap_addr, &phy_data); > @@ -2213,6 +2257,8 @@ static int e1000e_get_eee(struct net_device *netdev, struct ethtool_keee *edata) > if (ret_val) > ret_val = -ENODATA; > > + pm_runtime_put_sync(netdev->dev.parent); > + > return ret_val; > } > > @@ -2253,12 +2299,16 @@ static int e1000e_set_eee(struct net_device *netdev, struct ethtool_keee *edata) > > hw->dev_spec.ich8lan.eee_disable = 
!edata->eee_enabled; > > + pm_runtime_get_sync(netdev->dev.parent); > + > /* reset the link */ > if (netif_running(netdev)) > e1000e_reinit_locked(adapter); > else > e1000e_reset(adapter); > > + pm_runtime_put_sync(netdev->dev.parent); > + > return 0; > }
Wysocki" <rafael@kernel.org>, przemyslaw.kitszel@intel.com, > Linux kernel mailing list <linux-kernel@vger.kernel.org>, > "linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>, > Tony Nguyen <anthony.l.nguyen@intel.com>, Bjorn Helgaas <bhelgaas@google.com>, > intel-wired-lan@lists.osuosl.org, Jiri Slaby <jirislaby@kernel.org>, > Len Brown <lenb@kernel.org> > Errors-To: intel-wired-lan-bounces@osuosl.org > Sender: "Intel-wired-lan" <intel-wired-lan-bounces@osuosl.org> > > On Tue, Aug 20, 2024 at 11:13:54PM +0200, Petr Valenta wrote: >> Dne 20. 08. 24 v 20:09 Bjorn Helgaas napsal(a): >>> On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote: >>>> On 19. 08. 24, 6:50, Jiri Slaby wrote: >>>>> CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b) >>>> >>>> Bjorn, >>>> >>>> I am confused by these changes: >>>> ========================================== >>>> @@ -291,16 +288,13 @@ static int e1000_set_link_ksettings(struct net_device >>>> *net >>>> dev, >>>> * duplex is forced. >>>> */ >>>> if (cmd->base.eth_tp_mdix_ctrl) { >>>> - if (hw->phy.media_type != e1000_media_type_copper) { >>>> - ret_val = -EOPNOTSUPP; >>>> - goto out; >>>> - } >>>> + if (hw->phy.media_type != e1000_media_type_copper) >>>> + return -EOPNOTSUPP; >>>> >>>> if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) && >>>> (cmd->base.autoneg != AUTONEG_ENABLE)) { >>>> e_err("forcing MDI/MDI-X state is not supported when >>>> lin >>>> k speed and/or duplex are forced\n"); >>>> - ret_val = -EINVAL; >>>> - goto out; >>>> + return -EINVAL; >>>> } >>>> } >>>> >>>> @@ -347,7 +341,6 @@ static int e1000_set_link_ksettings(struct net_device >>>> *netde >>>> v, >>>> } >>>> >>>> out: >>>> - pm_runtime_put_sync(netdev->dev.parent); >>>> clear_bit(__E1000_RESETTING, &adapter->state); >>>> return ret_val; >>>> } >>>> ========================================== >>>> >>>> So no more clear_bit(__E1000_RESETTING in the above fail paths. Is that >>>> intentional? >>> >>> Not intentional. 
Petr, do you have the ability to test the patch >>> below? I'm not sure it's the correct fix, but it reverts the pieces >>> of b2c289415b2b that Jiri pointed out. >> >> I tested the patch below but it didn't help. After the first boot with new >> kernel it looked promising as the irq storm only appeared for a few seconds, >> but with subsequent reboots it was the same as without the patch. > > Thank you very much for testing that! > >> To be sure, I also send the md5sum of ethtool.c after applying the patch: >> >> a25c003257538f16994b4d7c4260e894 ethtool.c > > Thanks, that matches what I get when applying the patch on v6.10. > > I'm at a loss. You could try reverting the entire b2c289415b2b commit > (patch for that is below). > > If that doesn't help, I guess you could try reverting the other > commits Jiri mentioned: > > 76a0a3f9cc2f e1000e: fix force smbus during suspend flow > c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems > bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function > 6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates > 1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu() > b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops > 75a3f93b5383 net: intel: implement modern PM ops declarations > > If you do this, I would revert 76a0a3f9cc2f, test, then revert > c93a6f62cb1b in addition, test, then revert bfd546a552e1 in addition, > etc. > > commit 5e92945ffe5c ("Revert "e1000e: Remove redundant runtime resume for ethtool_ops"") > Author: Bjorn Helgaas <bhelgaas@google.com> > Date: Tue Aug 20 16:18:32 2024 -0500 > > Revert "e1000e: Remove redundant runtime resume for ethtool_ops" > > This reverts commit b2c289415b2b2ef112b78d5e73b4acecf5db409e. 
> > > diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c > index 9364bc2b4eb1..61fa2f6b3708 100644 > --- a/drivers/net/ethernet/intel/e1000e/ethtool.c > +++ b/drivers/net/ethernet/intel/e1000e/ethtool.c > @@ -156,7 +156,7 @@ static int e1000_get_link_ksettings(struct net_device *netdev, > speed = adapter->link_speed; > cmd->base.duplex = adapter->link_duplex - 1; > } > - } else { > + } else if (!pm_runtime_suspended(netdev->dev.parent)) { > u32 status = er32(STATUS); > > if (status & E1000_STATUS_LU) { > @@ -274,13 +274,16 @@ static int e1000_set_link_ksettings(struct net_device *netdev, > ethtool_convert_link_mode_to_legacy_u32(&advertising, > cmd->link_modes.advertising); > > + pm_runtime_get_sync(netdev->dev.parent); > + > /* When SoL/IDER sessions are active, autoneg/speed/duplex > * cannot be changed > */ > if (hw->phy.ops.check_reset_block && > hw->phy.ops.check_reset_block(hw)) { > e_err("Cannot change link characteristics when SoL/IDER is active.\n"); > - return -EINVAL; > + ret_val = -EINVAL; > + goto out; > } > > /* MDI setting is only allowed when autoneg enabled because > @@ -288,13 +291,16 @@ static int e1000_set_link_ksettings(struct net_device *netdev, > * duplex is forced. 
> */ > if (cmd->base.eth_tp_mdix_ctrl) { > - if (hw->phy.media_type != e1000_media_type_copper) > - return -EOPNOTSUPP; > + if (hw->phy.media_type != e1000_media_type_copper) { > + ret_val = -EOPNOTSUPP; > + goto out; > + } > > if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) && > (cmd->base.autoneg != AUTONEG_ENABLE)) { > e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n"); > - return -EINVAL; > + ret_val = -EINVAL; > + goto out; > } > } > > @@ -341,6 +347,7 @@ static int e1000_set_link_ksettings(struct net_device *netdev, > } > > out: > + pm_runtime_put_sync(netdev->dev.parent); > clear_bit(__E1000_RESETTING, &adapter->state); > return ret_val; > } > @@ -376,6 +383,8 @@ static int e1000_set_pauseparam(struct net_device *netdev, > while (test_and_set_bit(__E1000_RESETTING, &adapter->state)) > usleep_range(1000, 2000); > > + pm_runtime_get_sync(netdev->dev.parent); > + > if (adapter->fc_autoneg == AUTONEG_ENABLE) { > hw->fc.requested_mode = e1000_fc_default; > if (netif_running(adapter->netdev)) { > @@ -408,6 +417,7 @@ static int e1000_set_pauseparam(struct net_device *netdev, > } > > out: > + pm_runtime_put_sync(netdev->dev.parent); > clear_bit(__E1000_RESETTING, &adapter->state); > return retval; > } > @@ -438,6 +448,8 @@ static void e1000_get_regs(struct net_device *netdev, > u32 *regs_buff = p; > u16 phy_data; > > + pm_runtime_get_sync(netdev->dev.parent); > + > memset(p, 0, E1000_REGS_LEN * sizeof(u32)); > > regs->version = (1u << 24) | > @@ -483,6 +495,8 @@ static void e1000_get_regs(struct net_device *netdev, > e1e_rphy(hw, MII_STAT1000, &phy_data); > regs_buff[24] = (u32)phy_data; /* phy local receiver status */ > regs_buff[25] = regs_buff[24]; /* phy remote receiver status */ > + > + pm_runtime_put_sync(netdev->dev.parent); > } > > static int e1000_get_eeprom_len(struct net_device *netdev) > @@ -515,6 +529,8 @@ static int e1000_get_eeprom(struct net_device *netdev, > if (!eeprom_buff) > return -ENOMEM; > > + 
pm_runtime_get_sync(netdev->dev.parent); > + > if (hw->nvm.type == e1000_nvm_eeprom_spi) { > ret_val = e1000_read_nvm(hw, first_word, > last_word - first_word + 1, > @@ -528,6 +544,8 @@ static int e1000_get_eeprom(struct net_device *netdev, > } > } > > + pm_runtime_put_sync(netdev->dev.parent); > + > if (ret_val) { > /* a read error occurred, throw away the result */ > memset(eeprom_buff, 0xff, sizeof(u16) * > @@ -577,6 +595,8 @@ static int e1000_set_eeprom(struct net_device *netdev, > > ptr = (void *)eeprom_buff; > > + pm_runtime_get_sync(netdev->dev.parent); > + > if (eeprom->offset & 1) { > /* need read/modify/write of first changed EEPROM word */ > /* only the second byte of the word is being modified */ > @@ -617,6 +637,7 @@ static int e1000_set_eeprom(struct net_device *netdev, > ret_val = e1000e_update_nvm_checksum(hw); > > out: > + pm_runtime_put_sync(netdev->dev.parent); > kfree(eeprom_buff); > return ret_val; > } > @@ -712,6 +733,8 @@ static int e1000_set_ringparam(struct net_device *netdev, > } > } > > + pm_runtime_get_sync(netdev->dev.parent); > + > e1000e_down(adapter, true); > > /* We can't just free everything and then setup again, because the > @@ -750,6 +773,7 @@ static int e1000_set_ringparam(struct net_device *netdev, > e1000e_free_tx_resources(temp_tx); > err_setup: > e1000e_up(adapter); > + pm_runtime_put_sync(netdev->dev.parent); > free_temp: > vfree(temp_tx); > vfree(temp_rx); > @@ -1792,6 +1816,8 @@ static void e1000_diag_test(struct net_device *netdev, > u8 autoneg; > bool if_running = netif_running(netdev); > > + pm_runtime_get_sync(netdev->dev.parent); > + > set_bit(__E1000_TESTING, &adapter->state); > > if (!if_running) { > @@ -1877,6 +1903,8 @@ static void e1000_diag_test(struct net_device *netdev, > } > > msleep_interruptible(4 * 1000); > + > + pm_runtime_put_sync(netdev->dev.parent); > } > > static void e1000_get_wol(struct net_device *netdev, > @@ -2018,11 +2046,15 @@ static int e1000_set_coalesce(struct net_device *netdev, > 
adapter->itr_setting = adapter->itr & ~3; > } > > + pm_runtime_get_sync(netdev->dev.parent); > + > if (adapter->itr_setting != 0) > e1000e_write_itr(adapter, adapter->itr); > else > e1000e_write_itr(adapter, 0); > > + pm_runtime_put_sync(netdev->dev.parent); > + > return 0; > } > > @@ -2036,7 +2068,9 @@ static int e1000_nway_reset(struct net_device *netdev) > if (!adapter->hw.mac.autoneg) > return -EINVAL; > > + pm_runtime_get_sync(netdev->dev.parent); > e1000e_reinit_locked(adapter); > + pm_runtime_put_sync(netdev->dev.parent); > > return 0; > } > @@ -2050,8 +2084,12 @@ static void e1000_get_ethtool_stats(struct net_device *netdev, > int i; > char *p = NULL; > > + pm_runtime_get_sync(netdev->dev.parent); > + > dev_get_stats(netdev, &net_stats); > > + pm_runtime_put_sync(netdev->dev.parent); > + > for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++) { > switch (e1000_gstrings_stats[i].type) { > case NETDEV_STATS: > @@ -2108,7 +2146,9 @@ static int e1000_get_rxnfc(struct net_device *netdev, > struct e1000_hw *hw = &adapter->hw; > u32 mrqc; > > + pm_runtime_get_sync(netdev->dev.parent); > mrqc = er32(MRQC); > + pm_runtime_put_sync(netdev->dev.parent); > > if (!(mrqc & E1000_MRQC_RSS_FIELD_MASK)) > return 0; > @@ -2171,9 +2211,13 @@ static int e1000e_get_eee(struct net_device *netdev, struct ethtool_keee *edata) > return -EOPNOTSUPP; > } > > + pm_runtime_get_sync(netdev->dev.parent); > + > ret_val = hw->phy.ops.acquire(hw); > - if (ret_val) > + if (ret_val) { > + pm_runtime_put_sync(netdev->dev.parent); > return -EBUSY; > + } > > /* EEE Capability */ > ret_val = e1000_read_emi_reg_locked(hw, cap_addr, &phy_data); > @@ -2213,6 +2257,8 @@ static int e1000e_get_eee(struct net_device *netdev, struct ethtool_keee *edata) > if (ret_val) > ret_val = -ENODATA; > > + pm_runtime_put_sync(netdev->dev.parent); > + > return ret_val; > } > > @@ -2253,12 +2299,16 @@ static int e1000e_set_eee(struct net_device *netdev, struct ethtool_keee *edata) > > hw->dev_spec.ich8lan.eee_disable = 
!edata->eee_enabled; > > + pm_runtime_get_sync(netdev->dev.parent); > + > /* reset the link */ > if (netif_running(netdev)) > e1000e_reinit_locked(adapter); > else > e1000e_reset(adapter); > > + pm_runtime_put_sync(netdev->dev.parent); > + > return 0; > } > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: ACPI IRQ storm with 6.10
2024-08-21 11:39 ` Petr Valenta
@ 2024-08-21 14:59 ` Bjorn Helgaas
[not found] ` <1041b9b5-cc78-13b1-459a-d1d3a313475a@intel.com>
0 siblings, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2024-08-21 14:59 UTC (permalink / raw)
To: Petr Valenta, Dima Ruinskiy, Vitaly Lifshits, Hui Wang
Cc: Jiri Slaby, Bjorn Helgaas, Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, Rafael J. Wysocki

[+to Dima, Vitaly, Hui; beginning of thread at
https://lore.kernel.org/r/60ac8988-ace4-4cf0-8c44-028ca741c0a1@kernel.org]

On Wed, Aug 21, 2024 at 01:39:11PM +0200, Petr Valenta wrote:
> On 20. 08. 24 at 23:30, Bjorn Helgaas wrote:
> > On Tue, Aug 20, 2024 at 11:13:54PM +0200, Petr Valenta wrote:
> > > On 20. 08. 24 at 20:09, Bjorn Helgaas wrote:
> > > > On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote:
> > > > > On 19. 08. 24, 6:50, Jiri Slaby wrote:
> > > > > > CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
> ...
> > I'm at a loss. You could try reverting the entire b2c289415b2b commit
> > (patch for that is below).
>
> This patch didn't help, so I reverted it back.
>
> > If that doesn't help, I guess you could try reverting the other
> > commits Jiri mentioned:
> >
> >   76a0a3f9cc2f e1000e: fix force smbus during suspend flow
> >   c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems
> >   bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function
> >   6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
> >   1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu()
> >   b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops
> >   75a3f93b5383 net: intel: implement modern PM ops declarations
> >
> > If you do this, I would revert 76a0a3f9cc2f, test, then revert
> > c93a6f62cb1b in addition, test, then revert bfd546a552e1 in addition,
> > etc.
>
> I have created revert patches like this:
> git format-patch --stdout -1 76a0a3f9cc2f | interdiff -q /dev/stdin \
> /dev/null > revert_76a0a3f9cc2f.patch
>
> I have applied revert_76a0a3f9cc2f.patch (rebuild and tested), then in
> addition revert_c93a6f62cb1b.patch and after applying
> revert_bfd546a552e1.patch irq storm didn't appear.
>
> I have tested it with 3 subsequent reboots and in all those cases it was ok.

Thanks for all this testing. It sounds like reverting all three of
76a0a3f9cc2f, c93a6f62cb1b, and bfd546a552e1 fixed the IRQ storm, but
I'm not clear on the results of other situations.

It looks like c93a6f62cb1b could be reverted by itself because it's
unrelated to 76a0a3f9cc2f and bfd546a552e1. I added the authors of
all three in case they have any insights.

Bjorn
^ permalink raw reply [flat|nested] 19+ messages in thread
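The cumulative revert procedure Bjorn describes above (revert one commit, test, revert the next one in addition, test, and so on) could be sketched roughly as follows. This is only an illustration under assumptions: it presumes a git checkout of the kernel tree, whereas Petr generated reverse patches with interdiff, and print_revert_plan is a hypothetical helper name that does not come from the thread. It only prints the steps and runs nothing.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): print the cumulative revert
# steps for the given commits, in the order given, without running anything.
print_revert_plan() {
    local step=1 sha
    for sha in "$@"; do
        # Each revert is applied on top of the previous ones.
        printf 'step %d: git revert --no-edit %s\n' "$step" "$sha"
        printf '        rebuild, reboot, then check the acpi line in /proc/interrupts\n'
        step=$((step + 1))
    done
}
```

Called as "print_revert_plan 76a0a3f9cc2f c93a6f62cb1b bfd546a552e1", it lists three steps in the order Bjorn suggests. Printing rather than executing keeps the sketch harmless on a tree where a revert might not apply cleanly.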
[parent not found: <1041b9b5-cc78-13b1-459a-d1d3a313475a@intel.com>]
* Re: ACPI IRQ storm with 6.10
[not found] ` <1041b9b5-cc78-13b1-459a-d1d3a313475a@intel.com>
@ 2024-08-22 7:44 ` Petr Valenta
2024-08-22 8:33 ` Petr Valenta
0 siblings, 1 reply; 19+ messages in thread
From: Petr Valenta @ 2024-08-22 7:44 UTC (permalink / raw)
To: Vitaly Lifshits, Bjorn Helgaas, Dima Ruinskiy, Hui Wang
Cc: Jiri Slaby, Bjorn Helgaas, Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, Rafael J. Wysocki

On 21. 08. 24 at 17:17, Vitaly Lifshits wrote:
>
> On 8/21/2024 5:59 PM, Bjorn Helgaas wrote:
>> [+to Dima, Vitaly, Hui; beginning of thread at
>> https://lore.kernel.org/r/60ac8988-ace4-4cf0-8c44-028ca741c0a1@kernel.org]
>>
>> On Wed, Aug 21, 2024 at 01:39:11PM +0200, Petr Valenta wrote:
>>> On 20. 08. 24 at 23:30, Bjorn Helgaas wrote:
>>>> On Tue, Aug 20, 2024 at 11:13:54PM +0200, Petr Valenta wrote:
>>>>> On 20. 08. 24 at 20:09, Bjorn Helgaas wrote:
>>>>>> On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote:
>>>>>>> On 19. 08. 24, 6:50, Jiri Slaby wrote:
>>>>>>>> CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
>>> ...
>>>> I'm at a loss. You could try reverting the entire b2c289415b2b commit
>>>> (patch for that is below).
>>> This patch didn't help, so I reverted it back.
>>>
>>>> If that doesn't help, I guess you could try reverting the other
>>>> commits Jiri mentioned:
>>>>
>>>>   76a0a3f9cc2f e1000e: fix force smbus during suspend flow
>>>>   c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems
>>>>   bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function
>>>>   6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
>>>>   1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu()
>>>>   b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops
>>>>   75a3f93b5383 net: intel: implement modern PM ops declarations
>>>>
>>>> If you do this, I would revert 76a0a3f9cc2f, test, then revert
>>>> c93a6f62cb1b in addition, test, then revert bfd546a552e1 in addition,
>>>> etc.
>>> I have created revert patches like this:
>>> git format-patch --stdout -1 76a0a3f9cc2f | interdiff -q /dev/stdin \
>>> /dev/null > revert_76a0a3f9cc2f.patch
>>>
>>> I have applied revert_76a0a3f9cc2f.patch (rebuild and tested), then in
>>> addition revert_c93a6f62cb1b.patch and after applying
>>> revert_bfd546a552e1.patch irq storm didn't appear.
>>>
>>> I have tested it with 3 subsequent reboots and in all those cases it was ok.
>> Thanks for all this testing. It sounds like reverting all three of
>> 76a0a3f9cc2f, c93a6f62cb1b, and bfd546a552e1 fixed the IRQ storm, but
>> I'm not clear on the results of other situations.
>>
>> It looks like c93a6f62cb1b could be reverted by itself because it's
>> unrelated to 76a0a3f9cc2f and bfd546a552e1. I added the authors of
>> all three in case they have any insights.
>>
>> Bjorn
>
>
> I doubt that it is related to c93a6f62cb1b, I believe that is more
> probable to be related to the two other patches.
>
> Apart from what I suggested in the other mailing thread (enabling e1000e
> debug and to test if it happens with a cable connected),
>
> I suggest to try to apply this patch and see if it fixes the issue:
>
> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20240806132348.880744-1-vitaly.lifshits@intel.com/

I have applied the patch from the link above, and with it the command
below really doesn't start the IRQ storm.

echo 'auto' > /sys/bus/pci/devices/0000:00:1f.6/power/control

The problem is that after executing this command and plugging a cable
into the Ethernet port, the kernel is not able to detect the link (the
LED indicates the link is up), so networking over the cable does not work.

>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
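To compare runs like the one Petr describes above (with and without 'auto' written to power/control), it can help to put a number on the storm. The helpers below are a minimal sketch, assuming bash and awk and assuming the SCI shows up in /proc/interrupts as the line whose last field is "acpi", as in the original report. The function names acpi_irq_count and acpi_irq_rate are hypothetical and do not come from the thread.

```shell
#!/bin/bash
# Hypothetical helpers (not from the thread) for watching the ACPI SCI count.

# Sum the per-CPU counters on the line whose last field is "acpi".
# Reads /proc/interrupts by default; a file in the same format may be given.
acpi_irq_count() {
    awk '$NF == "acpi" {
             n = 0
             # Fields 2..k are per-CPU counters; stop at the first non-number.
             for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                 n += $i
             printf "%.0f\n", n
         }' "${1:-/proc/interrupts}"
}

# Print the average number of ACPI interrupts per second over an interval
# given in whole seconds (default 1).
acpi_irq_rate() {
    local interval=${1:-1} before after
    before=$(acpi_irq_count)
    sleep "$interval"
    after=$(acpi_irq_count)
    echo $(( (${after:-0} - ${before:-0}) / interval ))
}
```

Running "acpi_irq_rate 5" before and after writing 'auto' (or 'on') to /sys/bus/pci/devices/0000:00:1f.6/power/control gives comparable figures, and reading power/runtime_status in the same directory shows the device's current runtime PM state.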
* Re: ACPI IRQ storm with 6.10
2024-08-22 7:44 ` Petr Valenta
@ 2024-08-22 8:33 ` Petr Valenta
2024-08-22 9:18 ` Vitaly Lifshits
0 siblings, 1 reply; 19+ messages in thread
From: Petr Valenta @ 2024-08-22 8:33 UTC (permalink / raw)
To: Vitaly Lifshits, Bjorn Helgaas, Dima Ruinskiy, Hui Wang
Cc: Jiri Slaby, Bjorn Helgaas, Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, Rafael J. Wysocki

On 22. 08. 24 at 9:44, Petr Valenta wrote:
>
>
> On 21. 08. 24 at 17:17, Vitaly Lifshits wrote:
>>
>> On 8/21/2024 5:59 PM, Bjorn Helgaas wrote:
>>> [+to Dima, Vitaly, Hui; beginning of thread at
>>> https://lore.kernel.org/r/60ac8988-ace4-4cf0-8c44-028ca741c0a1@kernel.org]
>>>
>>> On Wed, Aug 21, 2024 at 01:39:11PM +0200, Petr Valenta wrote:
>>>> On 20. 08. 24 at 23:30, Bjorn Helgaas wrote:
>>>>> On Tue, Aug 20, 2024 at 11:13:54PM +0200, Petr Valenta wrote:
>>>>>> On 20. 08. 24 at 20:09, Bjorn Helgaas wrote:
>>>>>>> On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote:
>>>>>>>> On 19. 08. 24, 6:50, Jiri Slaby wrote:
>>>>>>>>> CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
>>>> ...
>>>>> I'm at a loss. You could try reverting the entire b2c289415b2b commit
>>>>> (patch for that is below).
>>>> This patch didn't help, so I reverted it back.
>>>>
>>>>> If that doesn't help, I guess you could try reverting the other
>>>>> commits Jiri mentioned:
>>>>>
>>>>>   76a0a3f9cc2f e1000e: fix force smbus during suspend flow
>>>>>   c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems
>>>>>   bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function
>>>>>   6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
>>>>>   1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu()
>>>>>   b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops
>>>>>   75a3f93b5383 net: intel: implement modern PM ops declarations
>>>>>
>>>>> If you do this, I would revert 76a0a3f9cc2f, test, then revert
>>>>> c93a6f62cb1b in addition, test, then revert bfd546a552e1 in addition,
>>>>> etc.
>>>> I have created revert patches like this:
>>>> git format-patch --stdout -1 76a0a3f9cc2f | interdiff -q /dev/stdin \
>>>> /dev/null > revert_76a0a3f9cc2f.patch
>>>>
>>>> I have applied revert_76a0a3f9cc2f.patch (rebuild and tested), then in
>>>> addition revert_c93a6f62cb1b.patch and after applying
>>>> revert_bfd546a552e1.patch irq storm didn't appear.
>>>>
>>>> I have tested it with 3 subsequent reboots and in all those cases it
>>>> was ok.
>>> Thanks for all this testing. It sounds like reverting all three of
>>> 76a0a3f9cc2f, c93a6f62cb1b, and bfd546a552e1 fixed the IRQ storm, but
>>> I'm not clear on the results of other situations.
>>>
>>> It looks like c93a6f62cb1b could be reverted by itself because it's
>>> unrelated to 76a0a3f9cc2f and bfd546a552e1. I added the authors of
>>> all three in case they have any insights.
>>>
>>> Bjorn
>>
>>
>> I doubt that it is related to c93a6f62cb1b, I believe that is more
>> probable to be related to the two other patches.
>>
>> Apart from what I suggested in the other mailing thread (enabling
>> e1000e debug and to test if it happens with a cable connected),
>>
>> I suggest to try to apply this patch and see if it fixes the issue:
>>
>> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20240806132348.880744-1-vitaly.lifshits@intel.com/
>
> I have applied the patch from the link above, and with it the command
> below really doesn't start the IRQ storm.
>
> echo 'auto' > /sys/bus/pci/devices/0000:00:1f.6/power/control
>
> The problem is that after executing this command and plugging a cable
> into the Ethernet port, the kernel is not able to detect the link (the
> LED indicates the link is up), so networking over the cable does not work.
>

I have now tested how it behaves with kernel 6.9.9, and there is a new finding.
After running "echo 'auto' > /sys/bus/pci/devices/0000:00:1f.6/power/control",
networking over the cable works but an IRQ storm arises. I had never tested this
before because I don't use a cable with this laptop at all. After unplugging
the cable, the IRQ storm is gone.
A possible workaround would be to turn off power control for the e1000e at the kernel level (if is it possible) so that utilities like powertop don't cause irq storm or broken network. >> >> > ^ permalink raw reply [flat|nested] 19+ messages in thread
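The effect Petr describes can be checked, and his proposed workaround approximated from userspace, with a short shell sketch. This is an illustration under stated assumptions, not something posted in the thread: the function names acpi_irq_count and set_runtime_pm are hypothetical, the device path 0000:00:1f.6 is the one from his report, and writing 'on' to power/control (the standard sysfs runtime PM knob, which accepts 'on' or 'auto') only keeps the device out of runtime suspend until something such as powertop writes 'auto' again. It is not the kernel-level switch he asks about.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not from the thread.

# Print the summed per-CPU count of the interrupt line whose last field is
# "acpi". $1 is the file to read and defaults to /proc/interrupts.
acpi_irq_count() {
    awk '$NF == "acpi" {
        total = 0
        # Field 1 is the IRQ label; the numeric fields after it are per-CPU counts.
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
            total += $i
        printf "%.0f\n", total
    }' "${1:-/proc/interrupts}"
}

# Write a runtime PM policy to a device. $1 is the sysfs device directory,
# $2 is "on" (never runtime-suspend) or "auto" (allow runtime suspend).
set_runtime_pm() {
    case "$2" in
        on|auto) ;;
        *) echo "usage: set_runtime_pm <sysfs-device-dir> on|auto" >&2; return 2 ;;
    esac
    printf '%s\n' "$2" > "$1/power/control"
}
```

As root, one might run set_runtime_pm /sys/bus/pci/devices/0000:00:1f.6 on and then compare two acpi_irq_count readings taken a few seconds apart; a storm shows up as a large difference between them.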
* Re: ACPI IRQ storm with 6.10
2024-08-22 8:33 ` Petr Valenta
@ 2024-08-22 9:18 ` Vitaly Lifshits
2024-08-22 10:24 ` Petr Valenta
0 siblings, 1 reply; 19+ messages in thread
From: Vitaly Lifshits @ 2024-08-22 9:18 UTC (permalink / raw)
To: Petr Valenta, Bjorn Helgaas, Dima Ruinskiy, Hui Wang
Cc: Jiri Slaby, Bjorn Helgaas, Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, Rafael J. Wysocki
On 8/22/2024 11:33 AM, Petr Valenta wrote:
>
> On 22. 08. 24 at 9:44, Petr Valenta wrote:
>>
>> On 21. 08. 24 at 17:17, Vitaly Lifshits wrote:
>>>
>>> On 8/21/2024 5:59 PM, Bjorn Helgaas wrote:
>>>> [+to Dima, Vitaly, Hui; beginning of thread at
>>>> https://lore.kernel.org/r/60ac8988-ace4-4cf0-8c44-028ca741c0a1@kernel.org]
>>>>
>>>> On Wed, Aug 21, 2024 at 01:39:11PM +0200, Petr Valenta wrote:
>>>>> On 20. 08. 24 at 23:30, Bjorn Helgaas wrote:
>>>>>> On Tue, Aug 20, 2024 at 11:13:54PM +0200, Petr Valenta wrote:
>>>>>>> On 20. 08. 24 at 20:09, Bjorn Helgaas wrote:
>>>>>>>> On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote:
>>>>>>>>> On 19. 08. 24, 6:50, Jiri Slaby wrote:
>>>>>>>>>> CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to
>>>>>>>>>> b2c289415b2b)
>>>>> ...
>>>>>> I'm at a loss. You could try reverting the entire b2c289415b2b commit
>>>>>> (patch for that is below).
>>>>> This patch didn't help, so I reverted it back.
>>>>>
>>>>>> If that doesn't help, I guess you could try reverting the other
>>>>>> commits Jiri mentioned:
>>>>>>
>>>>>> 76a0a3f9cc2f e1000e: fix force smbus during suspend flow
>>>>>> c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems
>>>>>> bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function
>>>>>> 6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
>>>>>> 1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu()
>>>>>> b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops
>>>>>> 75a3f93b5383 net: intel: implement modern PM ops declarations
>>>>>>
>>>>>> If you do this, I would revert 76a0a3f9cc2f, test, then revert
>>>>>> c93a6f62cb1b in addition, test, then revert bfd546a552e1 in addition,
>>>>>> etc.
>>>>> I have created revert patches like this:
>>>>> git format-patch --stdout -1 76a0a3f9cc2f | interdiff -q /dev/stdin \
>>>>> /dev/null > revert_76a0a3f9cc2f.patch
>>>>>
>>>>> I have applied revert_76a0a3f9cc2f.patch (rebuild and tested), then in
>>>>> addition revert_c93a6f62cb1b.patch and after applying
>>>>> revert_bfd546a552e1.patch irq storm didn't appear.
>>>>>
>>>>> I have tested it with 3 subsequent reboots and in all those cases
>>>>> it was ok.
>>>> Thanks for all this testing. It sounds like reverting all three of
>>>> 76a0a3f9cc2f, c93a6f62cb1b, and bfd546a552e1 fixed the IRQ storm, but
>>>> I'm not clear on the results of other situations.
>>>>
>>>> It looks like c93a6f62cb1b could be reverted by itself because it's
>>>> unrelated to 76a0a3f9cc2f and bfd546a552e1. I added the authors of
>>>> all three in case they have any insights.
>>>>
>>>> Bjorn
>>>
>>> I doubt that it is related to c93a6f62cb1b, I believe that is more
>>> probable to be related to the two other patches.
>>>
>>> Apart from what I suggested in the other mailing thread (enabling
>>> e1000e debug and to test if it happens with a cable connected),
>>>
>>> I suggest to try to apply this patch and see if it fixes the issue:
>>>
>>> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20240806132348.880744-1-vitaly.lifshits@intel.com/
>>
>> I have applied patch from link above and command below really
>> doesn't start irq storm.
>>
>> echo 'auto' > /sys/bus/pci/devices/0000:00:1f.6/power/control
>>
>> Problem is that after executing this command and plugging cable to
>> ethernet port, kernel is not able to detect link (LED indicate link
>> is on) so network over cable is not working.
>
> I have now tested how it behaves with kernel 6.9.9. There is a new
> finding. After running
> "echo 'auto' > /sys/bus/pci/devices/0000:00:1f.6/power/control"
> network over cable works but the irq storm arises. I have never tested
> this before because I don't use a cable with this laptop at all. After
> unplugging the cable the irq storm is gone.
>
> A possible workaround would be to turn off power control for the
> e1000e at the kernel level (if that is possible) so that utilities like
> powertop don't cause an irq storm or broken network.

I would like to suggest the following for now as a triage:

1. revert the following patches:

i. e1000e: fix force smbus during suspend flow (76a0a3f9cc2f)

ii. e1000e: move force SMBUS near the end of enable_ulp function (bfd546a552e1)

iii. e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue (861e8086029e)

2. apply this patch:

https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20240806132348.880744-1-vitaly.lifshits@intel.com/

I expect this configuration to have neither the IRQ storm nor the link
indication issue ("Problem is that after executing this command and
plugging cable to ethernet port, kernel is not able to detect link (LED
indicate link is on) so network over cable is not working.")

^ permalink raw reply [flat|nested] 19+ messages in thread
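The triage above can be scripted roughly as follows. This is a minimal sketch, assuming a git checkout of the affected kernel, that the three commits are reverted in the order Vitaly lists them, and that the patchwork patch has been saved locally in mbox form; triage_tree and its file argument are hypothetical names. Reverts that conflict need manual resolution, which the sketch does not attempt.

```shell
#!/bin/sh
# Sketch only: triage_tree and the patch file argument are hypothetical.

# Revert the three commits named in the triage, in the order listed, then
# apply the proposed fix. $1 is the fix saved locally as an mbox-format patch.
# Run from the top of a kernel git tree.
triage_tree() {
    for sha in 76a0a3f9cc2f bfd546a552e1 861e8086029e; do
        # Stop at the first revert that does not apply cleanly.
        git revert --no-edit "$sha" || return 1
    done
    git am "$1"
}
```

After triage_tree returns successfully, the kernel still has to be rebuilt and booted before repeating the 'auto' power/control test with and without a cable.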
* Re: ACPI IRQ storm with 6.10 2024-08-22 9:18 ` Vitaly Lifshits @ 2024-08-22 10:24 ` Petr Valenta 0 siblings, 0 replies; 19+ messages in thread From: Petr Valenta @ 2024-08-22 10:24 UTC (permalink / raw) To: Vitaly Lifshits, Bjorn Helgaas, Dima Ruinskiy, Hui Wang Cc: Jiri Slaby, Bjorn Helgaas, Len Brown, linux-acpi@vger.kernel.org, Linux kernel mailing list, Linux regressions mailing list, Tony Nguyen, przemyslaw.kitszel, intel-wired-lan, Rafael J. Wysocki Dne 22. 08. 24 v 11:18 Vitaly Lifshits napsal(a): > > On 8/22/2024 11:33 AM, Petr Valenta wrote: >> >> >> Dne 22. 08. 24 v 9:44 Petr Valenta napsal(a): >>> >>> >>> Dne 21. 08. 24 v 17:17 Vitaly Lifshits napsal(a): >>>> >>>> On 8/21/2024 5:59 PM, Bjorn Helgaas wrote: >>>>> [+to Dima, Vitaly, Hui; beginning of thread at >>>>> https://lore.kernel.org/r/60ac8988-ace4-4cf0-8c44-028ca741c0a1@kernel.org] >>>>> >>>>> On Wed, Aug 21, 2024 at 01:39:11PM +0200, Petr Valenta wrote: >>>>>> Dne 20. 08. 24 v 23:30 Bjorn Helgaas napsal(a): >>>>>>> On Tue, Aug 20, 2024 at 11:13:54PM +0200, Petr Valenta wrote: >>>>>>>> Dne 20. 08. 24 v 20:09 Bjorn Helgaas napsal(a): >>>>>>>>> On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote: >>>>>>>>>> On 19. 08. 24, 6:50, Jiri Slaby wrote: >>>>>>>>>>> CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to >>>>>>>>>>> b2c289415b2b) >>>>>> ... >>>>>>> I'm at a loss. You could try reverting the entire b2c289415b2b >>>>>>> commit >>>>>>> (patch for that is below). >>>>>> This patch didn't help, so I reverted it back. 
>>>>>> >>>>>>> If that doesn't help, I guess you could try reverting the other >>>>>>> commits Jiri mentioned: >>>>>>> >>>>>>> 76a0a3f9cc2f e1000e: fix force smbus during suspend flow >>>>>>> c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems >>>>>>> bfd546a552e1 e1000e: move force SMBUS near the end of >>>>>>> enable_ulp function >>>>>>> 6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD >>>>>>> duplicates >>>>>>> 1eb2cded45b3 net: annotate writes on dev->mtu from >>>>>>> ndo_change_mtu() >>>>>>> b2c289415b2b e1000e: Remove redundant runtime resume for >>>>>>> ethtool_ops >>>>>>> 75a3f93b5383 net: intel: implement modern PM ops declarations >>>>>>> >>>>>>> If you do this, I would revert 76a0a3f9cc2f, test, then revert >>>>>>> c93a6f62cb1b in addition, test, then revert bfd546a552e1 in >>>>>>> addition, >>>>>>> etc. >>>>>> I have created revert patches like this: >>>>>> git format-patch --stdout -1 76a0a3f9cc2f | interdiff -q /dev/stdin \ >>>>>> /dev/null > revert_76a0a3f9cc2f.patch >>>>>> >>>>>> I have applied revert_76a0a3f9cc2f.patch (rebuild and tested), >>>>>> then in >>>>>> addition revert_c93a6f62cb1b.patch and after applying >>>>>> revert_bfd546a552e1.patch irq storm didn't appear. >>>>>> >>>>>> I have tested it with 3 subsequent reboots and in all those cases >>>>>> it was ok. >>>>> Thanks for all this testing. It sounds like reverting all three of >>>>> 76a0a3f9cc2f, c93a6f62cb1b, and bfd546a552e1 fixed the IRQ storm, but >>>>> I'm not clear on the results of other situations. >>>>> >>>>> It looks like c93a6f62cb1b could be reverted by itself because it's >>>>> unrelated to 76a0a3f9cc2f and bfd546a552e1. I added the authors of >>>>> all three in case they have any insights. >>>>> >>>>> Bjorn >>>> >>>> >>>> I doubt that it is related to c93a6f62cb1b, I believe that is more >>>> probable to be related to the two other patches. 
>>>> >>>> Apart from what I suggested in the other mailing thread (enabling >>>> e1000e debug and to test if it happens with a cable connected), >>>> >>>> I suggest to try to apply this patch and see if it fixes the issue: >>>> >>>> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20240806132348.880744-1-vitaly.lifshits@intel.com/ >>> >>> I have applied patch from link above and command bellow really >>> doesn't start irq storm. >>> >>> echo 'auto' > /sys/bus/pci/devices/0000:00:1f.6/power/control >>> >>> Problem is that after executing this command and plugging cable to >>> ethernet port, kernel is not able to detect link (LED indicate link >>> is on) so network over cable is not working. >>> >>>> >>>> >>> >>> From mboxrd@z Thu Jan 1 00:00:00 1970 >>> Return-Path: <intel-wired-lan-bounces@osuosl.org> >>> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on >>> aws-us-west-2-korg-lkml-1.web.codeaurora.org >>> Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) >>> (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 >>> bits)) >>> (No client certificate requested) >>> by smtp.lore.kernel.org (Postfix) with ESMTPS id 7319CC531DF >>> for <intel-wired-lan@archiver.kernel.org>; Thu, 22 Aug 2024 >>> 07:44:59 +0000 (UTC) >>> Received: from localhost (localhost [127.0.0.1]) >>> by smtp2.osuosl.org (Postfix) with ESMTP id 2EE99404B8; >>> Thu, 22 Aug 2024 07:44:59 +0000 (UTC) >>> X-Virus-Scanned: amavis at osuosl.org >>> Received: from smtp2.osuosl.org ([127.0.0.1]) >>> by localhost (smtp2.osuosl.org [127.0.0.1]) (amavis, port 10024) with >>> ESMTP >>> id VRgkrPDlq_WW; Thu, 22 Aug 2024 07:44:56 +0000 (UTC) >>> X-Comment: SPF check N/A for local connections - >>> client-ip=140.211.166.34; helo=ash.osuosl.org; >>> envelope-from=intel-wired-lan-bounces@osuosl.org; receiver=<UNKNOWN> >>> DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 53F64405BA >>> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osuosl.org; >>> 
s=default; t=1724312696; >>> bh=y3v3IIFARTszfLWu7n/j8Du29EOi4VTxMDP3GF4qp7E=; >>> h=Date:To:References:From:In-Reply-To:Subject:List-Id: >>> List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: >>> Cc:From; >>> b=ZIudOsHGSDoQvtekseiE4SUpOKofnvHlxj7aT3f7bLvqCDMOCfygsO6tctN23YgSh >>> xYqnq4yBSB4/JQ4v7Juyg0P/wqTcr+XFqhORTc2qBku9GCA+Y4wRKbRUeH4/AUNthL >>> cf/zG7uEOFEKz4YALwviQFqR5E+HW9gD+YnXahtGUVqYiTjB01HuESDZdYI5huiCLI >>> eHnQDw/SSwM1YmkjLzQgICjlxtIRVYjUL+shaltRg9f7t4otZa9bvrvLptzw5Mrfc0 >>> GLvrNRmHckPFKEJOXgmIeQI40IOHckD3MX2dkQ2dQ0VCrkl9JIgtuSRuS3IpB1dr65 >>> TatTrq9Onm26w== >>> Received: from ash.osuosl.org (ash.osuosl.org [140.211.166.34]) >>> by smtp2.osuosl.org (Postfix) with ESMTP id 53F64405BA; >>> Thu, 22 Aug 2024 07:44:56 +0000 (UTC) >>> Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) >>> by ash.osuosl.org (Postfix) with ESMTP id 81E351BF322 >>> for <intel-wired-lan@lists.osuosl.org>; Thu, 22 Aug 2024 07:44:54 >>> +0000 (UTC) >>> Received: from localhost (localhost [127.0.0.1]) >>> by smtp1.osuosl.org (Postfix) with ESMTP id 79A0C80A82 >>> for <intel-wired-lan@lists.osuosl.org>; Thu, 22 Aug 2024 07:44:54 >>> +0000 (UTC) >>> X-Virus-Scanned: amavis at osuosl.org >>> Received: from smtp1.osuosl.org ([127.0.0.1]) >>> by localhost (smtp1.osuosl.org [127.0.0.1]) (amavis, port 10024) with >>> ESMTP >>> id m9sJJpu9kR7y for <intel-wired-lan@lists.osuosl.org>; >>> Thu, 22 Aug 2024 07:44:53 +0000 (UTC) >>> Received-SPF: Pass (mailfrom) identity=mailfrom; >>> client-ip=136.143.188.52; >>> helo=sender4-of-o52.zoho.com; envelope-from=petr@jevklidu.cz; >>> receiver=<UNKNOWN> DMARC-Filter: OpenDMARC Filter v1.4.2 >>> smtp1.osuosl.org 3674B80A59 >>> DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org 3674B80A59 >>> Received: from sender4-of-o52.zoho.com (sender4-of-o52.zoho.com >>> [136.143.188.52]) >>> by smtp1.osuosl.org (Postfix) with ESMTPS id 3674B80A59 >>> for <intel-wired-lan@lists.osuosl.org>; Thu, 22 Aug 2024 07:44:51 >>> +0000 
(UTC) >>> ARC-Seal: i=1; a=rsa-sha256; t=1724312671; cv=none; d=zohomail.com; >>> s=zohoarc; >>> b=B0wnUG3UHEcTRfbjC9HSfLJG+WBnpU18yag7r0240QuMQMnP/cHcj9e4oJU2FgxRPLpt6OGnlZOiPNE2GUFgnkBzKBPwzxb7eTHFwW4P8cW+1IrIOQ6jZWd2rhOIyWcRKYMydfCbMPM04Z+RwKVyRlrLTYL5UDBYYKKHOG08Ikc= >>> ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; >>> d=zohomail.com; >>> s=zohoarc; t=1724312671; >>> h=Content-Type:Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:Subject:To:To:Message-Id:Reply-To; >>> bh=y3v3IIFARTszfLWu7n/j8Du29EOi4VTxMDP3GF4qp7E=; >>> b=VvZAc/xKVy85rZpNNCAwUwpxquk4r4Xw2QjZmePGlnINwOvJf6oilR9lqx2WDMezV20iKTW9f3dauO4jIjp363HOdh7P21UFfa66a0oK63RODo7IQMHSCqaCwAEoO1PKHfDfTMwz0/BShU1dt+nhtAeSeKwbG7G1qizCcoXTdjo= >>> ARC-Authentication-Results: i=1; mx.zohomail.com; >>> dkim=pass header.i=jevklidu.cz; >>> spf=pass smtp.mailfrom=petr@jevklidu.cz; >>> dmarc=pass header.from=<petr@jevklidu.cz> >>> Received: by mx.zohomail.com with SMTPS id >>> 1724312669862808.3168476405893; >>> Thu, 22 Aug 2024 00:44:29 -0700 (PDT) >>> Message-ID: <5ba3c7c2-5695-421d-a747-2a23af48db26@jevklidu.cz> >>> Date: Thu, 22 Aug 2024 09:44:22 +0200 >>> MIME-Version: 1.0 >>> User-Agent: Mozilla Thunderbird >>> To: Vitaly Lifshits <vitaly.lifshits@intel.com>, >>> Bjorn Helgaas <helgaas@kernel.org>, Dima Ruinskiy >>> <dima.ruinskiy@intel.com>, >>> Hui Wang <hui.wang@canonical.com> >>> References: <20240821145959.GA248604@bhelgaas> >>> <1041b9b5-cc78-13b1-459a-d1d3a313475a@intel.com> >>> Content-Language: cs-CZ, en-US >>> From: Petr Valenta <petr@jevklidu.cz> >>> In-Reply-To: <1041b9b5-cc78-13b1-459a-d1d3a313475a@intel.com> >>> Content-Type: text/plain; charset=UTF-8; format=flowed >>> Content-Transfer-Encoding: 7bit >>> X-ZohoMailClient: External >>> X-Mailman-Original-DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; >>> c=relaxed/relaxed; t=1724312671; s=zoho; d=jevklidu.cz; >>> i=petr@jevklidu.cz; >>> 
h=Message-ID:Date:Date:MIME-Version:Subject:Subject:To:To:Cc:Cc:References:From:From:In-Reply-To:Content-Type:Content-Transfer-Encoding:Message-Id:Reply-To; >>> bh=y3v3IIFARTszfLWu7n/j8Du29EOi4VTxMDP3GF4qp7E=; >>> b=RSnIQpqoQp2O3ExJNnw4fk9dlt8CX1T5sbtB6GflBDYejiRQJTcrU3zRHn3pRkFq >>> tm00/cgXr6pF6T5vJFttBkfrHtnRiPiE8cjqni5KsNxCyOXOwri6I5ARAmPcUj42eda >>> e/xHQX9E3ayXrWSBQDAsun3Ann63tcXQKwlT7ffI= >>> X-Mailman-Original-Authentication-Results: smtp1.osuosl.org; >>> dmarc=none (p=none dis=none) >>> header.from=jevklidu.cz >>> X-Mailman-Original-Authentication-Results: smtp1.osuosl.org; >>> dkim=pass (1024-bit key, >>> unprotected) header.d=jevklidu.cz header.i=petr@jevklidu.cz >>> header.a=rsa-sha256 header.s=zoho header.b=RSnIQpqo >>> Subject: Re: [Intel-wired-lan] ACPI IRQ storm with 6.10 >>> X-BeenThere: intel-wired-lan@osuosl.org >>> X-Mailman-Version: 2.1.29 >>> Precedence: list >>> List-Id: Intel Wired Ethernet Linux Kernel Driver Development >>> <intel-wired-lan.osuosl.org> >>> List-Unsubscribe: >>> <https://lists.osuosl.org/mailman/options/intel-wired-lan>, >>> <mailto:intel-wired-lan-request@osuosl.org?subject=unsubscribe> >>> List-Archive: <http://lists.osuosl.org/pipermail/intel-wired-lan/> >>> List-Post: <mailto:intel-wired-lan@osuosl.org> >>> List-Help: <mailto:intel-wired-lan-request@osuosl.org?subject=help> >>> List-Subscribe: >>> <https://lists.osuosl.org/mailman/listinfo/intel-wired-lan>, >>> <mailto:intel-wired-lan-request@osuosl.org?subject=subscribe> >>> Cc: Linux regressions mailing list <regressions@lists.linux.dev>, >>> "Rafael J. 
Wysocki" <rafael@kernel.org>, przemyslaw.kitszel@intel.com, >>> Linux kernel mailing list <linux-kernel@vger.kernel.org>, >>> "linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>, >>> Tony Nguyen <anthony.l.nguyen@intel.com>, Bjorn Helgaas >>> <bhelgaas@google.com>, >>> intel-wired-lan@lists.osuosl.org, Jiri Slaby <jirislaby@kernel.org>, >>> Len Brown <lenb@kernel.org> >>> Errors-To: intel-wired-lan-bounces@osuosl.org >>> Sender: "Intel-wired-lan" <intel-wired-lan-bounces@osuosl.org> >>> >>> >>> >>> Dne 21. 08. 24 v 17:17 Vitaly Lifshits napsal(a): >>>> >>>> On 8/21/2024 5:59 PM, Bjorn Helgaas wrote: >>>>> [+to Dima, Vitaly, Hui; beginning of thread at >>>>> https://lore.kernel.org/r/60ac8988-ace4-4cf0-8c44-028ca741c0a1@kernel.org] >>>>> >>>>> On Wed, Aug 21, 2024 at 01:39:11PM +0200, Petr Valenta wrote: >>>>>> Dne 20. 08. 24 v 23:30 Bjorn Helgaas napsal(a): >>>>>>> On Tue, Aug 20, 2024 at 11:13:54PM +0200, Petr Valenta wrote: >>>>>>>> Dne 20. 08. 24 v 20:09 Bjorn Helgaas napsal(a): >>>>>>>>> On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote: >>>>>>>>>> On 19. 08. 24, 6:50, Jiri Slaby wrote: >>>>>>>>>>> CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to >>>>>>>>>>> b2c289415b2b) >>>>>> ... >>>>>>> I'm at a loss. You could try reverting the entire b2c289415b2b >>>>>>> commit >>>>>>> (patch for that is below). >>>>>> This patch didn't help, so I reverted it back. 
>>>>>>
>>>>>>> If that doesn't help, I guess you could try reverting the other
>>>>>>> commits Jiri mentioned:
>>>>>>>
>>>>>>>    76a0a3f9cc2f e1000e: fix force smbus during suspend flow
>>>>>>>    c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems
>>>>>>>    bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function
>>>>>>>    6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
>>>>>>>    1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu()
>>>>>>>    b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops
>>>>>>>    75a3f93b5383 net: intel: implement modern PM ops declarations
>>>>>>>
>>>>>>> If you do this, I would revert 76a0a3f9cc2f, test, then revert
>>>>>>> c93a6f62cb1b in addition, test, then revert bfd546a552e1 in
>>>>>>> addition, etc.
>>>>>> I have created revert patches like this:
>>>>>> git format-patch --stdout -1 76a0a3f9cc2f | interdiff -q /dev/stdin \
>>>>>> /dev/null > revert_76a0a3f9cc2f.patch
>>>>>>
>>>>>> I applied revert_76a0a3f9cc2f.patch (rebuilt and tested), then
>>>>>> revert_c93a6f62cb1b.patch in addition, and after also applying
>>>>>> revert_bfd546a552e1.patch the IRQ storm didn't appear.
>>>>>>
>>>>>> I have tested it with 3 subsequent reboots and in all those cases
>>>>>> it was ok.
>>>>> Thanks for all this testing.  It sounds like reverting all three of
>>>>> 76a0a3f9cc2f, c93a6f62cb1b, and bfd546a552e1 fixed the IRQ storm, but
>>>>> I'm not clear on the results of other situations.
>>>>>
>>>>> It looks like c93a6f62cb1b could be reverted by itself because it's
>>>>> unrelated to 76a0a3f9cc2f and bfd546a552e1.  I added the authors of
>>>>> all three in case they have any insights.
>>>>>
>>>>> Bjorn
>>>>
>>>>
>>>> I doubt that it is related to c93a6f62cb1b; I believe it is more
>>>> likely related to the two other patches.
>>>>
>>>> Apart from what I suggested in the other mailing thread (enabling
>>>> e1000e debug and testing whether it happens with a cable connected),
>>>> I suggest applying this patch to see if it fixes the issue:
>>>>
>>>> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20240806132348.880744-1-vitaly.lifshits@intel.com/
>>>
>>> I have applied the patch from the link above, and the command below
>>> really doesn't start the IRQ storm:
>>>
>>> echo 'auto' > /sys/bus/pci/devices/0000:00:1f.6/power/control
>>>
>>> The problem is that after executing this command and plugging a cable
>>> into the ethernet port, the kernel is not able to detect the link (the
>>> LED indicates the link is on), so network over cable is not working.
>>>
>>
>> I have now tested how it behaves with kernel 6.9.9.  There is a new
>> finding.  After running "echo 'auto' >
>> /sys/bus/pci/devices/0000:00:1f.6/power/control", network over cable
>> works but the IRQ storm arises.  I had never tested this before because
>> I don't use a cable with this laptop at all.  After unplugging the
>> cable the IRQ storm is gone.
>>
>> A possible workaround would be to turn off power control for the
>> e1000e at the kernel level (if that is possible) so that utilities like
>> powertop don't cause an IRQ storm or a broken network.
>>
> I would like to suggest the following for now as a triage:
>
> 1. revert the following patches:
>
>    i. e1000e: fix force smbus during suspend flow (76a0a3f9cc2f)
>
>    ii. e1000e: move force SMBUS near the end of enable_ulp function
>    (bfd546a552e1)
>
>    iii. e1000e: move force SMBUS from enable ulp function to avoid PHY
>    loss issue (861e8086029e)
>
> 2. apply this patch:
> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20240806132348.880744-1-vitaly.lifshits@intel.com/
>
> I expect this configuration to have neither the IRQ storm nor the
> link indication issue ("Problem is that after executing this command
> and plugging cable to ethernet port, kernel is not able to detect link
> (LED indicate link is on) so network over cable is not working.")
>

Link detection is fine now, but the IRQ storm is still present when the
cable is plugged in.  irq/9-acpi uses 85% of one CPU core.

>>>>
>>>>
>>>
^ permalink raw reply [flat|nested] 19+ messages in thread
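For reference, the file written by the echo command in this message is the standard runtime PM sysfs attribute power/control, which accepts the strings "auto" and "on". Below is a minimal sketch in C of the same write, restricted to those two values. The helper name set_runtime_pm_mode is hypothetical and not part of any kernel or library API. Whether pinning the device to "on" avoids the storm on the affected laptop is an assumption the thread does not confirm, and a tool such as powertop can still write "auto" afterwards.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "on" or "auto" to a runtime PM control
 * file such as /sys/bus/pci/devices/0000:00:1f.6/power/control.
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_runtime_pm_mode(const char *control_path, const char *mode)
{
	FILE *f;
	int err = 0;

	/* Reject anything other than the two strings the attribute accepts. */
	if (strcmp(mode, "on") != 0 && strcmp(mode, "auto") != 0)
		return -EINVAL;

	f = fopen(control_path, "w");
	if (!f)
		return -errno;

	/* Same payload as: echo 'auto' > .../power/control */
	if (fprintf(f, "%s\n", mode) < 0)
		err = -EIO;
	if (fclose(f) != 0 && !err)
		err = -errno;

	return err;
}
```

Calling set_runtime_pm_mode("/sys/bus/pci/devices/0000:00:1f.6/power/control", "on") as root would keep the device out of runtime suspend until something writes "auto" again.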
* Re: ACPI IRQ storm with 6.10
2024-08-19 5:23 ` Jiri Slaby
2024-08-19 16:47 ` Bjorn Helgaas
2024-08-20 18:09 ` Bjorn Helgaas
@ 2024-08-20 18:44 ` Bjorn Helgaas
2 siblings, 0 replies; 19+ messages in thread
From: Bjorn Helgaas @ 2024-08-20 18:44 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel
Cc: Bjorn Helgaas, Len Brown, linux-acpi@vger.kernel.org,
Linux kernel mailing list, Linux regressions mailing list,
intel-wired-lan, Rafael J. Wysocki, Petr Valenta, Jiri Slaby

[+to Tony, Przemek for e1000e questions; -cc Jesse]

On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote:
> On 19. 08. 24, 6:50, Jiri Slaby wrote:
> > CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
>
> Bjorn,
>
> I am confused by these changes:
> ==========================================
> @@ -291,16 +288,13 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>           * duplex is forced.
>           */
>          if (cmd->base.eth_tp_mdix_ctrl) {
> -               if (hw->phy.media_type != e1000_media_type_copper) {
> -                       ret_val = -EOPNOTSUPP;
> -                       goto out;
> -               }
> +               if (hw->phy.media_type != e1000_media_type_copper)
> +                       return -EOPNOTSUPP;
>
>                  if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) &&
>                      (cmd->base.autoneg != AUTONEG_ENABLE)) {
>                          e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n");
> -                       ret_val = -EINVAL;
> -                       goto out;
> +                       return -EINVAL;
>                  }
>          }
>
> @@ -347,7 +341,6 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>          }
>
>   out:
> -       pm_runtime_put_sync(netdev->dev.parent);
>          clear_bit(__E1000_RESETTING, &adapter->state);
>          return ret_val;
>   }
> ==========================================
>
> So no more clear_bit(__E1000_RESETTING in the above fail paths. Is that
> intentional?

I don't remember if it was intentional, but the use of
__E1000_RESETTING is a bit subtle and I don't know what is correct.

Here's how it was used before I changed it with b2c289415b2b, i.e., in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/intel/e1000e/ethtool.c?id=39f59c72ad3a:

  e1000_set_link_ksettings(...)
  {
    if (hw->phy.ops.check_reset_block(hw)) {
      ret_val = -EINVAL;
      goto out;
    }

    while (test_and_set_bit(__E1000_RESETTING, &adapter->state))
      usleep_range(1000, 2000);

    if (err) {
      ret_val = -EINVAL;
      goto out;
    }
    ...

  out:
    clear_bit(__E1000_RESETTING, &adapter->state);
  }

In this case, we *always* clear __E1000_RESETTING, even if we bail out
before the test_and_set_bit(__E1000_RESETTING).

It makes sense to me that we clear __E1000_RESETTING after we've set
it via test_and_set_bit() because we know it was set *here*.

But it seems wrong to me that we clear __E1000_RESETTING even when we
haven't done the test_and_set_bit() because it may have been set by a
concurrent thread executing a different operation.

  e1000_set_ringparam(...)
  {
    if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
      return -EINVAL;

    while (test_and_set_bit(__E1000_RESETTING, &adapter->state))
      usleep_range(1000, 2000);

    err = e1000e_setup_tx_resources(...);
    if (err)
      goto out;
    ...

  out:
    clear_bit(__E1000_RESETTING, &adapter->state);
  }

But here, we *don't* clear __E1000_RESETTING if we bail out before the
test_and_set_bit(__E1000_RESETTING).  This seems like the correct
behavior.

In the e1000 driver (not the e1000e driver),
e1000_set_link_ksettings() does *not* clear __E1000_RESETTING unless
it has already done the test_and_set_bit().  b2c289415b2b changed
e1000e to work that way, too.

FWIW, 3ef672ab1862 ("e1000e: ethtool unnecessarily takes device out of
RPM suspend") changed e1000e e1000_set_link_ksettings() to clear
__E1000_RESETTING even when bailing out before the test_and_set_bit().
That part of 3ef672ab1862 looks possibly buggy to me.

Bjorn

^ permalink raw reply [flat|nested] 19+ messages in thread
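The ownership concern in the message above can be reproduced outside the kernel. The following is a minimal userspace sketch, assuming C11 atomics as a stand-in for the kernel's test_and_set_bit() and clear_bit(). The names resetting, op_clear_always and op_clear_if_owned are hypothetical and do not appear in the e1000e driver. It only illustrates the two shapes of error path being compared, not the driver's actual logic.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the __E1000_RESETTING bit in adapter->state. */
static atomic_bool resetting;

/*
 * Shape of the pre-b2c289415b2b e1000e path: the early failure jumps
 * to "out", which clears the flag even though this call never set it.
 */
static int op_clear_always(bool early_fail)
{
	int ret = 0;

	if (early_fail) {
		ret = -EINVAL;
		goto out;
	}

	/* Like test_and_set_bit(): loop until we are the one who set it. */
	while (atomic_exchange(&resetting, true))
		; /* the driver sleeps with usleep_range() here */

	/* ... reconfigure the device ... */
out:
	atomic_store(&resetting, false);
	return ret;
}

/*
 * Shape of e1000_set_ringparam(): the early failure returns directly,
 * so the flag is only cleared by the call that set it.
 */
static int op_clear_if_owned(bool early_fail)
{
	if (early_fail)
		return -EINVAL;

	while (atomic_exchange(&resetting, true))
		;

	/* ... reconfigure the device ... */
	atomic_store(&resetting, false);
	return 0;
}
```

If another thread already holds the flag, a failing op_clear_always(true) silently releases it on that thread's behalf, while op_clear_if_owned(true) leaves it alone. That is the difference the message describes between the two error paths.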
end of thread, other threads:[~2024-08-22 10:25 UTC | newest]
Thread overview: 19+ messages
2024-08-14 5:22 ACPI IRQ storm with 6.10 Jiri Slaby
2024-08-14 6:47 ` Jiri Slaby
2024-08-16 18:29 ` Rafael J. Wysocki
2024-08-16 21:36 ` Petr Valenta
2024-08-17 17:57 ` Petr Valenta
2024-08-19 4:50 ` Jiri Slaby
2024-08-19 5:23 ` Jiri Slaby
2024-08-19 16:47 ` Bjorn Helgaas
2024-08-20 18:09 ` Bjorn Helgaas
2024-08-20 21:13 ` Petr Valenta
2024-08-20 21:30 ` Bjorn Helgaas
2024-08-21 5:09 ` Jiri Slaby
2024-08-21 11:39 ` Petr Valenta
2024-08-21 14:59 ` Bjorn Helgaas
[not found] ` <1041b9b5-cc78-13b1-459a-d1d3a313475a@intel.com>
2024-08-22 7:44 ` Petr Valenta
2024-08-22 8:33 ` Petr Valenta
2024-08-22 9:18 ` Vitaly Lifshits
2024-08-22 10:24 ` Petr Valenta
2024-08-20 18:44 ` Bjorn Helgaas