linux-input.vger.kernel.org archive mirror
* [REGRESSSION] on linux-next (next-20250509)
@ 2025-05-28 10:08 Borah, Chaitanya Kumar
  2025-05-28 13:07 ` Luke Jones
  0 siblings, 1 reply; 9+ messages in thread
From: Borah, Chaitanya Kumar @ 2025-05-28 10:08 UTC (permalink / raw)
  To: luke@ljones.dev
  Cc: intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	Saarinen, Jani, Kurmi, Suresh Kumar, De Marchi, Lucas,
	Nikula, Jani, linux-input@vger.kernel.org,
	platform-driver-x86@vger.kernel.org

Hello Luke,

Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.

This mail is regarding a regression we are seeing in our CI runs [1] on the linux-next repository.

Since version next-20250509 [2], we are seeing the following regression:

`````````````````````````````````````````````````````````````````````````````````
<4>[    5.400826] ------------[ cut here ]------------
<4>[    5.400832] list_add double add: new=ffffffffa07c0ca0, prev=ffffffff837e9a60, next=ffffffffa07c0ca0.
<4>[    5.400845] WARNING: CPU: 0 PID: 379 at lib/list_debug.c:35 __list_add_valid_or_report+0xdc/0xf0
<4>[    5.400850] Modules linked in: cmdlinepart(+) eeepc_wmi(+) asus_nb_wmi(+) asus_wmi spi_nor(+) sparse_keymap mei_pxp mtd platform_profile kvm_intel(+) mei_hdcp wmi_bmof kvm irqbypass polyval_clmulni usbhid ghash_clmulni_intel snd_hda_intel hid sha1_ssse3 r8152(+) binfmt_misc aesni_intel snd_intel_dspcfg mii r8169 snd_hda_codec rapl video snd_hda_core intel_cstate snd_hwdep realtek snd_pcm snd_timer mei_me snd i2c_i801 i2c_mux spi_intel_pci idma64 soundcore spi_intel i2c_smbus mei intel_pmc_core nls_iso8859_1 pmt_telemetry pmt_class intel_pmc_ssram_telemetry pinctrl_alderlake intel_vsec acpi_tad wmi acpi_pad dm_multipath msr nvme_fabrics fuse efi_pstore nfnetlink ip_tables x_tables autofs4
<4>[    5.400904] CPU: 0 UID: 0 PID: 379 Comm: (udev-worker) Tainted: G S                  6.15.0-rc7-next-20250526-next-20250526-g3be1a7a31fbd+ #1 PREEMPT(voluntary) 
<4>[    5.400907] Tainted: [S]=CPU_OUT_OF_SPEC
<4>[    5.400908] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
<4>[    5.400909] RIP: 0010:__list_add_valid_or_report+0xdc/0xf0
<4>[    5.400912] Code: 16 48 89 f1 4c 89 e6 e8 a2 c5 5f ff 0f 0b 31 c0 e9 72 ff ff ff 48 89 f2 4c 89 e1 48 89 fe 48 c7 c7 68 ba 0f 83 e8 84 c5 5f ff <0f> 0b 31 c0 e9 54 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 90
<4>[    5.400914] RSP: 0018:ffffc90002763588 EFLAGS: 00010246
<4>[    5.400916] RAX: 0000000000000000 RBX: ffffffffa07c0ca0 RCX: 0000000000000000
<4>[    5.400918] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
<4>[    5.400919] RBP: ffffc90002763598 R08: 0000000000000000 R09: 0000000000000000
<4>[    5.400920] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa07c0ca0
<4>[    5.400921] R13: ffffffffa07c0ca0 R14: 0000000000000000 R15: ffff8881212d6da0
<4>[    5.400923] FS:  0000778637b418c0(0000) GS:ffff8888dad0c000(0000) knlGS:0000000000000000
<4>[    5.400926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[    5.400928] CR2: 00007786373b80b2 CR3: 0000000116faa000 CR4: 0000000000f50ef0
<4>[    5.400931] PKRU: 55555554
<4>[    5.400933] Call Trace:
<4>[    5.400935]  <TASK>
<4>[    5.400937]  ? lock_system_sleep+0x2b/0x40
<4>[    5.400942]  acpi_register_lps0_dev+0x58/0xb0
<4>[    5.400949]  asus_wmi_probe+0x7f/0x1930 [asus_wmi]
<4>[    5.400956]  ? kernfs_create_link+0x69/0xe0
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [3].

After bisecting the tree, the following patch [4] seems to be the first "bad"
commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit feea7bd6b02d43a794e3f065650d89cf8d8e8e59
Author: Luke D. Jones <luke@ljones.dev>
Date:   Sun Mar 23 15:34:21 2025 +1300

    platform/x86: asus-wmi: Refactor Ally suspend/resume
`````````````````````````````````````````````````````````````````````````````````````````````````````````

We could not revert the patch because of a merge conflict, but resetting to the parent of the commit seems to fix the issue.

Could you please check why the patch causes this regression and provide a fix if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20250509 
[3] https://intel-gfx-ci.01.org/tree/linux-next/next-20250526/bat-rpls-4/boot0.txt 
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20250509&id=feea7bd6b02d43a794e3f065650d89cf8d8e8e59




* Re: [REGRESSSION] on linux-next (next-20250509)
  2025-05-28 10:08 [REGRESSSION] on linux-next (next-20250509) Borah, Chaitanya Kumar
@ 2025-05-28 13:07 ` Luke Jones
  2025-05-28 15:40   ` Kurt Borja
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Luke Jones @ 2025-05-28 13:07 UTC (permalink / raw)
  To: Borah, Chaitanya Kumar
  Cc: intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	Saarinen, Jani, Kurmi, Suresh Kumar, De Marchi, Lucas,
	Nikula, Jani, linux-input@vger.kernel.org,
	platform-driver-x86@vger.kernel.org

On Wed, 28 May 2025, at 12:08 PM, Borah, Chaitanya Kumar wrote:
> Hello Luke,
>
> Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
>
> This mail is regarding a regression we are seeing in our CI runs[1] on 
> linux-next repository.

Can you tell me if the fix here was included?
https://lkml.org/lkml/2025/5/24/152

I could change to:
static void asus_s2idle_check_register(void)
{
    // Only register for Ally devices
    if (dmi_check_system(asus_rog_ally_device)) {
        if (acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops))
            pr_warn("failed to register LPS0 sleep handler in asus-wmi\n");
    }
}

but I don't really understand what is happening here. The inner lps0 functions won't run unless use_ally_mcu_hack is set.

I will do my best to fix this, but I need to understand what happened a bit better.

regards,
Luke.



* Re: [REGRESSSION] on linux-next (next-20250509)
  2025-05-28 13:07 ` Luke Jones
@ 2025-05-28 15:40   ` Kurt Borja
  2025-06-09 11:06     ` [REGRESSION] " Borah, Chaitanya Kumar
  2025-06-02 14:28   ` [REGRESSSION] " Borah, Chaitanya Kumar
  2025-07-03 14:43   ` Lucas De Marchi
  2 siblings, 1 reply; 9+ messages in thread
From: Kurt Borja @ 2025-05-28 15:40 UTC (permalink / raw)
  To: Luke Jones, Borah, Chaitanya Kumar
  Cc: intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	Saarinen, Jani, Kurmi, Suresh Kumar, De Marchi, Lucas,
	Nikula, Jani, linux-input@vger.kernel.org,
	platform-driver-x86@vger.kernel.org


Hi Luke,

On Wed May 28, 2025 at 10:07 AM -03, Luke Jones wrote:
> On Wed, 28 May 2025, at 12:08 PM, Borah, Chaitanya Kumar wrote:
>> Hello Luke,
>>
>> Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
>>
>> This mail is regarding a regression we are seeing in our CI runs[1] on 
>> linux-next repository.
>
> Can you tell me if the fix here was included?
> https://lkml.org/lkml/2025/5/24/152
>
> I could change to:
> static void asus_s2idle_check_register(void)
> {
>     // Only register for Ally devices
>     if (dmi_check_system(asus_rog_ally_device)) {
>         if (acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops))
>             pr_warn("failed to register LPS0 sleep handler in asus-wmi\n");
>     }
> }
>
> but I don't really understand what is happening here. The inner lps0 functions won't run unless use_ally_mcu_hack is set.

The RIP points into __list_add_valid_or_report(), which raised the "list_add double add" warning.

After reading the log, I believe this is happening because
asus_wmi_register_driver() is called a second time by eeepc_wmi after
asus_nb_wmi, which implies

	asus_wmi_probe()
	  -> acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops)

is called twice and the warning is triggered.

Line [1] makes me think this could be a race condition, as
asus_wmi_register_driver() may be called concurrently.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/tree/drivers/platform/x86/asus-wmi.c?h=for-next#n5101
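
To make the double registration above concrete, here is a minimal userspace sketch of what the warning detects. It is not the kernel code: register_dev(), struct dev_ops, handlers and ally_ops are hypothetical stand-ins, and the validity check only mirrors the condition reported in the trace (new == next, with prev being the list head).

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical ops structure embedding a list node, like a static driver object. */
struct dev_ops {
	struct list_head list_node;
};

/* Hypothetical global handler list, initialised empty (points at itself). */
static struct list_head handlers = { &handlers, &handlers };

/* Single static instance, shared by every probe call (assumption for this sketch). */
struct dev_ops ally_ops;

/* Mirrors the reported condition: the node is already adjacent to the insert point. */
static bool list_add_valid(struct list_head *new, struct list_head *prev,
			   struct list_head *next)
{
	if (new == prev || new == next) {
		fprintf(stderr, "list_add double add: new=%p, prev=%p, next=%p.\n",
			(void *)new, (void *)prev, (void *)next);
		return false;
	}
	return true;
}

/* Insert at the head of the list, refusing the insert if the check fails. */
static bool list_add_checked(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head;
	struct list_head *next = head->next;

	if (!list_add_valid(new, prev, next))
		return false;

	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
	return true;
}

/* Hypothetical registration helper: returns false on a detected double add. */
bool register_dev(struct dev_ops *ops)
{
	return list_add_checked(&ops->list_node, &handlers);
}
```

Calling register_dev(&ally_ops) twice in a row succeeds the first time and is rejected the second time, because the node is already the first entry after the list head. That matches the new == next pattern in the log, where the same static object is registered once per probe.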



-- 
 ~ Kurt




* RE: [REGRESSSION] on linux-next (next-20250509)
  2025-05-28 13:07 ` Luke Jones
  2025-05-28 15:40   ` Kurt Borja
@ 2025-06-02 14:28   ` Borah, Chaitanya Kumar
  2025-07-03 14:43   ` Lucas De Marchi
  2 siblings, 0 replies; 9+ messages in thread
From: Borah, Chaitanya Kumar @ 2025-06-02 14:28 UTC (permalink / raw)
  To: Luke Jones
  Cc: intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	Saarinen, Jani, Kurmi, Suresh Kumar, De Marchi, Lucas,
	Nikula, Jani, linux-input@vger.kernel.org,
	platform-driver-x86@vger.kernel.org



> -----Original Message-----
> From: Luke Jones <luke@ljones.dev>
> Sent: Wednesday, May 28, 2025 6:38 PM
> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
> Cc: intel-xe@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; Saarinen,
> Jani <jani.saarinen@intel.com>; Kurmi, Suresh Kumar
> <suresh.kumar.kurmi@intel.com>; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Nikula, Jani <jani.nikula@intel.com>; linux-
> input@vger.kernel.org; platform-driver-x86@vger.kernel.org
> Subject: Re: [REGRESSSION] on linux-next (next-20250509)
> 
> On Wed, 28 May 2025, at 12:08 PM, Borah, Chaitanya Kumar wrote:
> > Hello Luke,
> >
> > Hope you are doing well. I am Chaitanya from the linux graphics team in
> Intel.
> >
> > This mail is regarding a regression we are seeing in our CI runs[1] on
> > linux-next repository.
> 
> Can you tell me if the fix here was included?
> https://lkml.org/lkml/2025/5/24/152
> 

We already have this change in the "bad" version.

> I could change to:
> static void asus_s2idle_check_register(void) {
>     // Only register for Ally devices
>     if (dmi_check_system(asus_rog_ally_device)) {
>         if (acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops))
>             pr_warn("failed to register LPS0 sleep handler in asus-wmi\n");
>     }
> }
> 

With this change, the issue is not seen.

Regards

Chaitanya



* RE: [REGRESSION] on linux-next (next-20250509)
  2025-05-28 15:40   ` Kurt Borja
@ 2025-06-09 11:06     ` Borah, Chaitanya Kumar
  2025-06-10  0:30       ` Luke Jones
  0 siblings, 1 reply; 9+ messages in thread
From: Borah, Chaitanya Kumar @ 2025-06-09 11:06 UTC (permalink / raw)
  To: Kurt Borja, Luke Jones
  Cc: intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	Saarinen, Jani, Kurmi, Suresh Kumar, De Marchi, Lucas,
	Nikula, Jani, linux-input@vger.kernel.org,
	platform-driver-x86@vger.kernel.org

Hi Luke,


> -----Original Message-----
> From: Kurt Borja <kuurtb@gmail.com>
> Sent: Wednesday, May 28, 2025 9:11 PM
> To: Luke Jones <luke@ljones.dev>; Borah, Chaitanya Kumar
> <chaitanya.kumar.borah@intel.com>
> Cc: intel-xe@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; Saarinen,
> Jani <jani.saarinen@intel.com>; Kurmi, Suresh Kumar
> <suresh.kumar.kurmi@intel.com>; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Nikula, Jani <jani.nikula@intel.com>; linux-
> input@vger.kernel.org; platform-driver-x86@vger.kernel.org
> Subject: Re: [REGRESSSION] on linux-next (next-20250509)
> 
> Hi Luke,
> 
> On Wed May 28, 2025 at 10:07 AM -03, Luke Jones wrote:
> > On Wed, 28 May 2025, at 12:08 PM, Borah, Chaitanya Kumar wrote:
> >> Hello Luke,
> >>
> >> Hope you are doing well. I am Chaitanya from the linux graphics team in
> Intel.
> >>
> >> This mail is regarding a regression we are seeing in our CI runs[1]
> >> on linux-next repository.
> >
> > Can you tell me if the fix here was included?
> > https://lkml.org/lkml/2025/5/24/152
> >
> > I could change to:
> > static void asus_s2idle_check_register(void) {
> >     // Only register for Ally devices
> >     if (dmi_check_system(asus_rog_ally_device)) {
> >         if (acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops))
> >             pr_warn("failed to register LPS0 sleep handler in asus-wmi\n");
> >     }
> > }
> >
> > but I don't really understand what is happening here. The inner lps0
> functions won't run unless use_ally_mcu_hack is set.
> 
> The RIP is caused by a "list_add double add" warning.
> 
> After reading the log, I believe this is happening because
> asus_wmi_register_driver() is called a second time by eeepc_wmi after
> asus_nb_wmi, which implies
> 
> 	asus_wmi_probe()
> 	  -> acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops)
> 
> is called twice and the warning is triggered.
> 
> Line [1] makes me think this could be a race condition, as
> asus_wmi_register_driver() may be called concurrently.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-
> x86.git/tree/drivers/platform/x86/asus-wmi.c?h=for-next#n5101
> 

Any update on this? It has now hit 6.16-rc1.

https://intel-gfx-ci.01.org/tree/drm-tip/igt@runner@aborted.html

Regards

Chaitanya



* Re: [REGRESSION] on linux-next (next-20250509)
  2025-06-09 11:06     ` [REGRESSION] " Borah, Chaitanya Kumar
@ 2025-06-10  0:30       ` Luke Jones
  2025-06-17 12:32         ` Borah, Chaitanya Kumar
  0 siblings, 1 reply; 9+ messages in thread
From: Luke Jones @ 2025-06-10  0:30 UTC (permalink / raw)
  To: Chaitanya Kumar Borah, Kurt Borja
  Cc: intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	Jani Saarinen, Suresh Kumar Kurmi, De Marchi, Lucas, Jani Nikula,
	linux-input@vger.kernel.org, platform-driver-x86@vger.kernel.org

On Mon, 9 Jun 2025, at 11:06 PM, Borah, Chaitanya Kumar wrote:
> Hi Luke,
>
>
>> -----Original Message-----
>> From: Kurt Borja <kuurtb@gmail.com>
>> Sent: Wednesday, May 28, 2025 9:11 PM
>> To: Luke Jones <luke@ljones.dev>; Borah, Chaitanya Kumar
>> <chaitanya.kumar.borah@intel.com>
>> Cc: intel-xe@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; Saarinen,
>> Jani <jani.saarinen@intel.com>; Kurmi, Suresh Kumar
>> <suresh.kumar.kurmi@intel.com>; De Marchi, Lucas
>> <lucas.demarchi@intel.com>; Nikula, Jani <jani.nikula@intel.com>;
>> linux-input@vger.kernel.org; platform-driver-x86@vger.kernel.org
>> Subject: Re: [REGRESSSION] on linux-next (next-20250509)
>> 
>> Hi Luke,
>> 
>> On Wed May 28, 2025 at 10:07 AM -03, Luke Jones wrote:
>> > On Wed, 28 May 2025, at 12:08 PM, Borah, Chaitanya Kumar wrote:
>> >> Hello Luke,
>> >>
>> >> Hope you are doing well. I am Chaitanya from the linux graphics team in
>> >> Intel.
>> >>
>> >> This mail is regarding a regression we are seeing in our CI runs[1]
>> >> on linux-next repository.
>> >
>> > Can you tell me if the fix here was included?
>> > https://lkml.org/lkml/2025/5/24/152
>> >
>> > I could change to:
>> > static void asus_s2idle_check_register(void) {
>> >     // Only register for Ally devices
>> >     if (dmi_check_system(asus_rog_ally_device)) {
>> >         if (acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops))
>> >             pr_warn("failed to register LPS0 sleep handler in asus-wmi\n");
>> >     }
>> > }
>> >
>> > but I don't really understand what is happening here. The inner lps0
>> > functions won't run unless use_ally_mcu_hack is set.
>> 
>> The RIP is caused by a "list_add double add" warning.
>> 
>> After reading the log, I believe this is happening because
>> asus_wmi_register_driver() is called a second time by eeepc_wmi after
>> asus_nb_wmi, which implies
>> 
>> 	asus_wmi_probe()
>> 	  -> acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops)
>> 
>> is called twice and the warning is triggered.
>> 
>> Line [1] makes me think this could be a race condition, as
>> asus_wmi_register_driver() may be called concurrently.
>> 
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/tree/drivers/platform/x86/asus-wmi.c?h=for-next#n5101
>> 
>
> Any update on this? It has now hit 6.16-rc1.
>
> https://intel-gfx-ci.01.org/tree/drm-tip/igt@runner@aborted.html

I will send a patch ASAP. I haven't been able to do so because of work and 3 days of flights.

> Regards
>
> Chaitanya
>
>> >
>> > I will do my best to fix but I need to understand what happened a bit better.
>> >
>> > regards,
>> > Luke.
>> >
>> >> Since the version next-20250509 [2], we are seeing the following
>> >> regression
>> >>
>> >> `````````````````````````````````````````````````````````````````````````````````
>> >> <4>[    5.400826] ------------[ cut here ]------------
>> >> <4>[    5.400832] list_add double add: new=ffffffffa07c0ca0, prev=ffffffff837e9a60, next=ffffffffa07c0ca0.
>> >> <4>[    5.400845] WARNING: CPU: 0 PID: 379 at lib/list_debug.c:35 __list_add_valid_or_report+0xdc/0xf0
>> >> <4>[    5.400850] Modules linked in: cmdlinepart(+) eeepc_wmi(+) asus_nb_wmi(+) asus_wmi spi_nor(+) sparse_keymap mei_pxp mtd platform_profile kvm_intel(+) mei_hdcp wmi_bmof kvm irqbypass polyval_clmulni usbhid ghash_clmulni_intel snd_hda_intel hid sha1_ssse3 r8152(+) binfmt_misc aesni_intel snd_intel_dspcfg mii r8169 snd_hda_codec rapl video snd_hda_core intel_cstate snd_hwdep realtek snd_pcm snd_timer mei_me snd i2c_i801 i2c_mux spi_intel_pci idma64 soundcore spi_intel i2c_smbus mei intel_pmc_core nls_iso8859_1 pmt_telemetry pmt_class intel_pmc_ssram_telemetry pinctrl_alderlake intel_vsec acpi_tad wmi acpi_pad dm_multipath msr nvme_fabrics fuse efi_pstore nfnetlink ip_tables x_tables autofs4
>> >> <4>[    5.400904] CPU: 0 UID: 0 PID: 379 Comm: (udev-worker) Tainted: G S                  6.15.0-rc7-next-20250526-next-20250526-g3be1a7a31fbd+ #1 PREEMPT(voluntary)
>> >> <4>[    5.400907] Tainted: [S]=CPU_OUT_OF_SPEC
>> >> <4>[    5.400908] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
>> >> <4>[    5.400909] RIP: 0010:__list_add_valid_or_report+0xdc/0xf0
>> >> <4>[    5.400912] Code: 16 48 89 f1 4c 89 e6 e8 a2 c5 5f ff 0f 0b 31 c0 e9 72 ff ff ff 48 89 f2 4c 89 e1 48 89 fe 48 c7 c7 68 ba 0f 83 e8 84 c5 5f ff <0f> 0b 31 c0 e9 54 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 90
>> >> <4>[    5.400914] RSP: 0018:ffffc90002763588 EFLAGS: 00010246
>> >> <4>[    5.400916] RAX: 0000000000000000 RBX: ffffffffa07c0ca0 RCX: 0000000000000000
>> >> <4>[    5.400918] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> >> <4>[    5.400919] RBP: ffffc90002763598 R08: 0000000000000000 R09: 0000000000000000
>> >> <4>[    5.400920] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa07c0ca0
>> >> <4>[    5.400921] R13: ffffffffa07c0ca0 R14: 0000000000000000 R15: ffff8881212d6da0
>> >> <4>[    5.400923] FS:  0000778637b418c0(0000) GS:ffff8888dad0c000(0000) knlGS:0000000000000000
>> >> <4>[    5.400926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> <4>[    5.400928] CR2: 00007786373b80b2 CR3: 0000000116faa000 CR4: 0000000000f50ef0
>> >> <4>[    5.400931] PKRU: 55555554
>> >> <4>[    5.400933] Call Trace:
>> >> <4>[    5.400935]  <TASK>
>> >> <4>[    5.400937]  ? lock_system_sleep+0x2b/0x40
>> >> <4>[    5.400942]  acpi_register_lps0_dev+0x58/0xb0
>> >> <4>[    5.400949]  asus_wmi_probe+0x7f/0x1930 [asus_wmi]
>> >> <4>[    5.400956]  ? kernfs_create_link+0x69/0xe0
>> >> `````````````````````````````````````````````````````````````````````````````````
>> >> Detailed log can be found in [3].
>> >>
>> >> After bisecting the tree, the following patch [4] seems to be the first "bad"
>> >> commit
>> >>
>> >> `````````````````````````````````````````````````````````````````````````````````
>> >> commit feea7bd6b02d43a794e3f065650d89cf8d8e8e59
>> >> Author: Luke D. Jones <luke@ljones.dev>
>> >> Date:   Sun Mar 23 15:34:21 2025 +1300
>> >>
>> >>     platform/x86: asus-wmi: Refactor Ally suspend/resume
>> >> `````````````````````````````````````````````````````````````````````````````````
>> >>
>> >> We could not revert the patch because of merge conflict but resetting
>> >> to the parent of the commit seems to fix the issue
>> >>
>> >> Could you please check why the patch causes this regression and
>> >> provide a fix if necessary?
>> >>
>> >> Thank you.
>> >>
>> >> Regards
>> >>
>> >> Chaitanya
>> >>
>> >> [1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
>> >> [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20250509
>> >> [3] https://intel-gfx-ci.01.org/tree/linux-next/next-20250526/bat-rpls-4/boot0.txt
>> >> [4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20250509&id=feea7bd6b02d43a794e3f065650d89cf8d8e8e59
>> 
>> 
>> --
>>  ~ Kurt


* RE: [REGRESSION] on linux-next (next-20250509)
  2025-06-10  0:30       ` Luke Jones
@ 2025-06-17 12:32         ` Borah, Chaitanya Kumar
  0 siblings, 0 replies; 9+ messages in thread
From: Borah, Chaitanya Kumar @ 2025-06-17 12:32 UTC (permalink / raw)
  To: Luke Jones, Kurt Borja
  Cc: intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	Saarinen, Jani, Kurmi, Suresh Kumar, De Marchi, Lucas,
	Nikula, Jani, linux-input@vger.kernel.org,
	platform-driver-x86@vger.kernel.org

Hi Luke,

> -----Original Message-----
> From: Luke Jones <luke@ljones.dev>
> Sent: Tuesday, June 10, 2025 6:00 AM
> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>; Kurt Borja
> <kuurtb@gmail.com>
> Cc: intel-xe@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; Saarinen,
> Jani <jani.saarinen@intel.com>; Kurmi, Suresh Kumar
> <suresh.kumar.kurmi@intel.com>; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Nikula, Jani <jani.nikula@intel.com>;
> linux-input@vger.kernel.org; platform-driver-x86@vger.kernel.org
> Subject: Re: [REGRESSION] on linux-next (next-20250509)
> 
> On Mon, 9 Jun 2025, at 11:06 PM, Borah, Chaitanya Kumar wrote:
> > Hi Luke,
> >
> >
> >> -----Original Message-----
> >> From: Kurt Borja <kuurtb@gmail.com>
> >> Sent: Wednesday, May 28, 2025 9:11 PM
> >> To: Luke Jones <luke@ljones.dev>; Borah, Chaitanya Kumar
> >> <chaitanya.kumar.borah@intel.com>
> >> Cc: intel-xe@lists.freedesktop.org; intel-gfx@lists.freedesktop.org;
> >> Saarinen, Jani <jani.saarinen@intel.com>; Kurmi, Suresh Kumar
> >> <suresh.kumar.kurmi@intel.com>; De Marchi, Lucas
> >> <lucas.demarchi@intel.com>; Nikula, Jani <jani.nikula@intel.com>;
> >> linux-input@vger.kernel.org; platform-driver-x86@vger.kernel.org
> >> Subject: Re: [REGRESSSION] on linux-next (next-20250509)
> >>
> >> Hi Luke,
> >>
> >> On Wed May 28, 2025 at 10:07 AM -03, Luke Jones wrote:
> >> > On Wed, 28 May 2025, at 12:08 PM, Borah, Chaitanya Kumar wrote:
> >> >> Hello Luke,
> >> >>
> >> >> Hope you are doing well. I am Chaitanya from the linux graphics
> >> >> team in Intel.
> >> >>
> >> >> This mail is regarding a regression we are seeing in our CI
> >> >> runs[1] on linux-next repository.
> >> >
> >> > Can you tell me if the fix here was included?
> >> > https://lkml.org/lkml/2025/5/24/152
> >> >
> >> > I could change to:
> >> > static void asus_s2idle_check_register(void) {
> >> >     // Only register for Ally devices
> >> >     if (dmi_check_system(asus_rog_ally_device)) {
> >> >         if (acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops))
> >> >             pr_warn("failed to register LPS0 sleep handler in asus-wmi\n");
> >> >     }
> >> > }
> >> >
> >> > but I don't really understand what is happening here. The inner
> >> > lps0 functions won't run unless use_ally_mcu_hack is set.
> >>
> >> The RIP is caused by a "list_add double add" warning.
> >>
> >> After reading the log, I believe this is happening because
> >> asus_wmi_register_driver() is called a second time by eeepc_wmi after
> >> asus_nb_wmi, which implies
> >>
> >> 	asus_wmi_probe()
> >> 	  -> acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops)
> >>
> >> is called twice and the warning is triggered.
> >>
> >> Line [1] makes me think this could be a race condition, as
> >> asus_wmi_register_driver() may be called concurrently.
> >>
> >> [1] https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/tree/drivers/platform/x86/asus-wmi.c?h=for-next#n5101
> >>
> >
> > Any update on this? It has now hit 6.16-rc1.
> >
> > https://intel-gfx-ci.01.org/tree/drm-tip/igt@runner@aborted.html
> 
> I will send a patch asap. Haven't been able to do so with work and 3 days of
> flights.
> 

Gentle reminder. 



* Re: [REGRESSSION] on linux-next (next-20250509)
  2025-05-28 13:07 ` Luke Jones
  2025-05-28 15:40   ` Kurt Borja
  2025-06-02 14:28   ` [REGRESSSION] " Borah, Chaitanya Kumar
@ 2025-07-03 14:43   ` Lucas De Marchi
  2025-08-21 19:26     ` Rodrigo Vivi
  2 siblings, 1 reply; 9+ messages in thread
From: Lucas De Marchi @ 2025-07-03 14:43 UTC (permalink / raw)
  To: Luke Jones
  Cc: Borah, Chaitanya Kumar, intel-xe@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, Saarinen, Jani,
	Kurmi, Suresh Kumar, Nikula, Jani, linux-input@vger.kernel.org,
	platform-driver-x86@vger.kernel.org, Corentin Chary,
	Hans de Goede, Ilpo Järvinen

Hi,

On Wed, May 28, 2025 at 03:07:51PM +0200, Luke Jones wrote:
>On Wed, 28 May 2025, at 12:08 PM, Borah, Chaitanya Kumar wrote:
>> Hello Luke,
>>
>> Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
>>
>> This mail is regarding a regression we are seeing in our CI runs[1] on
>> linux-next repository.
>
>Can you tell me if the fix here was included?
>https://lkml.org/lkml/2025/5/24/152
>
>I could change to:
>static void asus_s2idle_check_register(void)
>{
>    // Only register for Ally devices
>    if (dmi_check_system(asus_rog_ally_device)) {
>        if (acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops))
>            pr_warn("failed to register LPS0 sleep handler in asus-wmi\n");
>    }
>}
>
>but I don't really understand what is happening here. The inner lps0 functions won't run unless use_ally_mcu_hack is set.
>
>I will do my best to fix but I need to understand what happened a bit better.

Any updates here? This is basically killing our tests for the drm-xe-fixes
we are submitting to 6.16, since it taints the kernel. If we can't fix it,
maybe it's already late enough in the RCs that we need a revert?

FWIW, for 6.17 we have a branch on the side we also merge before testing
and we've been including the change above to stop it from killing the
rest of our CI:
https://gitlab.freedesktop.org/drm/i915/kernel/-/commit/e9d0926aa1c6afcc920013c39d5bd6dd85f581fb

Lucas De Marchi


* Re: [REGRESSSION] on linux-next (next-20250509)
  2025-07-03 14:43   ` Lucas De Marchi
@ 2025-08-21 19:26     ` Rodrigo Vivi
  0 siblings, 0 replies; 9+ messages in thread
From: Rodrigo Vivi @ 2025-08-21 19:26 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: Luke Jones, Borah, Chaitanya Kumar,
	intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	Saarinen, Jani, Kurmi, Suresh Kumar, Nikula, Jani,
	linux-input@vger.kernel.org, platform-driver-x86@vger.kernel.org,
	Corentin Chary, Hans de Goede, Ilpo Järvinen

On Thu, Jul 03, 2025 at 09:43:41AM -0500, Lucas De Marchi wrote:
> Hi,
> 
> On Wed, May 28, 2025 at 03:07:51PM +0200, Luke Jones wrote:
> > On Wed, 28 May 2025, at 12:08 PM, Borah, Chaitanya Kumar wrote:
> > > Hello Luke,
> > > 
> > > Hope you are doing well. I am Chaitanya from the linux graphics team in Intel.
> > > 
> > > This mail is regarding a regression we are seeing in our CI runs[1] on
> > > linux-next repository.
> > 
> > Can you tell me if the fix here was included?
> > https://lkml.org/lkml/2025/5/24/152
> > 
> > I could change to:
> > static void asus_s2idle_check_register(void)
> > {
> >    // Only register for Ally devices
> >    if (dmi_check_system(asus_rog_ally_device)) {
> >        if (acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops))
> >            pr_warn("failed to register LPS0 sleep handler in asus-wmi\n");
> >    }
> > }
> > 
> > but I don't really understand what is happening here. The inner lps0 functions won't run unless use_ally_mcu_hack is set.
> > 
> > I will do my best to fix but I need to understand what happened a bit better.

Hi Luke, is there anything we could do to help here? Any log or info that
could help from this machine?

This bug is blocking some of our CI runs here.

Thanks,
Rodrigo.

> 
> Any updates here? This is basically killing our tests for drm-xe-fixes
> we are submitting to 6.16 since it taints the kernel. If we can't fix,
> maybe it's already late enough in RCs that we should need a revert?
> 
> FWIW, for 6.17 we have a branch on the side we also merge before testing
> and we've been including the change above to stop it from killing the
> rest of our CI:
> https://gitlab.freedesktop.org/drm/i915/kernel/-/commit/e9d0926aa1c6afcc920013c39d5bd6dd85f581fb
> 
> Lucas De Marchi


end of thread, other threads:[~2025-08-21 19:26 UTC | newest]
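
The diagnosis given in this thread is that asus_wmi_probe() runs once for
asus_nb_wmi and again for eeepc_wmi, so the same static handler object is
passed to acpi_register_lps0_dev() twice and list debugging reports
"list_add double add" (new == next in the log). That mechanism can be
reproduced in miniature outside the kernel. What follows is only a userspace
sketch under stated assumptions, not the asus-wmi code and not the eventual
fix: list_node, list_add_checked(), fake_s2idle_ops, register_unguarded(),
register_once() and count_double_adds() are hypothetical names invented for
illustration, and the `registered` flag is just one possible way to make
registration idempotent.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, loosely modelled on the kernel's
 * struct list_head. Userspace illustration only. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert `node` right after `head`. Refuses the insert and returns false
 * when `node` is already adjacent, which is the condition the kernel
 * reports as "list_add double add". */
static bool list_add_checked(struct list_node *node, struct list_node *head)
{
	struct list_node *prev = head, *next = head->next;

	assert(head->next && head->prev);	/* list must be initialised */
	if (node == prev || node == next)
		return false;
	node->next = next;
	node->prev = prev;
	next->prev = node;
	prev->next = node;
	return true;
}

/* Hypothetical stand-in for one static set of s2idle callbacks. */
struct fake_s2idle_ops {
	struct list_node node;
	bool registered;
};

static struct list_node handlers;	/* global list of registered handlers */

/* Unguarded: every probe tries to link the same object again. */
static bool register_unguarded(struct fake_s2idle_ops *ops)
{
	return list_add_checked(&ops->node, &handlers);
}

/* Guarded: only the first caller links the object; later calls are no-ops. */
static bool register_once(struct fake_s2idle_ops *ops)
{
	if (ops->registered)
		return true;
	if (!list_add_checked(&ops->node, &handlers))
		return false;
	ops->registered = true;
	return true;
}

/* Simulate two probes (as with asus_nb_wmi followed by eeepc_wmi) and
 * count how many registrations were rejected as double adds. */
static int count_double_adds(bool guarded)
{
	struct fake_s2idle_ops ops = { .registered = false };
	int failures = 0;

	list_init(&handlers);
	for (int probe = 0; probe < 2; probe++) {
		bool ok = guarded ? register_once(&ops)
				  : register_unguarded(&ops);
		if (!ok)
			failures++;
	}
	return failures;
}
```

Under these assumptions, count_double_adds(false) reports one rejected insert
(the second probe), while count_double_adds(true) reports none. A plain flag
like this would not cover the concurrent-probe case raised earlier in the
thread; that would presumably need locking or a single registration site.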

Thread overview: 9+ messages
2025-05-28 10:08 [REGRESSSION] on linux-next (next-20250509) Borah, Chaitanya Kumar
2025-05-28 13:07 ` Luke Jones
2025-05-28 15:40   ` Kurt Borja
2025-06-09 11:06     ` [REGRESSION] " Borah, Chaitanya Kumar
2025-06-10  0:30       ` Luke Jones
2025-06-17 12:32         ` Borah, Chaitanya Kumar
2025-06-02 14:28   ` [REGRESSSION] " Borah, Chaitanya Kumar
2025-07-03 14:43   ` Lucas De Marchi
2025-08-21 19:26     ` Rodrigo Vivi
