Intel-XE Archive on lore.kernel.org
* [REGRESSION] on linux-next (next-20250507)
@ 2025-05-12  6:00 Borah, Chaitanya Kumar
  2025-05-12 12:21 ` Jason Gunthorpe
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Borah, Chaitanya Kumar @ 2025-05-12  6:00 UTC
  To: Jason Gunthorpe
  Cc: intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	Saarinen, Jani, Nikula, Jani, Kurmi, Suresh Kumar,
	De Marchi, Lucas, iommu@lists.linux.dev

Hello Jason,

Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.

This mail is regarding a regression we are seeing in our CI runs [1] on the linux-next repository.

Since version next-20250507 [2], we have been seeing the following regression (currently masked by another regression):

`````````````````````````````````````````````````````````````````````````````````
<1> [47.242057] BUG: kernel NULL pointer dereference, address: 0000000000000698
<1> [47.242061] #PF: supervisor read access in kernel mode
<1> [47.242063] #PF: error_code(0x0000) - not-present page
<6> [47.242065] PGD 0 P4D 0 
<4> [47.242068] Oops: Oops: 0000 [#1] SMP NOPTI
<4> [47.242070] CPU: 10 UID: 0 PID: 201 Comm: kworker/10:1 Tainted: G S   U              6.15.0-rc5-next-20250507-next-20250507-g08710e696081+ #1 PREEMPT(voluntary) 
<4> [47.242075] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
<4> [47.242077] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
<4> [47.242080] Workqueue: events delayed_fput
<4> [47.242083] RIP: 0010:__lruvec_stat_mod_folio+0x9b/0x250
<4> [47.242086] Code: 0f 85 4e 01 00 00 49 63 d4 48 85 db 0f 84 5c 01 00 00 0f 1f 44 00 00 49 63 86 80 a3 02 00 48 8b 9c c3 50 09 00 00 48 83 c3 40 <4c> 3b b3 58 06 00 00 0f 85 94 01 00 00 44 89 ee 4c 89 f7 e8 ad 64
<4> [47.242091] RSP: 0018:ffffc90000314d38 EFLAGS: 00010002
<4> [47.242093] RAX: 0000000000000000 RBX: 0000000000000040 RCX: 0000000000000000
<4> [47.242096] RDX: ffffffffffffff00 RSI: 0000000000000000 RDI: 0000000000000000
<4> [47.242098] RBP: ffffc90000314d68 R08: 0000000000000000 R09: 0000000000000000
<4> [47.242101] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffff00
<4> [47.242103] R13: 0000000000000027 R14: ffff88887f352e40 R15: ffff88885f04b010
<4> [47.242105] FS:  0000000000000000(0000) GS:ffff8888db212000(0000) knlGS:0000000000000000
<4> [47.242108] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [47.242111] CR2: 0000000000000698 CR3: 000000012d335000 CR4: 0000000000f52ef0
<4> [47.242113] PKRU: 55555554
<4> [47.242115] Call Trace:
<4> [47.242116]  <IRQ>
<4> [47.242119]  lruvec_stat_mod_folio.constprop.0+0x35/0x90
<4> [47.242122]  __iommu_free_desc+0x64/0xb0
<4> [47.242125]  iommu_put_pages_list+0x27/0x50
<4> [47.242127]  fq_ring_free_locked+0x3f/0xa0
<4> [47.242131]  fq_flush_timeout+0x81/0x120
<4> [47.242134]  ? __pfx_fq_flush_timeout+0x10/0x10
<4> [47.242137]  call_timer_fn+0xa1/0x2a0
<4> [47.242140]  ? __pfx_fq_flush_timeout+0x10/0x10
<4> [47.242143]  __run_timers+0x231/0x310
<4> [47.242146]  run_timer_softirq+0x76/0xe0
<4> [47.242149]  handle_softirqs+0xd4/0x4d0
<4> [47.242152]  __irq_exit_rcu+0x13f/0x160
<4> [47.242154]  irq_exit_rcu+0xe/0x20
<4> [47.242156]  sysvec_apic_timer_interrupt+0xa0/0xc0
<4> [47.242160]  </IRQ>
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [3].
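
A small sketch of how the faulting address in the trace above can be cross-checked against the register dump. It assumes, from a hand decode of the marked bytes `4c 3b b3 58 06 00 00` in the Code: line, that the faulting instruction is a 64-bit compare against memory at RBX plus a 32-bit displacement. The helper names are illustrative and are not part of any kernel tooling.

```python
# Values copied from the oops above.
CODE_LINE = (
    "0f 85 4e 01 00 00 49 63 d4 48 85 db 0f 84 5c 01 00 00 0f 1f 44 00 00 "
    "49 63 86 80 a3 02 00 48 8b 9c c3 50 09 00 00 48 83 c3 40 "
    "<4c> 3b b3 58 06 00 00 0f 85 94 01 00 00 44 89 ee 4c 89 f7 e8 ad 64"
)
RBX = 0x40   # from the register dump
CR2 = 0x698  # faulting address reported by the kernel


def faulting_bytes(code_line):
    """Return the bytes starting at the instruction marked with <..>."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def disp32_of_cmp_rbx(insn):
    """Extract the displacement, assuming the bytes encode a compare
    of a 64-bit register with memory at [rbx + disp32].

    Assumption (hand decode, not from the report): 0x4c is a REX prefix,
    0x3b is CMP r64, r/m64, and ModRM 0xb3 selects [rbx + disp32].
    """
    if insn[:3] != bytes([0x4C, 0x3B, 0xB3]):
        raise ValueError("unexpected instruction encoding")
    return int.from_bytes(insn[3:7], "little")


insn = faulting_bytes(CODE_LINE)
disp = disp32_of_cmp_rbx(insn)
fault_addr = RBX + disp

print(f"disp={disp:#x} rbx+disp={fault_addr:#x} matches CR2: {fault_addr == CR2}")
```

With the values copied from the log, RBX (0x40) plus the displacement (0x658) equals CR2 (0x698), which would be consistent with a NULL base pointer being offset and then read. This is only a reading aid; running the oops through the kernel's scripts/decodecode gives a proper disassembly.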

After bisecting the tree, the following patch [4] seems to be the first "bad"
commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit 13f43d7cf3e0570004a0d960bc1be23db827c2ff
Author: Jason Gunthorpe <jgg@nvidia.com>
Date:   Tue Apr 8 13:53:56 2025 -0300

    iommu/pages: Formalize the freelist API
`````````````````````````````````````````````````````````````````````````````````````````````````````````

We could not revert the patch because of merge conflicts, but resetting to the parent of the commit seems to fix the issue.

Could you please check why the patch causes this regression and provide a fix if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20250507
[3] https://intel-gfx-ci.01.org/tree/linux-next/next-20250507/bat-rpls-4/igt@gem_exec_gttfill@basic.html
[4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20250507&id=13f43d7cf3e0570004a0d960bc1be23db827c2ff


Thread overview: 13+ messages
2025-05-12  6:00 [REGRESSION] on linux-next (next-20250507) Borah, Chaitanya Kumar
2025-05-12 12:21 ` Jason Gunthorpe
2025-05-14 12:40   ` Borah, Chaitanya Kumar
2025-05-18  6:17     ` Leon Romanovsky
2025-05-18  6:45       ` Leon Romanovsky
2025-05-12 13:53 ` ✓ CI.Patch_applied: success for " Patchwork
2025-05-12 13:53 ` ✗ CI.checkpatch: warning " Patchwork
2025-05-12 13:54 ` ✓ CI.KUnit: success " Patchwork
2025-05-12 13:59 ` ✗ CI.Build: failure " Patchwork
2025-05-26 20:59 ` ✓ CI.Patch_applied: success " Patchwork
2025-05-26 20:59 ` ✗ CI.checkpatch: warning " Patchwork
2025-05-26 21:00 ` ✓ CI.KUnit: success " Patchwork
2025-05-26 21:05 ` ✗ CI.Build: failure " Patchwork
