* TLB Invalidation time out on i915 SR-IOV passthrough
@ 2025-01-28 8:54 MARDI Youness
2025-01-29 14:09 ` Rodrigo Vivi
0 siblings, 1 reply; 3+ messages in thread
From: MARDI Youness @ 2025-01-28 8:54 UTC (permalink / raw)
To: intel-gfx@lists.freedesktop.org; +Cc: CHEVRIE Thomas
Hello,
Could you help us with this issue? https://github.com/intel/linux-intel-lts/issues/54
Host environment
Operating system: Gentoo Base System release 2.14
OS/kernel version: https://github.com/intel/linux-intel-lts/tree/lts-v6.6.34-linux-240626T131354Z
Architecture: x86_64
QEMU flavor: qemu-system-x86_64
QEMU version: latest qemu (master branch)
CPU: 12th Gen Intel(R) Core(TM) i7-1270P
igpu: Alder Lake-P
firmware: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20241110.tar.gz
Emulated/Virtualized environment
Operating system: Windows 10 21H1
Description of problem
After setting up SR-IOV (kernel compilation, kernel command line, binding the vfio-pci driver to the new PCI functions),
I have two new PCI functions (00:02.1 and 00:02.2):
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
DeviceName: Onboard IGD
Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics Controller
Kernel driver in use: i915
00:02.1 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics Controller
Kernel driver in use: vfio-pci
00:02.2 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics Controller
Kernel driver in use: vfio-pci
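The binding shown in the lspci output above can also be double-checked from sysfs by reading each function's driver symlink. This is only a minimal sketch, assuming the standard /sys/bus/pci/devices layout; the helper name pci_driver_of and its optional root argument (useful for dry runs against a copy of the tree) are hypothetical and not part of any tool.

```shell
#!/bin/sh
# Print the kernel driver bound to a PCI function, or "none" if unbound.
# Usage: pci_driver_of <bdf> [sysfs_root]   (sysfs_root defaults to /sys)
# pci_driver_of is a hypothetical helper written for this example.
pci_driver_of() {
    bdf=$1
    root=${2:-/sys}
    link="$root/bus/pci/devices/$bdf/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

For example, `for f in 0000:00:02.0 0000:00:02.1 0000:00:02.2; do echo "$f: $(pci_driver_of "$f")"; done` should report i915 for the PF and vfio-pci for both VFs if the setup matches the listing above.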
I passed one of those PCI functions to my VM with this QEMU command line:
-cpu host,migratable=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-passthrough,hv-vendor-id=IrisXE
...
-device vfio-pci-nohotplug,host=0000:00:02.1,id=hostdev0,bus=pci.4,addr=0x0
Sometimes it works properly when I start QEMU, but most of the time I get the following kernel errors and a GPU hang:
kernel [ 2252.208134] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 9679
kernel [ 2252.208134] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 9679
kernel i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 9679
kernel i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 9679
....
kernel Fence expiration time out i915-0000:00:02.0:renderThread22381:6e0!
kernel i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.13.1
kernel i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
kernel i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
kernel i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
kernel i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
kernel [ 2730.991019] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dfbfff, in renderThread [22381]
kernel [ 2730.991084] i915 0000:00:02.0: [drm] renderThread22381 context reset due to GPU hang
It mostly happens while QEMU is starting.
Any help would be appreciated, thanks a lot.
Best Regards,
Youness MARDI
C2 - Restricted use
* Re: TLB Invalidation time out on i915 SR-IOV passthrough
2025-01-28 8:54 TLB Invalidation time out on i915 SR-IOV passthrough MARDI Youness
@ 2025-01-29 14:09 ` Rodrigo Vivi
2025-01-30 18:04 ` Michal Wajdeczko
0 siblings, 1 reply; 3+ messages in thread
From: Rodrigo Vivi @ 2025-01-29 14:09 UTC (permalink / raw)
To: MARDI Youness, Michal Wajdeczko, Nikkanen, Kimmo
Cc: intel-gfx@lists.freedesktop.org, CHEVRIE Thomas
On Tue, Jan 28, 2025 at 08:54:10AM +0000, MARDI Youness wrote:
> Hello,
>
>
>
> Could you help us on this issue:
> [1]https://github.com/intel/linux-intel-lts/issues/54
>
>
>
> Host environment
>
> Operating system: Gentoo Base System release 2.14
> OS/kernel version:
> [2]https://github.com/intel/linux-intel-lts/tree/lts-v6.6.34-linux-240626T131354Z
https://github.com/intel/linux-intel-lts/blob/lts-v6.6.34-linux-240626T131354Z/drivers/gpu/drm/i915/README.sriov
Michal, could you please help here?
Thanks,
Rodrigo.
* Re: TLB Invalidation time out on i915 SR-IOV passthrough
2025-01-29 14:09 ` Rodrigo Vivi
@ 2025-01-30 18:04 ` Michal Wajdeczko
0 siblings, 0 replies; 3+ messages in thread
From: Michal Wajdeczko @ 2025-01-30 18:04 UTC (permalink / raw)
To: Rodrigo Vivi, MARDI Youness, Nikkanen, Kimmo
Cc: intel-gfx@lists.freedesktop.org, CHEVRIE Thomas
Hi,
On 29.01.2025 15:09, Rodrigo Vivi wrote:
> On Tue, Jan 28, 2025 at 08:54:10AM +0000, MARDI Youness wrote:
>> Hello,
>>
>>
>>
>> Could you help us on this issue:
>> [1]https://github.com/intel/linux-intel-lts/issues/54
Once you have enabled all VFs, please capture all SR-IOV provisioning
details and attach them to [1]. You may use something like:
$ grep . -r /sys/class/drm/card0/iov
Also attach the full dmesg and the GuC log, captured right after the failure.
For a larger GuC log buffer, please enable CONFIG_DRM_I915_DEBUG_GUC and
use the module parameter i915.guc_log_level=4
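The capture steps above could be wrapped in a small helper so that everything lands in one directory right after a failure. This is only a sketch: the function name collect_iov_report is made up, and the default GuC log location (gt0/uc/guc_log_dump under debugfs) is an assumption that may not match this LTS tree, so pass the real path as the third argument if it differs.

```shell
#!/bin/sh
# Collect SR-IOV provisioning details, dmesg and (if present) the GuC log.
# Usage: collect_iov_report <outdir> [iov_dir] [guc_log]
# ASSUMPTION: the default guc_log path below is a guess at the debugfs
# location; verify it on your kernel before relying on it.
collect_iov_report() {
    out=$1
    iov=${2:-/sys/class/drm/card0/iov}
    guc=${3:-/sys/kernel/debug/dri/0/gt0/uc/guc_log_dump}
    mkdir -p "$out" || return 1
    # Same command as suggested above: every readable attribute with its value.
    grep . -r "$iov" > "$out/iov-provisioning.txt" 2>/dev/null
    # Full kernel log (usually needs root).
    dmesg > "$out/dmesg.txt" 2>/dev/null
    # The GuC log is optional: copy it only when the file is readable.
    if [ -r "$guc" ]; then
        cat "$guc" > "$out/guc_log.txt"
    fi
    return 0
}
```

Run as root, e.g. `collect_iov_report /tmp/iov-report`, then attach the resulting files to [1].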
You can also try the following (once the VFs are enabled, but before starting any VMs):
- set explicit "execution_quantum_ms" for PF and all VFs to 20
- set explicit "preemption_timeout_us" for PF and all VFs to 20000
- enable "engine_reset" policy
$ echo 20 > /sys/class/drm/card0/iov/pf/gt0/execution_quantum_ms
$ echo 20 > /sys/class/drm/card0/iov/vf1/gt0/execution_quantum_ms
...
$ echo 1 > /sys/class/drm/card0/iov/pf/gt0/policies/engine_reset
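To avoid typing one echo per function, the three settings listed above could be applied in a loop over the PF and all VFs. This is a sketch under assumptions: the function name apply_iov_tuning is hypothetical, and the location of preemption_timeout_us (next to execution_quantum_ms in each gt0 directory) is assumed by analogy, since only the execution_quantum_ms and engine_reset paths are spelled out above.

```shell
#!/bin/sh
# Apply the suggested scheduling parameters to the PF and every VF.
# Usage: apply_iov_tuning [iov_dir]
# ASSUMPTION: preemption_timeout_us lives in the same <function>/gt0
# directory as execution_quantum_ms; check your sysfs tree first.
apply_iov_tuning() {
    iov=${1:-/sys/class/drm/card0/iov}
    for fn in "$iov"/pf "$iov"/vf*; do
        gt="$fn/gt0"
        # Skip entries without a gt0 directory (including an unmatched vf* glob).
        [ -d "$gt" ] || continue
        echo 20    > "$gt/execution_quantum_ms"
        echo 20000 > "$gt/preemption_timeout_us"
    done
    # Enable the engine_reset policy on the PF.
    echo 1 > "$iov/pf/gt0/policies/engine_reset"
}
```

Call `apply_iov_tuning` as root after enabling the VFs and before starting QEMU; the values are the ones suggested above (20 ms quantum, 20000 us preemption timeout).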