Intel-GFX Archive on lore.kernel.org
* TLB Invalidation time out on i915 SR-IOV passthrough
From: MARDI Youness @ 2025-01-28  8:54 UTC
  To: intel-gfx@lists.freedesktop.org; +Cc: CHEVRIE Thomas

Hello,

Could you help us with this issue: https://github.com/intel/linux-intel-lts/issues/54

Host environment
Operating system: Gentoo Base System release 2.14
OS/kernel version: https://github.com/intel/linux-intel-lts/tree/lts-v6.6.34-linux-240626T131354Z
Architecture: x86_64
QEMU flavor: qemu-system-x86_64
QEMU version: latest qemu (master branch)
CPU: 12th Gen Intel(R) Core(TM) i7-1270P
igpu: Alder Lake-P
firmware: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20241110.tar.gz

Emulated/Virtualized environment
Operating system: Windows 10 21H1


Description of problem
After setting up SR-IOV (kernel compilation, kernel command line, binding the vfio-pci driver to the new PCI functions),
I have two new PCI devices, 00:02.1 and 00:02.2:


00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
DeviceName: Onboard IGD

Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics Controller
Kernel driver in use: i915

00:02.1 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics Controller
Kernel driver in use: vfio-pci

00:02.2 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics Controller
Kernel driver in use: vfio-pci

I passed one of those PCI devices to my VM with this QEMU command line:

-cpu host,migratable=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-passthrough,hv-vendor-id=IrisXE
...
-device vfio-pci-nohotplug,host=0000:00:02.1,id=hostdev0,bus=pci.4,addr=0x0

Sometimes it works properly when I start QEMU with this command line, but most of the time I get these kernel errors and a GPU hang:

    kernel [ 2252.208134] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 9679
    kernel [ 2252.208134] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 9679
    kernel i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 9679
    kernel i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 9679
    ....
    kernel Fence expiration time out i915-0000:00:02.0:renderThread22381:6e0!
    kernel i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.13.1
    kernel i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
    kernel i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
    kernel i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
    kernel i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
    kernel [ 2730.991019] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dfbfff, in renderThread [22381]
    kernel [ 2730.991084] i915 0000:00:02.0: [drm] renderThread22381 context reset due to GPU hang
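
To compare runs, the two failure signatures in the log above can be counted. The following is only a minimal sketch: it assumes the kernel log is piped in on stdin, the match strings are copied from the excerpt above, and the function name summarize_i915_log is hypothetical.

```shell
#!/bin/sh
# Count i915 TLB-invalidation timeouts and GPU hangs in kernel log text
# read from stdin. The patterns are taken from the log excerpt above.
summarize_i915_log() {
    awk '
        /TLB invalidation response timed out/ { tlb++ }
        /GPU HANG:/                           { hang++ }
        END { printf "tlb_timeouts=%d gpu_hangs=%d\n", tlb, hang }
    '
}

# Typical use on the host (dmesg may need root):
#   dmesg | summarize_i915_log
```

Running it once after a good start and once after a failing start gives a quick way to tell the two cases apart.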


It mostly appears while QEMU is starting.
Any help would be appreciated, thanks a lot.

Best Regards,

Youness MARDI



C2 - Restricted use


* Re: TLB Invalidation time out on i915 SR-IOV passthrough
From: Rodrigo Vivi @ 2025-01-29 14:09 UTC
  To: MARDI Youness, Michal Wajdeczko, Nikkanen, Kimmo
  Cc: intel-gfx@lists.freedesktop.org, CHEVRIE Thomas

On Tue, Jan 28, 2025 at 08:54:10AM +0000, MARDI Youness wrote:
> Hello,
>
> Could you help us on this issue:
> [1]https://github.com/intel/linux-intel-lts/issues/54
>
> Host environment
>
> Operating system: Gentoo Base System release 2.14
> OS/kernel version:
> [2]https://github.com/intel/linux-intel-lts/tree/lts-v6.6.34-linux-240626T131354Z

https://github.com/intel/linux-intel-lts/blob/lts-v6.6.34-linux-240626T131354Z/drivers/gpu/drm/i915/README.sriov

Michal, could you please help here?

Thanks,
Rodrigo.


* Re: TLB Invalidation time out on i915 SR-IOV passthrough
From: Michal Wajdeczko @ 2025-01-30 18:04 UTC
  To: Rodrigo Vivi, MARDI Youness, Nikkanen, Kimmo
  Cc: intel-gfx@lists.freedesktop.org, CHEVRIE Thomas

Hi,

On 29.01.2025 15:09, Rodrigo Vivi wrote:
> On Tue, Jan 28, 2025 at 08:54:10AM +0000, MARDI Youness wrote:
>> Hello,
>>
>> Could you help us on this issue:
>> [1]https://github.com/intel/linux-intel-lts/issues/54

Once you have enabled all VFs, please capture all SR-IOV provisioning
details and attach them to [1]. You can use something like:

 $ grep . -r /sys/class/drm/card0/iov

Also attach the full dmesg and the GuC log, captured right after the failure.

For a larger GuC log buffer, please select CONFIG_DRM_I915_DEBUG_GUC and
use the module parameter i915.guc_log_level=4
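
The capture steps above could be collected into one small helper, so that everything is saved in a single directory right after the failure. This is only a sketch under stated assumptions: the function name capture_iov_state and the output file names are hypothetical, and the GuC log path under debugfs (GUC_LOG below) is an assumption that may differ on your kernel.

```shell
#!/bin/sh
# Sketch: collect SR-IOV provisioning details and logs into one directory.
# IOV_ROOT matches the sysfs path used above.
# GUC_LOG is an ASSUMED debugfs location; adjust it if your kernel differs.
IOV_ROOT="${IOV_ROOT:-/sys/class/drm/card0/iov}"
GUC_LOG="${GUC_LOG:-/sys/kernel/debug/dri/0/gt0/uc/guc_log}"

capture_iov_state() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1
    # Same command as above: print every non-empty attribute with its path.
    grep . -r "$IOV_ROOT" > "$out_dir/iov-provisioning.txt" 2>/dev/null
    # Full kernel log; may need root depending on kernel.dmesg_restrict.
    dmesg > "$out_dir/dmesg.txt" 2>/dev/null
    # GuC log, copied only if the assumed debugfs file is readable.
    if [ -r "$GUC_LOG" ]; then
        cat "$GUC_LOG" > "$out_dir/guc_log.txt"
    fi
    return 0
}

# Typical use as root, right after the failure:
#   capture_iov_state /tmp/i915-sriov-report
```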

You can also try the following (once VFs are enabled, but before starting VMs):
- set an explicit "execution_quantum_ms" of 20 for the PF and all VFs
- set an explicit "preemption_timeout_us" of 20000 for the PF and all VFs
- enable the "engine_reset" policy

 $ echo 20 > /sys/class/drm/card0/iov/pf/gt0/execution_quantum_ms
 $ echo 20 > /sys/class/drm/card0/iov/vf1/gt0/execution_quantum_ms
 ...
 $ echo 1 > /sys/class/drm/card0/iov/pf/gt0/policies/engine_reset
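
The same settings can be applied to the PF and every VF in one loop. This is a sketch only: the function name apply_iov_tuning is hypothetical, and it assumes that "preemption_timeout_us" sits next to "execution_quantum_ms" in each gt0 directory, which the commands above do not show explicitly.

```shell
#!/bin/sh
# Sketch: apply the suggested scheduling values to the PF and every VF.
IOV_ROOT="${IOV_ROOT:-/sys/class/drm/card0/iov}"

apply_iov_tuning() {
    quantum_ms="$1"   # e.g. 20
    preempt_us="$2"   # e.g. 20000
    for fn in "$IOV_ROOT"/pf "$IOV_ROOT"/vf*; do
        gt="$fn/gt0"
        # Skip entries without a gt0 directory (also covers an unmatched vf* glob).
        [ -d "$gt" ] || continue
        echo "$quantum_ms" > "$gt/execution_quantum_ms" || return 1
        # ASSUMPTION: preemption_timeout_us lives next to execution_quantum_ms.
        echo "$preempt_us" > "$gt/preemption_timeout_us" || return 1
    done
    # Policy path taken from the example above.
    echo 1 > "$IOV_ROOT/pf/gt0/policies/engine_reset"
}

# Typical use as root, after enabling VFs and before starting VMs:
#   apply_iov_tuning 20 20000
```

Reading the values back with the grep command from earlier confirms whether the writes were accepted.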

