From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-xe@lists.freedesktop.org, thomas.hellstrom@linux.intel.com,
lucas.demarchi@intel.com, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2 0/4] drm/xe: Support PCIe FLR
Date: Mon, 8 Apr 2024 11:20:34 +0530
Message-ID: <45e36b75-54c8-45cb-9683-b16faab15833@linux.intel.com>
In-Reply-To: <6cbe2744-d184-4c7b-8972-eb09a87b5295@linux.intel.com>
On 05/04/24 10:30, Aravind Iddamsetty wrote:
> On 05/04/24 03:55, Rodrigo Vivi wrote:
>> On Tue, Apr 02, 2024 at 02:28:55PM +0530, Aravind Iddamsetty wrote:
>>> PCI subsystem provides callbacks to inform the driver about a request to
>>> do function level reset by user, initiated by writing to sysfs entry
>>> /sys/bus/pci/devices/.../reset. This will allow the driver to handle FLR
>>> without the need to do unbind and rebind as the driver needs to
>>> reinitialize the device afresh post FLR.
>>>
>>> v2:
>> All the patches look good to me here. Feel free to use
>>
>> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>
>> on them.
> Thank you!
>
>> but I do have some concerns (below)
>>
>>> 1. Directly expose the devm_drm_dev_release_action instead of introducing
>>> a helper (Rodrigo)
>>> 2. separate out gt idle and pci save/restore to a separate patch (Lucas)
>>> 3. Fixed the warnings seen around xe_guc_submit_stop, xe_guc_pc_fini
>> On this I also had to fight to get something working on the wedged_mode=2:
>> lore.kernel.org/all/20240403150732.102678-4-rodrigo.vivi@intel.com
>>
>> perhaps we can unify things here.
> I guess we are dealing with different scenarios. Here, the warning in xe_guc_submit_stop
> was caused by not invoking xe_uc_reset_prepare beforehand, and we needn't invoke
> xe_guc_pc_fini as the GuC is already stopped.
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>
>>> dmesg snip showing FLR recovery:
>> Things went differently on my DG2 here, with display working and all:
> After you mentioned this I tested on DG2; I got warnings but no segmentation fault
> or NPD. I tested my branch, which might not be up to date, so I will retest with the
> latest branch.
While I check on this, is it OK for this version of the series to be merged? As I see it,
even with the display warnings, the device and driver are recoverable.
Thanks,
Aravind.
>
>
> Thanks,
> Aravind.
>> root@rdvivi-desk:/sys/module/xe/drivers/pci:xe/0000:03:00.0# echo 1 > reset
>> Segmentation fault
>>
>> and many kernel warnings
>> WARNING: CPU: 8 PID: 2389 at drivers/gpu/drm/i915/display/intel_display_power_well.c:281 hsw_wait_for_power_well_enable+0x3e7/0x570 [xe]
>> WARNING: CPU: 9 PID: 1700 at drivers/gpu/drm/drm_mm.c:999 drm_mm_takedown+0x41/0x60
>>
>> [ 117.128330] KASAN: null-ptr-deref in range [0x00000000000004e8-0x00000000000004ef]
>> [ 117.128332] CPU: 13 PID: 2389 Comm: bash Tainted: G W 6.9.0-rc1+ #9
>> [ 117.135501] ? exc_invalid_op+0x13/0x40
>> [ 117.143626] Hardware name: iBUYPOWER INTEL/B660 DS3H AC DDR4-Y1, BIOS F5 12/17/2021
>> [ 117.143627] RIP: 0010:__mutex_lock+0x124/0x14a0
>> [ 117.143631] Code: d0 7c 08 84 d2 0f 85 62 0f 00 00 8b 0d 85 c8 8f 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 46 0f 00 00 4d 3b 7f 68 0f 85 aa 0e 00 00 bf 01
>> [ 117.150630] ? asm_exc_invalid_op+0x16/0x20
>> [ 117.156401] RSP: 0018:ffffc90005a37690 EFLAGS: 00010202
>> [ 117.156403] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
>> [ 117.163571] ? drm_buddy_fini+0x181/0x220
>>
>>
>> and more issues.
>>
>> so it looks like we are still missing some parts of the puzzle here...
>>
>>
>>> [ 590.486336] xe 0000:4d:00.0: enabling device (0140 -> 0142)
>>> [ 590.506933] xe 0000:4d:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [ 590.542355] xe 0000:4d:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [ 590.578532] xe 0000:4d:00.0: [drm] VISIBLE VRAM: 0x0000202000000000,
>>> 0x0000002000000000
>>> [ 590.578556] xe 0000:4d:00.0: [drm] VRAM[0, 0]: Actual physical size
>>> 0x0000001000000000, usable size exclude stolen 0x0000000fff000000, CPU
>>> accessible size 0x0000000fff000000
>>> [ 590.578560] xe 0000:4d:00.0: [drm] VRAM[0, 0]: DPA range:
>>> [0x0000000000000000-1000000000], io range:
>>> [0x0000202000000000-202fff000000]
>>> [ 590.578585] xe 0000:4d:00.0: [drm] VRAM[1, 1]: Actual physical size
>>> 0x0000001000000000, usable size exclude stolen 0x0000000fff000000, CPU
>>> accessible size 0x0000000fff000000
>>> [ 590.578589] xe 0000:4d:00.0: [drm] VRAM[1, 1]: DPA range:
>>> [0x0000001000000000-2000000000], io range:
>>> [0x0000203000000000-203fff000000]
>>> [ 590.578592] xe 0000:4d:00.0: [drm] Total VRAM: 0x0000202000000000,
>>> 0x0000002000000000
>>> [ 590.578594] xe 0000:4d:00.0: [drm] Available VRAM:
>>> 0x0000202000000000, 0x0000001ffe000000
>>> [ 590.738899] xe 0000:4d:00.0: [drm] GT0: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [ 590.889991] xe 0000:4d:00.0: [drm] GT1: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [ 590.892835] [drm] Initialized xe 1.1.0 20201103 for 0000:4d:00.0 on
>>> minor 1
>>> [ 590.900215] xe 0000:9a:00.0: enabling device (0140 -> 0142)
>>> [ 590.915991] xe 0000:9a:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [ 590.957450] xe 0000:9a:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [ 590.989863] xe 0000:9a:00.0: [drm] VISIBLE VRAM: 0x000020e000000000,
>>> 0x0000002000000000
>>> [ 590.989888] xe 0000:9a:00.0: [drm] VRAM[0, 0]: Actual physical size
>>> 0x0000001000000000, usable size exclude stolen 0x0000000fff000000, CPU
>>> accessible size 0x0000000fff000000
>>> [ 590.989893] xe 0000:9a:00.0: [drm] VRAM[0, 0]: DPA range:
>>> [0x0000000000000000-1000000000], io range:
>>> [0x000020e000000000-20efff000000]
>>> [ 590.989918] xe 0000:9a:00.0: [drm] VRAM[1, 1]: Actual physical size
>>> 0x0000001000000000, usable size exclude stolen 0x0000000fff000000, CPU
>>> accessible size 0x0000000fff000000
>>> [ 590.989921] xe 0000:9a:00.0: [drm] VRAM[1, 1]: DPA range:
>>> [0x0000001000000000-2000000000], io range:
>>> [0x000020f000000000-20ffff000000]
>>> [ 590.989924] xe 0000:9a:00.0: [drm] Total VRAM: 0x000020e000000000,
>>> 0x0000002000000000
>>> [ 590.989927] xe 0000:9a:00.0: [drm] Available VRAM:
>>> 0x000020e000000000, 0x0000001ffe000000
>>> [ 591.142061] xe 0000:9a:00.0: [drm] GT0: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [ 591.293505] xe 0000:9a:00.0: [drm] GT1: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [ 591.295487] [drm] Initialized xe 1.1.0 20201103 for 0000:9a:00.0 on
>>> minor 2
>>> [ 610.685993] Console: switching to colour dummy device 80x25
>>> [ 610.686118] [IGT] xe_exec_basic: executing
>>> [ 610.755398] xe 0000:4d:00.0: [drm] GT0: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [ 610.771783] xe 0000:4d:00.0: [drm] GT1: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [ 610.773542] [IGT] xe_exec_basic: starting subtest once-basic
>>> [ 610.960251] [IGT] xe_exec_basic: finished subtest once-basic, SUCCESS
>>> [ 610.962741] [IGT] xe_exec_basic: exiting, ret=0
>>> [ 610.977203] Console: switching to colour frame buffer device 128x48
>>> [ 611.006675] xe_exec_basic (3237) used greatest stack depth: 11128
>>> bytes left
>>> [ 644.682201] xe 0000:4d:00.0: [drm] GT0: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [ 644.699060] xe 0000:4d:00.0: [drm] GT1: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [ 644.699118] xe 0000:4d:00.0: preparing for PCIe FLR reset
>>> [ 644.699149] xe 0000:4d:00.0: [drm] removing device access to
>>> userspace
>>> [ 644.928577] xe 0000:4d:00.0: PCI device went through FLR, reenabling
>>> the device
>>> [ 656.104233] xe 0000:4d:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [ 656.149525] xe 0000:4d:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [ 656.182711] xe 0000:4d:00.0: [drm] VISIBLE VRAM: 0x0000202000000000,
>>> 0x0000002000000000
>>> [ 656.182737] xe 0000:4d:00.0: [drm] VRAM[0, 0]: Actual physical size
>>> 0x0000001000000000, usable size exclude stolen 0x0000000fff000000, CPU
>>> accessible size 0x0000000fff000000
>>> [ 656.182742] xe 0000:4d:00.0: [drm] VRAM[0, 0]: DPA range:
>>> [0x0000000000000000-1000000000], io range:
>>> [0x0000202000000000-202fff000000]
>>> [ 656.182768] xe 0000:4d:00.0: [drm] VRAM[1, 1]: Actual physical size
>>> 0x0000001000000000, usable size exclude stolen 0x0000000fff000000, CPU
>>> accessible size 0x0000000fff000000
>>> [ 656.182772] xe 0000:4d:00.0: [drm] VRAM[1, 1]: DPA range:
>>> [0x0000001000000000-2000000000], io range:
>>> [0x0000203000000000-203fff000000]
>>> [ 656.182775] xe 0000:4d:00.0: [drm] Total VRAM: 0x0000202000000000,
>>> 0x0000002000000000
>>> [ 656.182778] xe 0000:4d:00.0: [drm] Available VRAM:
>>> 0x0000202000000000, 0x0000001ffe000000
>>> [ 656.348657] xe 0000:4d:00.0: [drm] GT0: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [ 656.507619] xe 0000:4d:00.0: [drm] GT1: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [ 656.510848] [drm] Initialized xe 1.1.0 20201103 for 0000:4d:00.0 on
>>> minor 1
>>> [ 665.754402] Console: switching to colour dummy device 80x25
>>> [ 665.754484] [IGT] xe_exec_basic: executing
>>> [ 665.805853] xe 0000:4d:00.0: [drm] GT0: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [ 665.819825] xe 0000:4d:00.0: [drm] GT1: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [ 665.820359] [IGT] xe_exec_basic: starting subtest once-basic
>>> [ 665.968899] [IGT] xe_exec_basic: finished subtest once-basic, SUCCESS
>>> [ 665.969534] [IGT] xe_exec_basic: exiting, ret=0
>>> [ 665.981027] Console: switching to colour frame buffer device 128x48
>>>
>>>
>>> Aravind Iddamsetty (4):
>>> drm: add devm release action
>>> drm/xe: Save and restore PCI state
>>> drm/xe: Extract xe_gt_idle() helper
>>> drm/xe/FLR: Support PCIe FLR
>>>
>>> drivers/gpu/drm/drm_drv.c | 6 ++
>>> drivers/gpu/drm/xe/Makefile | 1 +
>>> drivers/gpu/drm/xe/xe_device_types.h | 6 ++
>>> drivers/gpu/drm/xe/xe_gt.c | 31 +++++++---
>>> drivers/gpu/drm/xe/xe_gt.h | 1 +
>>> drivers/gpu/drm/xe/xe_guc_pc.c | 4 ++
>>> drivers/gpu/drm/xe/xe_pci.c | 57 +++++++++++++++--
>>> drivers/gpu/drm/xe/xe_pci.h | 6 +-
>>> drivers/gpu/drm/xe/xe_pci_err.c | 93 ++++++++++++++++++++++++++++
>>> drivers/gpu/drm/xe/xe_pci_err.h | 13 ++++
>>> include/drm/drm_drv.h | 2 +
>>> 11 files changed, 205 insertions(+), 15 deletions(-)
>>> create mode 100644 drivers/gpu/drm/xe/xe_pci_err.c
>>> create mode 100644 drivers/gpu/drm/xe/xe_pci_err.h
>>>
>>> --
>>> 2.25.1
>>>
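
For readers following the thread, the flow the cover letter describes (a write
to the sysfs reset entry, then a driver callback before and after the FLR) can
be sketched as a small userspace mock. This is only an illustration under
stated assumptions, not the code from the series: the callback names
reset_prepare and reset_done mirror the members of the kernel's
struct pci_error_handlers, while mock_dev, mock_err_handlers,
mock_sysfs_reset and the state flags are hypothetical names invented here.

```c
/*
 * Userspace mock of the FLR callback flow. NOT kernel code.
 * Only the callback names reset_prepare/reset_done mirror the kernel's
 * struct pci_error_handlers; every other name is hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct mock_dev {
	bool userspace_access;	/* device reachable from userspace */
	bool hw_initialized;	/* firmware loaded, GTs set up */
	int flr_count;		/* number of resets the "hardware" has seen */
};

struct mock_err_handlers {
	void (*reset_prepare)(struct mock_dev *dev);
	void (*reset_done)(struct mock_dev *dev);
};

/* Called before the reset: quiesce the device and block userspace. */
static void mock_reset_prepare(struct mock_dev *dev)
{
	printf("preparing for PCIe FLR reset\n");
	dev->userspace_access = false;
	dev->hw_initialized = false;
}

/* Called after the reset: reinitialize the device from scratch. */
static void mock_reset_done(struct mock_dev *dev)
{
	/* The prepare step must have run before we get here. */
	assert(!dev->userspace_access);
	printf("PCI device went through FLR, reenabling the device\n");
	dev->hw_initialized = true;
	dev->userspace_access = true;
}

static const struct mock_err_handlers mock_handlers = {
	.reset_prepare = mock_reset_prepare,
	.reset_done = mock_reset_done,
};

/* Stand-in for the PCI core handling a write to the sysfs reset entry. */
static void mock_sysfs_reset(struct mock_dev *dev,
			     const struct mock_err_handlers *h)
{
	if (h->reset_prepare)
		h->reset_prepare(dev);
	dev->flr_count++;	/* the function level reset itself */
	if (h->reset_done)
		h->reset_done(dev);
}
```

Calling mock_sysfs_reset(&dev, &mock_handlers) on an initialized mock_dev
leaves it initialized and accessible again with flr_count incremented, which
is the "no unbind and rebind needed" behaviour the cover letter aims for.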