From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-xe@lists.freedesktop.org, thomas.hellstrom@linux.intel.com,
	lucas.demarchi@intel.com, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2 0/4] drm/xe: Support PCIe FLR
Date: Mon, 8 Apr 2024 11:20:34 +0530
Message-ID: <45e36b75-54c8-45cb-9683-b16faab15833@linux.intel.com>
In-Reply-To: <6cbe2744-d184-4c7b-8972-eb09a87b5295@linux.intel.com>


On 05/04/24 10:30, Aravind Iddamsetty wrote:
> On 05/04/24 03:55, Rodrigo Vivi wrote:
>> On Tue, Apr 02, 2024 at 02:28:55PM +0530, Aravind Iddamsetty wrote:
>>> PCI subsystem provides callbacks to inform the driver about a request to
>>> do function level reset by user, initiated by writing to sysfs entry
>>> /sys/bus/pci/devices/.../reset. This will allow the driver to handle FLR
>>> without the need to do unbind and rebind as the driver needs to
>>> reinitialize the device afresh post FLR.
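
For readers following the thread, the prepare/done flow described above can
be sketched in plain userspace C. This is only a model under stated
assumptions: every name below (mock_dev, mock_err_handlers, mock_flr and so
on) is hypothetical and is not taken from the patches. The real hooks are the
reset_prepare and reset_done members of the kernel's struct
pci_error_handlers, which the PCI core calls around the reset.

```c
/*
 * Userspace model of a reset_prepare/reset_done callback pair.
 * All names are hypothetical; nothing here touches real hardware
 * or reproduces the code in this series.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_dev {
	bool userspace_access;	/* can userspace reach the device? */
	bool initialized;	/* has the driver set up the hardware? */
	int init_count;		/* number of full (re)initializations */
};

/* Mirrors only the shape of the two reset hooks. */
struct mock_err_handlers {
	void (*reset_prepare)(struct mock_dev *dev);
	void (*reset_done)(struct mock_dev *dev);
};

/* Full device bring-up, used at probe time and again after the reset. */
static void mock_init(struct mock_dev *dev)
{
	dev->initialized = true;
	dev->userspace_access = true;
	dev->init_count++;
}

/* Before the reset: cut off userspace and quiesce the device. */
static void mock_reset_prepare(struct mock_dev *dev)
{
	dev->userspace_access = false;
	dev->initialized = false;
}

/* After the reset: the hardware state is gone, so initialize afresh. */
static void mock_reset_done(struct mock_dev *dev)
{
	assert(!dev->userspace_access);	/* prepare must have run first */
	mock_init(dev);
}

static const struct mock_err_handlers mock_handlers = {
	.reset_prepare = mock_reset_prepare,
	.reset_done = mock_reset_done,
};

/* Stand-in for the core that drives the reset on behalf of the user. */
static void mock_flr(struct mock_dev *dev, const struct mock_err_handlers *h)
{
	if (h && h->reset_prepare)
		h->reset_prepare(dev);
	/* the function level reset itself would happen here */
	if (h && h->reset_done)
		h->reset_done(dev);
}
```

Calling mock_init() once and then mock_flr(&dev, &mock_handlers) leaves the
model initialized again with an init count of two, which is the "reinitialize
afresh without unbind/rebind" behaviour the cover letter describes.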
>>>
>>> v2:
>> All the patches look good to me here. Feel free to use
>>
>> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>
>> on them.
> Thank you!
>
>> but I do have some concerns (below)
>>
>>> 1. Directly expose the devm_drm_dev_release_action instead of introducing
>>> a helper (Rodrigo)
>>> 2. separate out gt idle and pci save/restore to a separate patch (Lucas)
>>> 3. Fixed the warnings seen around xe_guc_submit_stop, xe_guc_pc_fini
>> On this I also had to fight to get something working on the wedged_mode=2:
>> lore.kernel.org/all/20240403150732.102678-4-rodrigo.vivi@intel.com
>>
>> perhaps we can unify things here.
> I guess we are dealing with different scenarios. Here, the warning in
> xe_guc_submit_stop was caused by not invoking xe_uc_reset_prepare beforehand,
> and we needn't invoke xe_guc_pc_fini because the GuC is already stopped.
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>
>>> dmesg snip showing FLR recovery:
>> Things went differently on my DG2 here, with display working and all:
> After you mentioned this I tested on DG2: I got warnings, but no segmentation
> fault or NPD. I tested with my branch, which might not be up to date; I will
> retest with the latest branch.

While I look into this, is it OK to merge this version of the series? As far as
I can see, even with the display warnings, the device and driver are recoverable.

Thanks,
Aravind.
>
>
> Thanks,
> Aravind.
>> root@rdvivi-desk:/sys/module/xe/drivers/pci:xe/0000:03:00.0# echo 1 > reset
>> Segmentation fault
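
As an aside, the shell command above amounts to writing "1" to the device's
sysfs reset attribute. A minimal C sketch of the same step follows; the
helper name trigger_reset is hypothetical, and the caller supplies the path.
On a real device this needs root and actually resets the hardware.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write "1" to a sysfs-style reset attribute.
 * Returns 0 on success or a negative errno value on failure.
 * Hypothetical helper, shown for illustration only.
 */
static int trigger_reset(const char *reset_path)
{
	int fd = open(reset_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	ssize_t n = write(fd, "1", 1);
	int err = 0;
	if (n < 0)
		err = -errno;
	else if (n != 1)
		err = -EIO;

	/* Report a close failure only if the write itself succeeded. */
	if (close(fd) < 0 && !err)
		err = -errno;
	return err;
}
```

Note that a return value of 0 only means the write was accepted; as the
output above shows, the writing process can still be killed if the kernel
hits a fault while handling the reset.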
>>
>> and many kernel warnings
>>  WARNING: CPU: 8 PID: 2389 at drivers/gpu/drm/i915/display/intel_display_power_well.c:281 hsw_wait_for_power_well_enable+0x3e7/0x570 [xe]
>>  WARNING: CPU: 9 PID: 1700 at drivers/gpu/drm/drm_mm.c:999 drm_mm_takedown+0x41/0x60
>>
>> [  117.128330] KASAN: null-ptr-deref in range [0x00000000000004e8-0x00000000000004ef]
>> [  117.128332] CPU: 13 PID: 2389 Comm: bash Tainted: G        W          6.9.0-rc1+ #9
>> [  117.135501]  ? exc_invalid_op+0x13/0x40
>> [  117.143626] Hardware name: iBUYPOWER INTEL/B660 DS3H AC DDR4-Y1, BIOS F5 12/17/2021
>> [  117.143627] RIP: 0010:__mutex_lock+0x124/0x14a0
>> [  117.143631] Code: d0 7c 08 84 d2 0f 85 62 0f 00 00 8b 0d 85 c8 8f 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 46 0f 00 00 4d 3b 7f 68 0f 85 aa 0e 00 00 bf 01
>> [  117.150630]  ? asm_exc_invalid_op+0x16/0x20
>> [  117.156401] RSP: 0018:ffffc90005a37690 EFLAGS: 00010202
>> [  117.156403] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
>> [  117.163571]  ? drm_buddy_fini+0x181/0x220
>>
>>
>> and more issues.
>>
>> so it looks like we are still missing some parts of the puzzle here...
>>
>>
>>> [  590.486336] xe 0000:4d:00.0: enabling device (0140 -> 0142)
>>> [  590.506933] xe 0000:4d:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [  590.542355] xe 0000:4d:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [  590.578532] xe 0000:4d:00.0: [drm] VISIBLE VRAM: 0x0000202000000000,
>>> 0x0000002000000000
>>> [  590.578556] xe 0000:4d:00.0: [drm] VRAM[0, 0]: Actual physical size
>>> 0x0000001000000000, usable size exclude stolen 0x0000000fff000000, CPU
>>> accessible size 0x0000000fff000000
>>> [  590.578560] xe 0000:4d:00.0: [drm] VRAM[0, 0]: DPA range:
>>> [0x0000000000000000-1000000000], io range:
>>> [0x0000202000000000-202fff000000]
>>> [  590.578585] xe 0000:4d:00.0: [drm] VRAM[1, 1]: Actual physical size
>>> 0x0000001000000000, usable size exclude stolen 0x0000000fff000000, CPU
>>> accessible size 0x0000000fff000000
>>> [  590.578589] xe 0000:4d:00.0: [drm] VRAM[1, 1]: DPA range:
>>> [0x0000001000000000-2000000000], io range:
>>> [0x0000203000000000-203fff000000]
>>> [  590.578592] xe 0000:4d:00.0: [drm] Total VRAM: 0x0000202000000000,
>>> 0x0000002000000000
>>> [  590.578594] xe 0000:4d:00.0: [drm] Available VRAM:
>>> 0x0000202000000000, 0x0000001ffe000000
>>> [  590.738899] xe 0000:4d:00.0: [drm] GT0: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [  590.889991] xe 0000:4d:00.0: [drm] GT1: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [  590.892835] [drm] Initialized xe 1.1.0 20201103 for 0000:4d:00.0 on
>>> minor 1
>>> [  590.900215] xe 0000:9a:00.0: enabling device (0140 -> 0142)
>>> [  590.915991] xe 0000:9a:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [  590.957450] xe 0000:9a:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [  590.989863] xe 0000:9a:00.0: [drm] VISIBLE VRAM: 0x000020e000000000,
>>> 0x0000002000000000
>>> [  590.989888] xe 0000:9a:00.0: [drm] VRAM[0, 0]: Actual physical size
>>> 0x0000001000000000, usable size exclude stolen 0x0000000fff000000, CPU
>>> accessible size 0x0000000fff000000
>>> [  590.989893] xe 0000:9a:00.0: [drm] VRAM[0, 0]: DPA range:
>>> [0x0000000000000000-1000000000], io range:
>>> [0x000020e000000000-20efff000000]
>>> [  590.989918] xe 0000:9a:00.0: [drm] VRAM[1, 1]: Actual physical size
>>> 0x0000001000000000, usable size exclude stolen 0x0000000fff000000, CPU
>>> accessible size 0x0000000fff000000
>>> [  590.989921] xe 0000:9a:00.0: [drm] VRAM[1, 1]: DPA range:
>>> [0x0000001000000000-2000000000], io range:
>>> [0x000020f000000000-20ffff000000]
>>> [  590.989924] xe 0000:9a:00.0: [drm] Total VRAM: 0x000020e000000000,
>>> 0x0000002000000000
>>> [  590.989927] xe 0000:9a:00.0: [drm] Available VRAM:
>>> 0x000020e000000000, 0x0000001ffe000000
>>> [  591.142061] xe 0000:9a:00.0: [drm] GT0: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [  591.293505] xe 0000:9a:00.0: [drm] GT1: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [  591.295487] [drm] Initialized xe 1.1.0 20201103 for 0000:9a:00.0 on
>>> minor 2
>>> [  610.685993] Console: switching to colour dummy device 80x25
>>> [  610.686118] [IGT] xe_exec_basic: executing
>>> [  610.755398] xe 0000:4d:00.0: [drm] GT0: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [  610.771783] xe 0000:4d:00.0: [drm] GT1: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [  610.773542] [IGT] xe_exec_basic: starting subtest once-basic
>>> [  610.960251] [IGT] xe_exec_basic: finished subtest once-basic, SUCCESS
>>> [  610.962741] [IGT] xe_exec_basic: exiting, ret=0
>>> [  610.977203] Console: switching to colour frame buffer device 128x48
>>> [  611.006675] xe_exec_basic (3237) used greatest stack depth: 11128
>>> bytes left
>>> [  644.682201] xe 0000:4d:00.0: [drm] GT0: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [  644.699060] xe 0000:4d:00.0: [drm] GT1: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [  644.699118] xe 0000:4d:00.0: preparing for PCIe FLR reset
>>> [  644.699149] xe 0000:4d:00.0: [drm] removing device access to
>>> userspace
>>> [  644.928577] xe 0000:4d:00.0: PCI device went through FLR, reenabling
>>> the device
>>> [  656.104233] xe 0000:4d:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [  656.149525] xe 0000:4d:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [  656.182711] xe 0000:4d:00.0: [drm] VISIBLE VRAM: 0x0000202000000000,
>>> 0x0000002000000000
>>> [  656.182737] xe 0000:4d:00.0: [drm] VRAM[0, 0]: Actual physical size
>>> 0x0000001000000000, usable size exclude stolen 0x0000000fff000000, CPU
>>> accessible size 0x0000000fff000000
>>> [  656.182742] xe 0000:4d:00.0: [drm] VRAM[0, 0]: DPA range:
>>> [0x0000000000000000-1000000000], io range:
>>> [0x0000202000000000-202fff000000]
>>> [  656.182768] xe 0000:4d:00.0: [drm] VRAM[1, 1]: Actual physical size
>>> 0x0000001000000000, usable size exclude stolen 0x0000000fff000000, CPU
>>> accessible size 0x0000000fff000000
>>> [  656.182772] xe 0000:4d:00.0: [drm] VRAM[1, 1]: DPA range:
>>> [0x0000001000000000-2000000000], io range:
>>> [0x0000203000000000-203fff000000]
>>> [  656.182775] xe 0000:4d:00.0: [drm] Total VRAM: 0x0000202000000000,
>>> 0x0000002000000000
>>> [  656.182778] xe 0000:4d:00.0: [drm] Available VRAM:
>>> 0x0000202000000000, 0x0000001ffe000000
>>> [  656.348657] xe 0000:4d:00.0: [drm] GT0: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [  656.507619] xe 0000:4d:00.0: [drm] GT1: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [  656.510848] [drm] Initialized xe 1.1.0 20201103 for 0000:4d:00.0 on
>>> minor 1
>>> [  665.754402] Console: switching to colour dummy device 80x25
>>> [  665.754484] [IGT] xe_exec_basic: executing
>>> [  665.805853] xe 0000:4d:00.0: [drm] GT0: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [  665.819825] xe 0000:4d:00.0: [drm] GT1: CCS_MODE=0 config:00400000,
>>> num_engines:1, num_slices:4
>>> [  665.820359] [IGT] xe_exec_basic: starting subtest once-basic
>>> [  665.968899] [IGT] xe_exec_basic: finished subtest once-basic, SUCCESS
>>> [  665.969534] [IGT] xe_exec_basic: exiting, ret=0
>>> [  665.981027] Console: switching to colour frame buffer device 128x48
>>>
>>>
>>> Aravind Iddamsetty (4):
>>>   drm: add devm release action
>>>   drm/xe: Save and restore PCI state
>>>   drm/xe: Extract xe_gt_idle() helper
>>>   drm/xe/FLR: Support PCIe FLR
>>>
>>>  drivers/gpu/drm/drm_drv.c            |  6 ++
>>>  drivers/gpu/drm/xe/Makefile          |  1 +
>>>  drivers/gpu/drm/xe/xe_device_types.h |  6 ++
>>>  drivers/gpu/drm/xe/xe_gt.c           | 31 +++++++---
>>>  drivers/gpu/drm/xe/xe_gt.h           |  1 +
>>>  drivers/gpu/drm/xe/xe_guc_pc.c       |  4 ++
>>>  drivers/gpu/drm/xe/xe_pci.c          | 57 +++++++++++++++--
>>>  drivers/gpu/drm/xe/xe_pci.h          |  6 +-
>>>  drivers/gpu/drm/xe/xe_pci_err.c      | 93 ++++++++++++++++++++++++++++
>>>  drivers/gpu/drm/xe/xe_pci_err.h      | 13 ++++
>>>  include/drm/drm_drv.h                |  2 +
>>>  11 files changed, 205 insertions(+), 15 deletions(-)
>>>  create mode 100644 drivers/gpu/drm/xe/xe_pci_err.c
>>>  create mode 100644 drivers/gpu/drm/xe/xe_pci_err.h
>>>
>>> -- 
>>> 2.25.1
>>>


Thread overview: 18+ messages
2024-04-02  8:58 [PATCH v2 0/4] drm/xe: Support PCIe FLR Aravind Iddamsetty
2024-04-02  8:58 ` [PATCH v2 1/4] drm: add devm release action Aravind Iddamsetty
2024-04-15  4:42   ` Aravind Iddamsetty
2024-04-16 10:28   ` Aravind Iddamsetty
2024-04-02  8:58 ` [PATCH 2/4] drm/xe: Save and restore PCI state Aravind Iddamsetty
2024-04-02  8:58 ` [PATCH 3/4] drm/xe: Extract xe_gt_idle() helper Aravind Iddamsetty
2024-04-02  8:58 ` [PATCH v2 4/4] drm/xe/FLR: Support PCIe FLR Aravind Iddamsetty
2024-04-02  9:01 ` ✓ CI.Patch_applied: success for drm/xe: Support PCIe FLR (rev2) Patchwork
2024-04-02  9:01 ` ✗ CI.checkpatch: warning " Patchwork
2024-04-02  9:02 ` ✓ CI.KUnit: success " Patchwork
2024-04-02  9:57 ` ✓ CI.Build: " Patchwork
2024-04-02  9:59 ` ✓ CI.Hooks: " Patchwork
2024-04-02 10:01 ` ✗ CI.checksparse: warning " Patchwork
2024-04-02 10:33 ` ✓ CI.BAT: success " Patchwork
2024-04-04 22:25 ` [PATCH v2 0/4] drm/xe: Support PCIe FLR Rodrigo Vivi
2024-04-05  5:00   ` Aravind Iddamsetty
2024-04-08  5:50     ` Aravind Iddamsetty [this message]
  -- strict thread matches above, loose matches on Subject: below --
2024-04-17  8:41 Aravind Iddamsetty
