From: "Dong, Zhanjun" <zhanjun.dong@intel.com>
To: "Souza, Jose" <jose.souza@intel.com>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Cc: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>
Subject: Re: [PATCH v6 0/8] drm/xe/guc: Add GuC based register capture for error capture
Date: Wed, 27 Mar 2024 13:46:42 -0400 [thread overview]
Message-ID: <2d4f8f53-e918-412f-a01b-da770546df80@intel.com> (raw)
In-Reply-To: <beb3b7e6a062008afda7007ca40606f848a626e5.camel@intel.com>
On 2024-03-26 11:52 a.m., Souza, Jose wrote:
> On Tue, 2024-03-26 at 11:40 -0400, Dong, Zhanjun wrote:
>> See my comments inline below.
>>
>> Regards,
>> Zhanjun
>> On 2024-03-22 5:36 p.m., Dong, Zhanjun wrote:
>>> See my comments below.
>>>
>>> Regards,
>>> Zhanjun
>>>
>>> On 2024-03-22 5:28 p.m., Souza, Jose wrote:
>>>> On Fri, 2024-03-22 at 11:37 -0700, José Roberto de Souza wrote:
>>>>> On Fri, 2024-03-22 at 09:57 -0700, José Roberto de Souza wrote:
>>>>>> On Tue, 2024-03-19 at 07:36 -0700, Zhanjun Dong wrote:
>>>>>>> Port GuC based register capture for error capture from i915 to Xe.
>>>>>>>
>>>>>>> There are 3 parts inside:
>>>>>>> . Prepare for capture registers
>>>>>>> There is a bo create at guc ads init time, that is very early
>>>>>>> and engi ne map is not ready, make it hard to calculate the
>>>>>>> capture buffer size, new function created for worst case size
>>>>>>> caluation. Other than that, this part basically follows the i915
>>>>>>> design.
>>>>>>> . Process capture notification message
>>>>>>> Basically follows i915 design
>>>>>>> . Sysfs command process.
>>>>>>> Xe switched to devcoredump, adopted command line process with
>>>>>>> captured node list.
>>>>>>
>>>>>> Just some notes after trying this to debug a issue with this series:
>>>>>>
>>>>>> - CONFIG_DRM_XE_CAPTURE_ERROR is not a good name for a config to
>>>>>> enable GuC based capture.
>>>>>> - in my opinion this should not even exist, always capture with
>>>>>> GuC if xe_device_uc_enabled() is true.
>>
>> 3 reasons:
>> 1. This GuC based register error capture is for debug purpose, does it
>> still important for production stage? I want to leave it an option for
>> users who don't need it.
>
> definitely needed in production
>
>> 2. With this opiton disabled, extra memory could be saved. It might not
>> be a big deal for PC, while might be valuable for customers who want to
>> save few 10KBs to 100KB RAM usage.
>
> not relevant amount, having a accurate register dump is more important as allow us to debug issues reproduced by costumers.
>
>> 3. The feature with an opiton is what we have in i915, where the feature
>> is ported from. So make it the same style to avoid users migrated from
>> i915 complains about it.
Although I still think for users need it they can keep the config option
enabled, but after review of i915 code, the kconfig option is not bring
in by the feature. Thus no need to add the option here.
I will make another rev with this option removed.
>>
>>>>>>
>>>>>> - coredump don't have some registers like SAMPLER_INSTDONE,
>>>>>> ROW_INSTDONE, XEHPG_INSTDONE_GEOM_SVG...
>>>
>>> Yes, those steering register print out is missing.
>>>
>>>>>> Here the output in LNL:
>>>>>
>>>>> ah this could explain...
>>>>>
>>>>> [ 152.189386] xe 0000:00:02.0:
>>>>> [drm:xe_hw_engine_snapshot_from_capture [xe]] GT0: GuC error capture
>>>>> is empty, take snapshot from engine.
>>>>> [ 152.190215] xe 0000:00:02.0: [drm] Xe device coredump has been
>>>>> created
>>>>>
>>>>> but both code paths needs to dump the same registers
>>>
>>> I found you have another patch:
>>> drm/xe/devcoredump: Print errno if VM snapshot was not captured
>>> I will take a look.
>>>
>>>>
>>>> Hi Zhanjun
>>>>
>>>> Mesa team needs the INSTDONE registers to debug issues and I have
>>>> patches implementing the dump of this registers.
>>>> Do you mind if I send this patches the list?
>>>
>>> Sure, please send it.
Since you already send the INSTONE patch, my next revision will not
conver this part and will get merged with your patch later.
>>>
>>>>
>>>> I guess this GuC based register capture will need another version and
>>>> someone with experience on GuC to review, so it will take a while...
>>>>
>>>>>
>>>>>>
>>>>>> **** HW Engines ****
>>>>>> rcs0 (physical), logical instance=0
>>>>>> Forcewake: domain 0x2, ref 1
>>>>>> HWSTAM: 0xffffffff
>>>>>> RING_HWS_PGA: 0x01693000
>>>>>> RING_START: 0x029d5000
>>>>>> RING_HEAD: 0x00001914
>>>>>> RING_TAIL: 0x00001968
>>>>>> RING_CTL: 0x00003001
>>>>>> RING_MI_MODE: 0x00001000
>>>>>> RING_MODE: 0x00000008
>>>>>> RING_IMR: 0x00000000
>>>>>> RING_ESR: 0x00000000
>>>>>> RING_EMR: 0xffffffff
>>>>>> RING_EIR: 0x00000000
>>>>>> IPEHR: 0x7a000004
>>>>>> ACTHD: 0x0000effeffff9208
>>>>>> RING_BBADDR: 0x0000effeffff9209
>>>>>> RING_DMA_FADD: 0x00000000029d6968
>>>>>> RING_EXECLIST_STATUS: 0x000000004000279c
>>>>>> RING_EXECLIST_SQ_CONTENTS: 0x00000000029d9719
>>>>>>
>>>>>> - a lot of style issues, see 'CI.checkpatch'
>>
>> After review the CI.checkpatch output, all style issues reported is by
>> purpose.
>>
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
>>>>>>> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
>>>>>>>
>>>>>>> Changes from prior revs:
>>>>>>> v6:- Change hardcoded register snapshot fill to follow mapping
>>>>>>> tables
>>>>>>> When capture is empty, take snapshot from engine
>>>>>>> v5:- Split dss helper code out as an standalone patch
>>>>>>> Remove old platform registers definition.
>>>>>>> Split register map table to 32 and 64bit each
>>>>>>> v4:- Move register map table to xe_hw_engine.c
>>>>>>> v3:- Remove condition compilation in code
>>>>>>> v2:- Split into multiple chunks
>>>>>>>
>>>>>>> Zhanjun Dong (8):
>>>>>>> drm/xe/guc: Add kconfig for GuC based register capture
>>>>>>> drm/xe/guc: Update GuC ADS size for error capture
>>>>>>> drm/xe/guc: Add XE_LP steered register lists
>>>>>>> drm/xe/guc: Add capture size check in GuC log buffer
>>>>>>> drm/xe/guc: Check sizing of guc_capture output
>>>>>>> drm/xe/guc: Extract GuC error capture lists on G2H notification
>>>>>>> drm/xe/guc: Pre-allocate output nodes for extraction
>>>>>>> drm/xe/guc: Plumb GuC-capture into dev coredump
>>>>>>>
>>>>>>> drivers/gpu/drm/xe/Kconfig | 11 +
>>>>>>> drivers/gpu/drm/xe/Makefile | 1 +
>>>>>>> drivers/gpu/drm/xe/abi/guc_actions_abi.h | 7 +
>>>>>>> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 5 +
>>>>>>> drivers/gpu/drm/xe/xe_gt_printk.h | 3 +
>>>>>>> drivers/gpu/drm/xe/xe_guc.c | 6 +
>>>>>>> drivers/gpu/drm/xe/xe_guc_ads.c | 229 +++-
>>>>>>> drivers/gpu/drm/xe/xe_guc_ads_types.h | 2 +
>>>>>>> drivers/gpu/drm/xe/xe_guc_capture.c | 1332
>>>>>>> ++++++++++++++++++++++
>>>>>>> drivers/gpu/drm/xe/xe_guc_capture.h | 21 +
>>>>>>> drivers/gpu/drm/xe/xe_guc_capture_fwif.h | 221 ++++
>>>>>>> drivers/gpu/drm/xe/xe_guc_ct.c | 2 +
>>>>>>> drivers/gpu/drm/xe/xe_guc_fwif.h | 68 ++
>>>>>>> drivers/gpu/drm/xe/xe_guc_log.c | 179 +++
>>>>>>> drivers/gpu/drm/xe/xe_guc_log.h | 15 +
>>>>>>> drivers/gpu/drm/xe/xe_guc_log_types.h | 26 +
>>>>>>> drivers/gpu/drm/xe/xe_guc_submit.c | 21 +-
>>>>>>> drivers/gpu/drm/xe/xe_guc_submit.h | 3 +
>>>>>>> drivers/gpu/drm/xe/xe_guc_types.h | 2 +
>>>>>>> drivers/gpu/drm/xe/xe_hw_engine.c | 261 ++++-
>>>>>>> drivers/gpu/drm/xe/xe_hw_engine.h | 4 +
>>>>>>> drivers/gpu/drm/xe/xe_hw_engine_types.h | 117 +-
>>>>>>> 22 files changed, 2421 insertions(+), 115 deletions(-)
>>>>>>> create mode 100644 drivers/gpu/drm/xe/xe_guc_capture.c
>>>>>>> create mode 100644 drivers/gpu/drm/xe/xe_guc_capture.h
>>>>>>> create mode 100644 drivers/gpu/drm/xe/xe_guc_capture_fwif.h
>>>>>>>
>>>>>>
>>>>>
>>>>
>
prev parent reply other threads:[~2024-03-27 17:46 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-19 14:36 [PATCH v6 0/8] drm/xe/guc: Add GuC based register capture for error capture Zhanjun Dong
2024-03-19 14:36 ` [PATCH v6 1/8] drm/xe/guc: Add kconfig for GuC based register capture Zhanjun Dong
2024-03-19 14:36 ` [PATCH v6 2/8] drm/xe/guc: Update GuC ADS size for error capture Zhanjun Dong
2024-03-19 14:36 ` [PATCH v6 3/8] drm/xe/guc: Add XE_LP steered register lists Zhanjun Dong
2024-03-19 14:36 ` [PATCH v6 4/8] drm/xe/guc: Add capture size check in GuC log buffer Zhanjun Dong
2024-03-19 14:36 ` [PATCH v6 5/8] drm/xe/guc: Check sizing of guc_capture output Zhanjun Dong
2024-03-19 14:36 ` [PATCH v6 6/8] drm/xe/guc: Extract GuC error capture lists on G2H notification Zhanjun Dong
2024-03-19 14:36 ` [PATCH v6 7/8] drm/xe/guc: Pre-allocate output nodes for extraction Zhanjun Dong
2024-03-19 14:36 ` [PATCH v6 8/8] drm/xe/guc: Plumb GuC-capture into dev coredump Zhanjun Dong
2024-03-19 14:41 ` ✓ CI.Patch_applied: success for drm/xe/guc: Add GuC based register capture for error capture (rev5) Patchwork
2024-03-19 14:42 ` ✗ CI.checkpatch: warning " Patchwork
2024-03-19 14:43 ` ✓ CI.KUnit: success " Patchwork
2024-03-19 14:53 ` ✓ CI.Build: " Patchwork
2024-03-19 14:55 ` ✗ CI.Hooks: failure " Patchwork
2024-03-19 14:57 ` ✓ CI.checksparse: success " Patchwork
2024-03-19 15:19 ` ✓ CI.BAT: " Patchwork
2024-03-22 16:57 ` [PATCH v6 0/8] drm/xe/guc: Add GuC based register capture for error capture Souza, Jose
2024-03-22 18:37 ` Souza, Jose
2024-03-22 21:28 ` Souza, Jose
2024-03-22 21:36 ` Dong, Zhanjun
2024-03-26 15:40 ` Dong, Zhanjun
2024-03-26 15:52 ` Souza, Jose
2024-03-27 17:46 ` Dong, Zhanjun [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2d4f8f53-e918-412f-a01b-da770546df80@intel.com \
--to=zhanjun.dong@intel.com \
--cc=alan.previn.teres.alexis@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=jose.souza@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.