Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Dong, Zhanjun" <zhanjun.dong@intel.com>
To: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v8 1/6] drm/xe/guc: Prepare GuC register list and update ADS size for error capture
Date: Tue, 14 May 2024 18:44:58 -0400	[thread overview]
Message-ID: <7cb2095e-b808-4ca2-aa2c-1c793526197a@intel.com> (raw)
In-Reply-To: <f55c5f7ef0ab613be9372d1f30cfb5071e872435.camel@intel.com>

See my comments inline below.

Regards,
Zhanjun

On 2024-05-10 2:43 p.m., Teres Alexis, Alan Previn wrote:
> On Mon, 2024-05-06 at 18:47 -0700, Zhanjun Dong wrote:
>> Add referenced registers defines and list of registers.
>> Update GuC ADS size allocation to include space for
>> the lists of error state capture register descriptors.
>>
>>
> alan:snip
>> +int xe_calculate_guc_regs_capture_worst_size(struct xe_guc *guc)
>> +{
>> +       size_t total_size, class_size, instance_size, global_size;
>> +       int i, j;
>> +
>> +       /* This function calculates the worst case register lists size by
>> +        * including all possible engines classes. It is called during the
>> +        * first of a two-phase GuC (and ADS-population) initialization
>> +        * sequence, that is, during the pre-hwconfig phase before we have
>> +        * the exact engine fusing info.
>> +        */
>> +       total_size = PAGE_SIZE; /* Pad a page in front for empty lists */
>> +       for (i = 0; i < GUC_CAPTURE_LIST_INDEX_MAX; i++) {
>> +               for (j = 0; j < GUC_LAST_ENGINE_CLASS; j++) {
> alan: so in rev6, i mentioned in one of the other hunks that in guc interface,
> the 'max engine class' definition used by guc-error-capture register-list-arrays
> does not follow the definition of engine-class enum of above (which is used
> for submission and other interfaces with GuC). I notice you fixed this on
> another hunk but missed it above. I should have made it clear this change is
> required for ALL cases where we are looping through the engine classes' register
> lists used in guc- error-capture.
Thanks, will changed to GUC_CAPTURE_LIST_CLASS_MAX

>> +                       if (xe_guc_capture_getlistsize(guc, i,
>> +                                                      GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS,
>> +                                                      j, &class_size) < 0)
>> +                               class_size = 0;
>> +                       if (xe_guc_capture_getlistsize(guc, i,
>> +                                                      GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE,
>> +                                                      j, &instance_size) < 0)
>>
> alan:snip
> 
>>   #define MAX_GOLDEN_LRC_SIZE    (SZ_4K * 64)
>>   
>>   int xe_guc_ads_init(struct xe_guc_ads *ads)
>> @@ -398,6 +443,7 @@ int xe_guc_ads_init(struct xe_guc_ads *ads)
>>          struct xe_bo *bo;
>>   
>>          ads->golden_lrc_size = calculate_golden_lrc_size(ads);
>> +       ads->capture_size = xe_calculate_guc_regs_capture_worst_size(ads_to_guc(ads));
>>          ads->regset_size = calculate_regset_size(gt);
>>          ads->ads_waklv_size = calculate_waklv_size(ads);
>>   
>> @@ -431,9 +477,10 @@ int xe_guc_ads_init_post_hwconfig(struct xe_guc_ads *ads)
>>          xe_gt_assert(gt, ads->bo);
>>   
>>          ads->golden_lrc_size = calculate_golden_lrc_size(ads);
>> +       ads->capture_size = xe_calculate_guc_regs_capture_worst_size(ads_to_guc(ads));
> alan: nit: we had an offline conversation that despite post_hw_config knowing
> exact engine class list, we simpliify and improve execution time by using
> worst-case-size in both pre and post hwconfig when deciding on size. Would be nice to
> add a comment on that? (so folks dont assume its a bug)
Sure, will do.
> 
> alan:snip
> 
> 
>>          ads->regset_size = calculate_regset_size(gt);
>>   
>> -       xe_gt_assert(gt, ads->golden_lrc_size +
>> +       xe_gt_assert(gt, ads->golden_lrc_size + ads->capture_size +
>>                       (ads->regset_size - prev_regset_size) <=
>>                       MAX_GOLDEN_LRC_SIZE);
> alan: i missed this in rev6 - why are we using ads->capture_size to verify MAX_GOLDEN_LRC_SIZE is
> sufficient? i dont believe this macro is related to guc-error-capture reg list.
> 
Yes, will remove it.
> alan:snip
> 
>> +static int guc_capture_prep_lists(struct xe_guc_ads *ads)
>> +{
>> +       struct xe_guc *guc = ads_to_guc(ads);
>> +       struct xe_gt *gt = ads_to_gt(ads);
>> +       u32 ads_ggtt, capture_offset, null_ggtt, total_size = 0;
>> +       struct iosys_map info_map;
>> +       size_t size = 0;
>> +       void *ptr;
>>          int i, j;
> alan:snip
>> +                       /********************************************************/
>> +                       /*** engine exists: start with engine-class registers ***/
>> +                       /********************************************************/
>> +                       write_empty_list = true; /* starting assumption is an empty list */
>> +                       size = 0;
>> +                       if (!xe_guc_capture_getlistsize(guc, i,
>> +                                                       GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS,
>> +                                                       j, &size)) {
>> +                               if (total_size + size > ads->capture_size)
>> +                                       xe_gt_dbg(gt, "Capture size overflow :%lu vs %d\n",
>> +                                                 total_size + size, ads->capture_size);
> alan: i think premerge hooks are failing due to format specifiers - please do fix that.
> alan:snip
>> +       if (ads->capture_size != PAGE_ALIGN(total_size))
>> +               xe_gt_info(gt, "ADS capture alloc size changed from %d to %d\n",
>> +                          ads->capture_size, PAGE_ALIGN(total_size));
>> +       return PAGE_ALIGN(total_size);
>>   }
> alan:snip
> 
>> +#define COMMON_XELP_BASE_GLOBAL \
>> +       { FORCEWAKE_GT,             0,      0, "FORCEWAKE" }
>> +
>> +#define COMMON_BASE_ENGINE_INSTANCE \
>> +       { RING_ESR(0),              0,      0, "ESR" }, \
>> +       { RING_EMR(0),              0,      0, "EMR" }, \
>> +       { RING_EIR(0),              0,      0, "EIR" }, \
>> +       { RING_EXECLIST_STATUS_HI(0), 0,    0, "RING_EXECLIST_STATUS_HI" }, \
>> +       { RING_EXECLIST_STATUS_LO(0), 0,    0, "RING_EXECLIST_STATUS_LO" }, \
>> +       { RING_DMA_FADD(0),         0,      0, "RING_DMA_FADD_LDW" }, \
>> +       { RING_DMA_FADD_UDW(0),     0,      0, "RING_DMA_FADD_UDW" }, \
>> +       { RING_IPEHR(0),            0,      0, "IPEHR" }, \
>> +       { RING_BBADDR(0),           0,      0, "RING_BBADDR_LOW32" }, \
>> +       { RING_BBADDR_UDW(0),       0,      0, "RING_BBADDR_UP32" }, \
>> +       { RING_ACTHD(0),            0,      0, "ACTHD_LDW" }, \
>> +       { RING_ACTHD_UDW(0),        0,      0, "ACTHD_UDW" }, \
>> +       { RING_START(0),            0,      0, "START" }, \
>> +       { RING_HEAD(0),             0,      0, "HEAD" }, \
>> +       { RING_TAIL(0),             0,      0, "TAIL" }, \
>> +       { RING_CTL(0),              0,      0, "CTL" }, \
>> +       { RING_MI_MODE(0),          0,      0, "MODE" }, \
>> +       { RING_HWS_PGA(0),          0,      0, "HWS" }, \
>> +       { RING_MODE(0),             0,      0, "GFX_MODE" }
> alan: I notice that "struct __guc_mmio_reg_descr.regname" is introduced here in
> Patch-#1 and used in the population of the static reglist table above but then
> never ever referenced in the entire series. In fact in Patch #6, its removed.
> 
> NOTE: this is different from i915 where this variable was used in the reporting
> patch of i915's series but it looks like from this xe series version, we are
> creating a shared "register->offset->name" lookup table list in xe_hw_engine.c
> for both guc and execlist in Patch 6.
> 
> I believe this flirts dangerously close to breaking linux patch rules. A variable/
> func not used in the entire series really shouldnt be added. In this case, we are
> NOT consuming regname at all anywhere in the series. And then we remove it in the
> subsequent patch-6.
Yes, will get it removed in next patch.
Meanwhile, as there are 2 register list exist in capture and 
hw_engine.c, so I'm also consider to merge them. It could be done in 
separate patch.
> 
> alan:snip
> 
> 
> 
>> +static int
>> +guc_capture_getlistsize(struct xe_guc *guc, u32 owner, u32 type, u32 classid,
>> +                       size_t *size, bool is_purpose_est)
>> +{
>> +       struct xe_guc_state_capture *gc = guc->capture;
>> +       struct __guc_capture_ads_cache *cache = &gc->ads_cache[owner][type][classid];
>> +       int num_regs;
>> +
>> +       if (!gc->reglists) {
>> +               xe_gt_warn(guc_to_gt(guc), "No capture reglist for this device\n");
>> +               return -ENODEV;
>> +       }
>> +
>> +       if (cache->is_valid) {
>> +               *size = cache->size;
>> +               return cache->status;
>> +       }
>> +
>> +       if (!is_purpose_est && owner == GUC_CAPTURE_LIST_INDEX_PF &&
> alan: so this one took me a while to figure out why it has become orphaned.
> (after i went through the entire series hunting for it). I'm referign to "is_purpose_est".
> 
> So this is another i915 inherittance which did have valid reasoning. In i915 when we
> were doing "check_guc_capture_size" (which is a one-time event), we wanted to flag if
> any of the register tables were non existent. take note that at this early log-buffer-
> calculation time, we assumed all engine classes are valid on the platform at
> pre-hw-config time. The reason is because we did actually discover cases when new
> platforms added new engines and the developer completely missed adding the register
> table for error capture as well. With this message, it would get flagged as a warning.
> However, this same lower level function was also used for the regular register
> list population which can get called everytime we reset guc including coming
> out of suspend or if we performed a GT reset and so we didn't want that warning
> to happen every single time - only once per module load. so thats why we
> created a wraper with that "is_purpose_est" to differentate if the caller
> was for the log-buffer-size-estmation or for actual population where we only
> print the warn in the former.
> 
> so my review comment would be: we already learnt the lessons from i915 so lets
> use this properly. i suggest we fix patch #3 so that check_guc_capture_size
> will call above guc_capture_getlistsize with is_purpose_est set true as opposed to
> calling xe_calculate_guc_regs_capture_worst_size. I'll make the same comment
> in Patch3

Thanks for point out. Will be fixed in patch 3
> 
> alan:snip
> 
>> +
>> +#endif /* _XE_GUC_CAPTURE_H */
>> diff --git a/drivers/gpu/drm/xe/xe_guc_capture_fwif.h b/drivers/gpu/drm/xe/xe_guc_capture_fwif.h
>> new file mode 100644
>> index 000000000000..79bc277afaa8
>> --- /dev/null
> alan:snip
> 
>> +struct guc_state_capture_group_header_t {
>> +       u32 owner;
> alan: If i read the the spec correctly, capture_group_header's owner
> has VFID field at GENMASK(7,0).
Will add.
>> +       u32 info;
>> +#define CAP_GRP_HDR_NUM_CAPTURES GENMASK(7, 0)
>> +#define CAP_GRP_HDR_CAPTURE_TYPE GENMASK(15, 8) /* guc_capture_group_types */
>> +} __packed;
> alan:snip
> 

  reply	other threads:[~2024-05-14 22:45 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-07  1:47 [PATCH v8 0/6] drm/xe/guc: Add GuC based register capture for error capture Zhanjun Dong
2024-05-07  1:47 ` [PATCH v8 1/6] drm/xe/guc: Prepare GuC register list and update ADS size " Zhanjun Dong
2024-05-10 18:43   ` Teres Alexis, Alan Previn
2024-05-14 22:44     ` Dong, Zhanjun [this message]
2024-05-10 18:58   ` Teres Alexis, Alan Previn
2024-05-07  1:47 ` [PATCH v8 2/6] drm/xe/guc: Add XE_LP steered register lists Zhanjun Dong
2024-05-11  0:17   ` Teres Alexis, Alan Previn
2024-05-14 23:00     ` Dong, Zhanjun
2024-05-07  1:47 ` [PATCH v8 3/6] drm/xe/guc: Add capture size check in GuC log buffer Zhanjun Dong
2024-05-08 22:57   ` Teres Alexis, Alan Previn
2024-05-15 21:39     ` Dong, Zhanjun
2024-05-07  1:47 ` [PATCH v8 4/6] drm/xe/guc: Extract GuC error capture lists Zhanjun Dong
2024-05-11  1:43   ` Teres Alexis, Alan Previn
2024-05-15 21:45     ` Dong, Zhanjun
2024-05-15 21:55       ` Dong, Zhanjun
2024-05-07  1:47 ` [PATCH v8 5/6] drm/xe/guc: Pre-allocate output nodes for extraction Zhanjun Dong
2024-05-11 18:07   ` Teres Alexis, Alan Previn
2024-05-07  1:47 ` [PATCH v8 6/6] drm/xe/guc: Plumb GuC-capture into dev coredump Zhanjun Dong
2024-05-11 20:25   ` Teres Alexis, Alan Previn
2024-05-07  4:17 ` ✓ CI.Patch_applied: success for drm/xe/guc: Add GuC based register capture for error capture (rev8) Patchwork
2024-05-07  4:18 ` ✗ CI.checkpatch: warning " Patchwork
2024-05-07  4:19 ` ✓ CI.KUnit: success " Patchwork
2024-05-07  4:31 ` ✓ CI.Build: " Patchwork
2024-05-07  4:41 ` ✗ CI.Hooks: failure " Patchwork
2024-05-07  4:49 ` ✓ CI.checksparse: success " Patchwork
2024-05-07  5:24 ` ✗ CI.BAT: failure " Patchwork
2024-05-07  9:35 ` ✗ CI.FULL: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7cb2095e-b808-4ca2-aa2c-1c793526197a@intel.com \
    --to=zhanjun.dong@intel.com \
    --cc=alan.previn.teres.alexis@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox