From: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>, intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH v2] drm/xe: Skip XY_FAST_COLOR instruction on link copy engines
Date: Wed, 8 Mar 2023 11:58:15 +0530 [thread overview]
Message-ID: <ZAgq/3Rh3fb2iyRD@bvivekan-mobl> (raw)
In-Reply-To: <20230307090643.z5ncdvt6vmyimosf@ldmartin-desk2.lan>
On 07.03.2023 01:06, Lucas De Marchi wrote:
> On Tue, Mar 07, 2023 at 01:39:16PM +0530, Balasubramani Vivekanandan wrote:
> > Link copy engines doesn't support the XY_FAST_COLOR instruction.
> > Currently this instruction is used only at one place to clear a ttm
> > resource while migrating a BO.
> > A new device_info member is created to know if a platform has link copy
> > engine. If it supports, then instead of using XY_FAST_COLOR instruction,
> > MEM_SET is used which is available both in main and link copy engines.
> >
> > BSpec: 68433
> >
> > Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
> > ---
> > drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 9 ++++
> > drivers/gpu/drm/xe/xe_device_types.h | 2 +
> > drivers/gpu/drm/xe/xe_migrate.c | 65 ++++++++++++++++-------
> > drivers/gpu/drm/xe/xe_pci.c | 4 ++
> > 4 files changed, 60 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > index 288576035ce3..df9ed4fbf2bf 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > @@ -6,6 +6,8 @@
> > #ifndef _XE_GPU_COMMANDS_H_
> > #define _XE_GPU_COMMANDS_H_
> >
> > +#include "regs/xe_reg_defs.h"
>
> for the uses we are making here, it should suffice to
> include linux/bitmap.h
This file is using the macro REG_BIT(), definition of which is included
via the file regs/xe_reg_defs.h . So I had to include regs/xe_reg_defs.h
Though REG_BIT() was preexisting in xe_gpu_commands.h, this patch
uncovered the missing inclusion of regs/xe_reg_defs.h
>
> > +
> > #define INSTR_CLIENT_SHIFT 29
> > #define INSTR_MI_CLIENT 0x0
> > #define __INSTR(client) ((client) << INSTR_CLIENT_SHIFT)
> > @@ -56,6 +58,13 @@
> > #define GEN9_XY_FAST_COPY_BLT_CMD (2 << 29 | 0x42 << 22)
> > #define BLT_DEPTH_32 (3<<24)
> >
> > +#define PVC_MEM_SET_CMD (2 << 29 | 0x5b << 22)
> > +#define PVC_MEM_SET_CMD_LEN_DW 7
> > +#define PVC_MS_MATRIX REG_BIT(17)
> > +/* Bspec lists field as [6:0], but index alone is from [6:1] */
> > +#define PVC_MS_MOCS_INDEX_MASK GENMASK(6, 1)
> > +#define PVC_MS_DATA_FIELD GENMASK(31, 24)
>
> wrong order. Should be defining the most significative bits first,
> like they appear in bspec.
I will reorder the definitions in the order of dword, followed by most
significant bits of the dword.
>
> > +
> > #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
> > #define PIPE_CONTROL_TILE_CACHE_FLUSH (1<<28)
> > #define PIPE_CONTROL_AMFS_FLUSH (1<<25)
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > index 199bd37fce9a..a73c5e1d7503 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -95,6 +95,8 @@ struct xe_device {
> > bool has_4tile;
> > /** @has_range_tlb_invalidation: Has range based TLB invalidations */
> > bool has_range_tlb_invalidation;
> > + /** @has_link_copy_engines: Whether the platform has link copy engines */
> > + bool has_link_copy_engine;
> > /** @enable_display: display enabled */
> > bool enable_display;
> >
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > index bc69ec17d5ad..59fd588a1faf 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -750,32 +750,57 @@ static int emit_clear(struct xe_gt *gt, struct xe_bb *bb, u64 src_ofs,
> > u32 size, u32 pitch, u32 value, bool is_vram)
> > {
> > u32 *cs = bb->cs + bb->len;
> > - u32 len = XY_FAST_COLOR_BLT_DW;
> > + u32 len;
> > u32 mocs = xe_mocs_index_to_value(gt->mocs.uc_index);
> > + struct xe_device *xe = gt_to_xe(gt);
> >
> > - if (GRAPHICS_VERx100(gt->xe) < 1250)
> > - len = 11;
> > -
> > - *cs++ = XY_FAST_COLOR_BLT_CMD | XY_FAST_COLOR_BLT_DEPTH_32 |
> > - (len - 2);
> > - *cs++ = FIELD_PREP(XY_FAST_COLOR_BLT_MOCS_MASK, mocs) |
> > - (pitch - 1);
> > - *cs++ = 0;
> > - *cs++ = (size / pitch) << 16 | pitch / 4;
> > - *cs++ = lower_32_bits(src_ofs);
> > - *cs++ = upper_32_bits(src_ofs);
> > - *cs++ = (is_vram ? 0x0 : 0x1) << XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT;
> > - *cs++ = value;
> > - *cs++ = 0;
> > - *cs++ = 0;
> > - *cs++ = 0;
> > -
> > - if (len > 11) {
> > - *cs++ = 0;
> > + if (xe->info.has_link_copy_engine) {
>
> I wonder if instead of doing this we have here one example of function
> hook that could provide some value. +Francois.
>
> And take a look at the discussion in 20230303145015.1018870-1-francois.dugast@intel.com
>
> I wouldn't block this series on that though.
I am very much interested in the proposal. I wish to have such a change
in the driver. Like you said, we can take it up as an improvment.
Regards,
Bala
>
>
> > + /* MEM_SET command supports setting only 8-bit value.
>
> leave a blank line:
>
> /*
> * MEM_SET ...
>
> is the more usual style.
>
> > + * This function is currently used only to clear the address
> > + * range. So the value agrument is not really used. Need to
> > + * have a better handling when there is a need to actually set
> > + * a value. Print a warning if a value bigger than 8-bit is
> > + * passed
> > + */
> > + XE_WARN_ON(value > U8_MAX);
> > +
> > + len = PVC_MEM_SET_CMD_LEN_DW;
> > +
> > + *cs++ = PVC_MEM_SET_CMD | PVC_MS_MATRIX |
> > + (PVC_MEM_SET_CMD_LEN_DW - 2);
> > + *cs++ = pitch - 1;
> > + *cs++ = (size / pitch) - 1;
> > + *cs++ = pitch - 1;
> > + *cs++ = lower_32_bits(src_ofs);
> > + *cs++ = upper_32_bits(src_ofs);
> > + *cs++ = FIELD_PREP(PVC_MS_DATA_FIELD, value) |
> > + FIELD_PREP(PVC_MS_MOCS_INDEX_MASK, mocs);
> > + } else {
> > + len = XY_FAST_COLOR_BLT_DW;
> > + if (GRAPHICS_VERx100(gt->xe) < 1250)
> > + len = 11;
> > +
> > + *cs++ = XY_FAST_COLOR_BLT_CMD | XY_FAST_COLOR_BLT_DEPTH_32 |
> > + (len - 2);
> > + *cs++ = FIELD_PREP(XY_FAST_COLOR_BLT_MOCS_MASK, mocs) |
> > + (pitch - 1);
> > *cs++ = 0;
> > + *cs++ = (size / pitch) << 16 | pitch / 4;
> > + *cs++ = lower_32_bits(src_ofs);
> > + *cs++ = upper_32_bits(src_ofs);
> > + *cs++ = (is_vram ? 0x0 : 0x1) << XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT;
> > + *cs++ = value;
> > *cs++ = 0;
> > *cs++ = 0;
> > *cs++ = 0;
> > +
> > + if (len > 11) {
> > + *cs++ = 0;
> > + *cs++ = 0;
> > + *cs++ = 0;
> > + *cs++ = 0;
> > + *cs++ = 0;
> > + }
> > }
> >
> > XE_BUG_ON(cs - bb->cs != len + bb->len);
> > diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> > index c4d9fd2e7b2b..e555f13395ab 100644
> > --- a/drivers/gpu/drm/xe/xe_pci.c
> > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > @@ -72,6 +72,8 @@ struct xe_device_desc {
> > bool has_4tile;
> > bool has_range_tlb_invalidation;
> > bool has_asid;
> > +
> > + bool has_link_copy_engine;
>
> I'm not sure if we should tie the availability of
> MEM_SET instruction to the fact that the platform has link copy engines.
> It seems to be an instruction to carry forward regardless of the type of
> the copy engine.
>
> +Matt Roper for opinion here
>
> Lucas De Marchi
>
> > };
> >
> > __diag_push();
> > @@ -224,6 +226,7 @@ static const struct xe_device_desc pvc_desc = {
> > .vm_max_level = 4,
> > .supports_usm = true,
> > .has_asid = true,
> > + .has_link_copy_engine = true,
> > };
> >
> > #define MTL_MEDIA_ENGINES \
> > @@ -413,6 +416,7 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> > xe->info.has_flat_ccs = desc->has_flat_ccs;
> > xe->info.has_4tile = desc->has_4tile;
> > xe->info.has_range_tlb_invalidation = desc->has_range_tlb_invalidation;
> > + xe->info.has_link_copy_engine = desc->has_link_copy_engine;
> >
> > spd = subplatform_get(xe, desc);
> > xe->info.subplatform = spd ? spd->subplatform : XE_SUBPLATFORM_NONE;
> > --
> > 2.34.1
> >
next prev parent reply other threads:[~2023-03-08 6:28 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-07 8:09 [Intel-xe] [PATCH v2] drm/xe: Skip XY_FAST_COLOR instruction on link copy engines Balasubramani Vivekanandan
2023-03-07 8:32 ` [Intel-xe] ✓ CI.Patch_applied: success for drm/xe: Skip XY_FAST_COLOR instruction on link copy engines (rev2) Patchwork
2023-03-07 8:33 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-03-07 8:37 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-03-07 9:06 ` [Intel-xe] [PATCH v2] drm/xe: Skip XY_FAST_COLOR instruction on link copy engines Lucas De Marchi
2023-03-08 0:52 ` Matt Roper
2023-03-08 6:28 ` Balasubramani Vivekanandan [this message]
2023-03-07 16:17 ` Maarten Lankhorst
2023-03-08 7:23 ` Balasubramani Vivekanandan
2023-03-08 1:28 ` Niranjana Vishwanathapura
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZAgq/3Rh3fb2iyRD@bvivekan-mobl \
--to=balasubramani.vivekanandan@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=lucas.demarchi@intel.com \
--cc=matthew.d.roper@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.