All of lore.kernel.org
 help / color / mirror / Atom feed
From: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>, intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH v2] drm/xe: Skip XY_FAST_COLOR instruction on link copy engines
Date: Wed, 8 Mar 2023 11:58:15 +0530	[thread overview]
Message-ID: <ZAgq/3Rh3fb2iyRD@bvivekan-mobl> (raw)
In-Reply-To: <20230307090643.z5ncdvt6vmyimosf@ldmartin-desk2.lan>

On 07.03.2023 01:06, Lucas De Marchi wrote:
> On Tue, Mar 07, 2023 at 01:39:16PM +0530, Balasubramani Vivekanandan wrote:
> > Link copy engines doesn't support the XY_FAST_COLOR instruction.
> > Currently this instruction is used only at one place to clear a ttm
> > resource while migrating a BO.
> > A new device_info member is created to know if a platform has link copy
> > engine. If it supports, then instead of using XY_FAST_COLOR instruction,
> > MEM_SET is used which is available both in main and link copy engines.
> > 
> > BSpec: 68433
> > 
> > Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
> > ---
> > drivers/gpu/drm/xe/regs/xe_gpu_commands.h |  9 ++++
> > drivers/gpu/drm/xe/xe_device_types.h      |  2 +
> > drivers/gpu/drm/xe/xe_migrate.c           | 65 ++++++++++++++++-------
> > drivers/gpu/drm/xe/xe_pci.c               |  4 ++
> > 4 files changed, 60 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > index 288576035ce3..df9ed4fbf2bf 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > @@ -6,6 +6,8 @@
> > #ifndef _XE_GPU_COMMANDS_H_
> > #define _XE_GPU_COMMANDS_H_
> > 
> > +#include "regs/xe_reg_defs.h"
> 
> for the uses we are making here, it should suffice to
> include linux/bitmap.h
This file is using the macro REG_BIT(), definition of which is included
via the file regs/xe_reg_defs.h . So I had to include regs/xe_reg_defs.h

Though REG_BIT() was preexisting in xe_gpu_commands.h, this patch
uncovered the missing inclusion of regs/xe_reg_defs.h

> 
> > +
> > #define INSTR_CLIENT_SHIFT      29
> > #define   INSTR_MI_CLIENT       0x0
> > #define __INSTR(client) ((client) << INSTR_CLIENT_SHIFT)
> > @@ -56,6 +58,13 @@
> > #define GEN9_XY_FAST_COPY_BLT_CMD	(2 << 29 | 0x42 << 22)
> > #define   BLT_DEPTH_32			(3<<24)
> > 
> > +#define	PVC_MEM_SET_CMD		(2 << 29 | 0x5b << 22)
> > +#define   PVC_MEM_SET_CMD_LEN_DW	7
> > +#define   PVC_MS_MATRIX			REG_BIT(17)
> > +/* Bspec lists field as [6:0], but index alone is from [6:1] */
> > +#define   PVC_MS_MOCS_INDEX_MASK	GENMASK(6, 1)
> > +#define   PVC_MS_DATA_FIELD		GENMASK(31, 24)
> 
> wrong order. Should be defining the most significative bits first,
> like they appear in bspec.
I will reorder the definitions in the order of dword, followed by most
significant bits of the dword.

> 
> > +
> > #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
> > #define   PIPE_CONTROL_TILE_CACHE_FLUSH			(1<<28)
> > #define   PIPE_CONTROL_AMFS_FLUSH			(1<<25)
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > index 199bd37fce9a..a73c5e1d7503 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -95,6 +95,8 @@ struct xe_device {
> > 		bool has_4tile;
> > 		/** @has_range_tlb_invalidation: Has range based TLB invalidations */
> > 		bool has_range_tlb_invalidation;
> > +		/** @has_link_copy_engines: Whether the platform has link copy engines */
> > +		bool has_link_copy_engine;
> > 		/** @enable_display: display enabled */
> > 		bool enable_display;
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > index bc69ec17d5ad..59fd588a1faf 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -750,32 +750,57 @@ static int emit_clear(struct xe_gt *gt, struct xe_bb *bb, u64 src_ofs,
> > 		      u32 size, u32 pitch, u32 value, bool is_vram)
> > {
> > 	u32 *cs = bb->cs + bb->len;
> > -	u32 len = XY_FAST_COLOR_BLT_DW;
> > +	u32 len;
> > 	u32 mocs = xe_mocs_index_to_value(gt->mocs.uc_index);
> > +	struct xe_device *xe = gt_to_xe(gt);
> > 
> > -	if (GRAPHICS_VERx100(gt->xe) < 1250)
> > -		len = 11;
> > -
> > -	*cs++ = XY_FAST_COLOR_BLT_CMD | XY_FAST_COLOR_BLT_DEPTH_32 |
> > -		(len - 2);
> > -	*cs++ = FIELD_PREP(XY_FAST_COLOR_BLT_MOCS_MASK, mocs) |
> > -		(pitch - 1);
> > -	*cs++ = 0;
> > -	*cs++ = (size / pitch) << 16 | pitch / 4;
> > -	*cs++ = lower_32_bits(src_ofs);
> > -	*cs++ = upper_32_bits(src_ofs);
> > -	*cs++ = (is_vram ? 0x0 : 0x1) <<  XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT;
> > -	*cs++ = value;
> > -	*cs++ = 0;
> > -	*cs++ = 0;
> > -	*cs++ = 0;
> > -
> > -	if (len > 11) {
> > -		*cs++ = 0;
> > +	if (xe->info.has_link_copy_engine) {
> 
> I wonder if instead of doing this we have here one example of function
> hook that could provide some value. +Francois.
> 
> And take a look at the discussion in 20230303145015.1018870-1-francois.dugast@intel.com
> 
> I wouldn't block this series on that though.
I am very much interested in the proposal. I wish to have such a change
in the driver. Like you said, we can take it up as an improvment.

Regards,
Bala

> 
> 
> > +		/* MEM_SET command supports setting only 8-bit value.
> 
> leave a blank line:
> 
> 		/*
> 		 * MEM_SET ...
> 
> is the more usual style.
> 	
> > +		 * This function is currently used only to clear the address
> > +		 * range.  So the value agrument is not really used. Need to
> > +		 * have a better handling when there is a need to actually set
> > +		 * a value. Print a warning if a value bigger than 8-bit is
> > +		 * passed
> > +		 */
> > +		XE_WARN_ON(value > U8_MAX);
> > +
> > +		len = PVC_MEM_SET_CMD_LEN_DW;
> > +
> > +		*cs++ = PVC_MEM_SET_CMD | PVC_MS_MATRIX |
> > +				(PVC_MEM_SET_CMD_LEN_DW - 2);
> > +		*cs++ = pitch - 1;
> > +		*cs++ = (size / pitch) - 1;
> > +		*cs++ = pitch - 1;
> > +		*cs++ = lower_32_bits(src_ofs);
> > +		*cs++ = upper_32_bits(src_ofs);
> > +		*cs++ = FIELD_PREP(PVC_MS_DATA_FIELD, value) |
> > +				FIELD_PREP(PVC_MS_MOCS_INDEX_MASK, mocs);
> > +	} else {
> > +		len = XY_FAST_COLOR_BLT_DW;
> > +		if (GRAPHICS_VERx100(gt->xe) < 1250)
> > +			len = 11;
> > +
> > +		*cs++ = XY_FAST_COLOR_BLT_CMD | XY_FAST_COLOR_BLT_DEPTH_32 |
> > +			(len - 2);
> > +		*cs++ = FIELD_PREP(XY_FAST_COLOR_BLT_MOCS_MASK, mocs) |
> > +			(pitch - 1);
> > 		*cs++ = 0;
> > +		*cs++ = (size / pitch) << 16 | pitch / 4;
> > +		*cs++ = lower_32_bits(src_ofs);
> > +		*cs++ = upper_32_bits(src_ofs);
> > +		*cs++ = (is_vram ? 0x0 : 0x1) <<  XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT;
> > +		*cs++ = value;
> > 		*cs++ = 0;
> > 		*cs++ = 0;
> > 		*cs++ = 0;
> > +
> > +		if (len > 11) {
> > +			*cs++ = 0;
> > +			*cs++ = 0;
> > +			*cs++ = 0;
> > +			*cs++ = 0;
> > +			*cs++ = 0;
> > +		}
> > 	}
> > 
> > 	XE_BUG_ON(cs - bb->cs != len + bb->len);
> > diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> > index c4d9fd2e7b2b..e555f13395ab 100644
> > --- a/drivers/gpu/drm/xe/xe_pci.c
> > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > @@ -72,6 +72,8 @@ struct xe_device_desc {
> > 	bool has_4tile;
> > 	bool has_range_tlb_invalidation;
> > 	bool has_asid;
> > +
> > +	bool has_link_copy_engine;
> 
> I'm not sure if we should tie the availability of
> MEM_SET instruction to the fact that the platform has link copy engines.
> It seems to be an instruction to carry forward regardless of the type of
> the copy engine.
> 
> +Matt Roper for opinion here
> 
> Lucas De Marchi
> 
> > };
> > 
> > __diag_push();
> > @@ -224,6 +226,7 @@ static const struct xe_device_desc pvc_desc = {
> > 	.vm_max_level = 4,
> > 	.supports_usm = true,
> > 	.has_asid = true,
> > +	.has_link_copy_engine = true,
> > };
> > 
> > #define MTL_MEDIA_ENGINES \
> > @@ -413,6 +416,7 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> > 	xe->info.has_flat_ccs = desc->has_flat_ccs;
> > 	xe->info.has_4tile = desc->has_4tile;
> > 	xe->info.has_range_tlb_invalidation = desc->has_range_tlb_invalidation;
> > +	xe->info.has_link_copy_engine = desc->has_link_copy_engine;
> > 
> > 	spd = subplatform_get(xe, desc);
> > 	xe->info.subplatform = spd ? spd->subplatform : XE_SUBPLATFORM_NONE;
> > -- 
> > 2.34.1
> > 

  parent reply	other threads:[~2023-03-08  6:28 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-07  8:09 [Intel-xe] [PATCH v2] drm/xe: Skip XY_FAST_COLOR instruction on link copy engines Balasubramani Vivekanandan
2023-03-07  8:32 ` [Intel-xe] ✓ CI.Patch_applied: success for drm/xe: Skip XY_FAST_COLOR instruction on link copy engines (rev2) Patchwork
2023-03-07  8:33 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-03-07  8:37 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-03-07  9:06 ` [Intel-xe] [PATCH v2] drm/xe: Skip XY_FAST_COLOR instruction on link copy engines Lucas De Marchi
2023-03-08  0:52   ` Matt Roper
2023-03-08  6:28   ` Balasubramani Vivekanandan [this message]
2023-03-07 16:17 ` Maarten Lankhorst
2023-03-08  7:23   ` Balasubramani Vivekanandan
2023-03-08  1:28 ` Niranjana Vishwanathapura

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZAgq/3Rh3fb2iyRD@bvivekan-mobl \
    --to=balasubramani.vivekanandan@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.d.roper@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.