Intel-XE Archive on lore.kernel.org
From: Simona Vetter <simona.vetter@ffwll.ch>
To: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH v2 3/3] drm/xe: Avoid any races around ccs_mode update
Date: Wed, 4 Sep 2024 12:28:47 +0200	[thread overview]
Message-ID: <Ztg2X0CpZockAe9w@phenom.ffwll.local> (raw)
In-Reply-To: <20231129025716.9094-4-niranjana.vishwanathapura@intel.com>

On Tue, Nov 28, 2023 at 06:57:16PM -0800, Niranjana Vishwanathapura wrote:
> Ensure that there are no drm clients when changing CCS mode.
> Allow exec_queue creation only with enabled CCS engines.
> 
> v2: Rebase
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device.c       |  9 +++++++++
>  drivers/gpu/drm/xe/xe_device_types.h |  9 +++++++++
>  drivers/gpu/drm/xe/xe_gt_ccs_mode.c  | 10 ++++++++++
>  3 files changed, 28 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 54202623e255..c7134d377b4c 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -72,6 +72,10 @@ static int xe_file_open(struct drm_device *dev, struct drm_file *file)
>  	mutex_init(&xef->exec_queue.lock);
>  	xa_init_flags(&xef->exec_queue.xa, XA_FLAGS_ALLOC1);
>  
> +	spin_lock(&xe->clients.lock);
> +	xe->clients.count++;
> +	spin_unlock(&xe->clients.lock);
> +
>  	file->driver_priv = xef;
>  	return 0;
>  }
> @@ -104,6 +108,10 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file)
>  	xa_destroy(&xef->vm.xa);
>  	mutex_destroy(&xef->vm.lock);
>  
> +	spin_lock(&xe->clients.lock);
> +	xe->clients.count--;
> +	spin_unlock(&xe->clients.lock);
> +
>  	xe_drm_client_put(xef->client);
>  	kfree(xef);
>  }
> @@ -226,6 +234,7 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>  	xe->info.force_execlist = xe_modparam.force_execlist;
>  
>  	spin_lock_init(&xe->irq.lock);
> +	spin_lock_init(&xe->clients.lock);
>  
>  	init_waitqueue_head(&xe->ufence_wq);
>  
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 2712905c7a91..12cb1caacda8 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -306,6 +306,15 @@ struct xe_device {
>  		enum xe_sriov_mode __mode;
>  	} sriov;
>  
> +	/** @clients: drm clients info */
> +	struct {
> +		/** @lock: Protects drm clients info */
> +		spinlock_t lock;
> +
> +		/** @count: number of drm clients */
> +		u64 count;
> +	} clients;
> +
>  	/** @usm: unified memory state */
>  	struct {
>  		/** @asid: convert a ASID to VM */
> diff --git a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
> index 0ad97383aedd..303ffb0761d8 100644
> --- a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
> +++ b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
> @@ -90,6 +90,7 @@ ccs_mode_store(struct device *kdev, struct device_attribute *attr,
>  	       const char *buff, size_t count)
>  {
>  	struct xe_gt *gt = kobj_to_gt(&kdev->kobj);
> +	struct xe_device *xe = gt_to_xe(gt);
>  	u32 num_engines, num_slices;
>  	int ret;
>  
> @@ -108,12 +109,21 @@ ccs_mode_store(struct device *kdev, struct device_attribute *attr,
>  		return -EINVAL;
>  	}
>  
> +	/* CCS mode can only be updated when there are no drm clients */
> +	spin_lock(&xe->clients.lock);
> +	if (xe->clients.count) {
> +		spin_unlock(&xe->clients.lock);
> +		return -EBUSY;
> +	}
> +
>  	if (gt->ccs_mode.num_engines != num_engines) {
>  		xe_gt_info(gt, "Setting compute mode to %d\n", num_engines);
>  		gt->ccs_mode.num_engines = num_engines;
>  		xe_gt_reset_async(gt);
>  	}
>  
> +	spin_unlock(&xe->clients.lock);

I pinged Rodrigo on #xe irc, but I think it's better if I also drop this in
a mail reply. I don't think the patch yet reaches its goal of closing all
the races here:

- xe_gt_reset_async() doesn't schedule a reset if there's one pending, so
  I think you have a race here: what you want is a function that waits for
  any pending reset to finish, and _then_ issues a new one.

- The reset code does not have any protection around reading ->ccs_mode,
  so it might get different values at different times. That's no good. You
  need to make sure you're reading this only once, and then using that
  value everywhere. The simplest fix would be a
  gt->ccs_mode.num_engines_stage, which you WRITE_ONCE here, and then at
  the top of the reset code you do

	gt->ccs_mode.num_engines = READ_ONCE(gt->ccs_mode.num_engines_stage);

  But if you go with this, you need to wait for the reset to finish while
  holding the lock, or there are new races.

- Because you use the ordered wq, there's no need to flush anything that's
  already queued. I think it'd still be good to make sure that drm/sched
  absolutely does not reinsert anything; some assert to make sure it's all
  clean would imo be an adequate level of paranoia.

- Also, for paranoia's sake, I'd make this a mutex and just block until
  the reset finishes. It shouldn't cost anything, and it will stop any
  potential issues if the reset is still ongoing. That shouldn't happen
  thanks to the ordered wq, but this kind of race and its fallout are so
  hard to catch that I think it's better to be absolutely safe.

Cheers, Sima

> +
>  	return count;
>  }
>  
> -- 
> 2.21.0.rc0.32.g243a4c7e27
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

