From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
Nareshkumar Gollakoti <naresh.kumar.g@intel.com>,
Christoph Manszewski <christoph.manszewski@intel.com>,
Maciej Patelczyk <maciej.patelczyk@intel.com>
Subject: Re: [PATCH] drm/xe/pf: Allow to lock/unlock the PF
Date: Wed, 29 Oct 2025 09:14:59 +0100 [thread overview]
Message-ID: <cdae237e-a4d7-4053-8b37-2164a1b2ea31@intel.com> (raw)
In-Reply-To: <aQFuXW5WUeUu1f6r@lstrano-desk.jf.intel.com>
On 10/29/2025 2:31 AM, Matthew Brost wrote:
> On Tue, Oct 28, 2025 at 09:05:21PM +0100, Michal Wajdeczko wrote:
>> Some driver functionalities, like eudebug or ccs-mode, can't
>> be used when VFs are enabled. Add functions to allow locking
>> the PF functionality for exclusive usage (either for enabling
>> VFs or to enable those other features, or simply for testing).
>> Add also debugfs attributes to explicitly call those functions
>> if needed.
>>
>
> Hmm, I'm not sure about this. Why not just lock the SR-IOV master mutex
> in pf_enable_vfs? If the reason is that lockdep blows up — for example,
> if the master mutex is annotated with __reclaim and pf_enable_vfs
> allocates memory — then you still have a potential deadlock; you've just
> silenced lockdep. I'm not certain that's the case, just using it as an
> example.
>
> Given that, I'd lean toward saying no — this really, really looks
> unsafe. If you'd like, get a second opinion from a locking expert (e.g.,
> Thomas), but I think this is a no from me.
Looks like more background info is needed here.

This "lock/unlock" is not meant to protect any PF structures/data — for that
we have the master_mutex — but to allow other components, like the eudebug
and ccs-mode mentioned above, to block the PF from enabling VFs while such a
feature is running or is making changes incompatible with VFs, see [1] [2].

In this patch, the PF is trying to "lock" itself when enabling VFs. If
another component (or here, debugfs) has already locked the PF, then the PF
will not enable any VFs, and thus will not break that other feature.

It is expected that the other components will follow the same flow: they
will first start with pf_try_lock and then either abort their own enabling,
if the PF is already busy, or call pf_unlock when they are done.
However, there is one open question that we might need to solve: what if
more such VF-incompatible features would like to run in parallel? With this
trivial approach, only one of eudebug or ccs-mode will be able to run at a
time.

If that's not sufficient, then we can switch to an rw_semaphore — or maybe,
if that would be cleaner/safer, use it from the beginning? Then we would
have:
components:
	xe_sriov_pf_try_lock --> down_read_trylock
	xe_sriov_pf_unlock --> up_read

PF internals:
	__xe_sriov_pf_try_lock(write) --> down_write_trylock
	__xe_sriov_pf_unlock(write) --> up_write
[1] https://patchwork.freedesktop.org/patch/667725/?series=152682&rev=1
[2] https://patchwork.freedesktop.org/patch/681266/?series=154538&rev=6
>
> Matt
>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Cc: Nareshkumar Gollakoti <naresh.kumar.g@intel.com>
>> Cc: Christoph Manszewski <christoph.manszewski@intel.com>
>> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_pci_sriov.c | 7 +++++
>> drivers/gpu/drm/xe/xe_sriov_pf.c | 38 ++++++++++++++++++++++++
>> drivers/gpu/drm/xe/xe_sriov_pf.h | 4 +++
>> drivers/gpu/drm/xe/xe_sriov_pf_debugfs.c | 15 ++++++++++
>> drivers/gpu/drm/xe/xe_sriov_pf_types.h | 3 ++
>> 5 files changed, 67 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_pci_sriov.c b/drivers/gpu/drm/xe/xe_pci_sriov.c
>> index 735f51effc7a..e1d34860b064 100644
>> --- a/drivers/gpu/drm/xe/xe_pci_sriov.c
>> +++ b/drivers/gpu/drm/xe/xe_pci_sriov.c
>> @@ -120,6 +120,10 @@ static int pf_enable_vfs(struct xe_device *xe, int num_vfs)
>> if (err)
>> goto out;
>>
>> + err = xe_sriov_pf_try_lock(xe);
>> + if (err)
>> + goto out;
>> +
>> /*
>> * We must hold additional reference to the runtime PM to keep PF in D0
>> * during VFs lifetime, as our VFs do not implement the PM capability.
>> @@ -157,6 +161,7 @@ static int pf_enable_vfs(struct xe_device *xe, int num_vfs)
>> failed:
>> xe_sriov_pf_unprovision_vfs(xe, num_vfs);
>> xe_pm_runtime_put(xe);
>> + xe_sriov_pf_unlock(xe);
>> out:
>> xe_sriov_notice(xe, "Failed to enable %u VF%s (%pe)\n",
>> num_vfs, str_plural(num_vfs), ERR_PTR(err));
>> @@ -186,6 +191,8 @@ static int pf_disable_vfs(struct xe_device *xe)
>> /* not needed anymore - see pf_enable_vfs() */
>> xe_pm_runtime_put(xe);
>>
>> + xe_sriov_pf_unlock(xe);
>> +
>> xe_sriov_info(xe, "Disabled %u VF%s\n", num_vfs, str_plural(num_vfs));
>> return 0;
>> }
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_pf.c b/drivers/gpu/drm/xe/xe_sriov_pf.c
>> index bc1ab9ee31d9..8cdd25db2cf9 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_pf.c
>> +++ b/drivers/gpu/drm/xe/xe_sriov_pf.c
>> @@ -157,6 +157,44 @@ int xe_sriov_pf_wait_ready(struct xe_device *xe)
>> return 0;
>> }
>>
>> +/**
>> + * xe_sriov_pf_try_lock() - Try to lock the PF.
>> + * @xe: the PF &xe_device
>> + *
>> + * This function can only be called on PF.
>> + *
>> + * Return: 0 on success or a negative error code on failure.
>> + */
>> +int xe_sriov_pf_try_lock(struct xe_device *xe)
>> +{
>> + guard(mutex)(xe_sriov_pf_master_mutex(xe));
>> +
>> + if (xe->sriov.pf.owner) {
>> + xe_sriov_dbg(xe, "already locked by %ps\n", xe->sriov.pf.owner);
>> + return -EBUSY;
>> + }
>> +
>> + xe->sriov.pf.owner = __builtin_return_address(0);
>> + xe_sriov_dbg_verbose(xe, "locked by %ps\n", xe->sriov.pf.owner);
>> +
>> + return 0;
>> +}
>> +
>> +/**
>> + * xe_sriov_pf_unlock() - Unlock the PF.
>> + * @xe: the PF &xe_device
>> + *
>> + * This function can only be called on PF.
>> + */
>> +void xe_sriov_pf_unlock(struct xe_device *xe)
>> +{
>> + guard(mutex)(xe_sriov_pf_master_mutex(xe));
>> +
>> + xe_assert(xe, xe->sriov.pf.owner);
>> + xe_sriov_dbg_verbose(xe, "unlocked by %ps\n", __builtin_return_address(0));
>> + xe->sriov.pf.owner = NULL;
>> +}
>> +
>> /**
>> * xe_sriov_pf_print_vfs_summary - Print SR-IOV PF information.
>> * @xe: the &xe_device to print info from
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_pf.h b/drivers/gpu/drm/xe/xe_sriov_pf.h
>> index cba3fde9581f..2261596bb4fe 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_pf.h
>> +++ b/drivers/gpu/drm/xe/xe_sriov_pf.h
>> @@ -17,11 +17,15 @@ bool xe_sriov_pf_readiness(struct xe_device *xe);
>> int xe_sriov_pf_init_early(struct xe_device *xe);
>> int xe_sriov_pf_init_late(struct xe_device *xe);
>> int xe_sriov_pf_wait_ready(struct xe_device *xe);
>> +int xe_sriov_pf_try_lock(struct xe_device *xe);
>> +void xe_sriov_pf_unlock(struct xe_device *xe);
>> void xe_sriov_pf_print_vfs_summary(struct xe_device *xe, struct drm_printer *p);
>> #else
>> static inline bool xe_sriov_pf_readiness(struct xe_device *xe) { return false; }
>> static inline int xe_sriov_pf_init_early(struct xe_device *xe) { return 0; }
>> static inline int xe_sriov_pf_init_late(struct xe_device *xe) { return 0; }
>> +static inline int xe_sriov_pf_try_lock(struct xe_device *xe) { return 0; }
>> +static inline void xe_sriov_pf_unlock(struct xe_device *xe) { }
>> #endif
>>
>> #endif
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_sriov_pf_debugfs.c
>> index a81aa05c5532..7c011462244d 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_pf_debugfs.c
>> +++ b/drivers/gpu/drm/xe/xe_sriov_pf_debugfs.c
>> @@ -96,12 +96,27 @@ static inline int xe_sriov_pf_restore_auto_provisioning(struct xe_device *xe)
>> return xe_sriov_pf_provision_set_mode(xe, XE_SRIOV_PROVISIONING_MODE_AUTO);
>> }
>>
>> +static inline int xe_sriov_pf_try_lock_pf(struct xe_device *xe)
>> +{
>> + return xe_sriov_pf_try_lock(xe);
>> +}
>> +
>> +static inline int xe_sriov_pf_force_unlock_pf(struct xe_device *xe)
>> +{
>> + xe_sriov_pf_unlock(xe);
>> + return 0;
>> +}
>> +
>> DEFINE_SRIOV_ATTRIBUTE(restore_auto_provisioning);
>> +DEFINE_SRIOV_ATTRIBUTE(try_lock_pf);
>> +DEFINE_SRIOV_ATTRIBUTE(force_unlock_pf);
>>
>> static void pf_populate_root(struct xe_device *xe, struct dentry *dent)
>> {
>> debugfs_create_file("restore_auto_provisioning", 0200, dent, xe,
>> &restore_auto_provisioning_fops);
>> + debugfs_create_file("try_lock_pf", 0200, dent, xe, &try_lock_pf_fops);
>> + debugfs_create_file("force_unlock_pf", 0200, dent, xe, &force_unlock_pf_fops);
>> }
>>
>> static int simple_show(struct seq_file *m, void *data)
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_pf_types.h b/drivers/gpu/drm/xe/xe_sriov_pf_types.h
>> index c753cd59aed2..91da3c979922 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_pf_types.h
>> +++ b/drivers/gpu/drm/xe/xe_sriov_pf_types.h
>> @@ -36,6 +36,9 @@ struct xe_device_pf {
>> /** @master_lock: protects all VFs configurations across GTs */
>> struct mutex master_lock;
>>
>> + /** @owner: the RET_IP of the owner who locked the PF */
>> + void *owner;
>> +
>> /** @provision: device level provisioning data. */
>> struct xe_sriov_pf_provision provision;
>>
>> --
>> 2.47.1
>>
Thread overview: 8+ messages
2025-10-28 20:05 [PATCH] drm/xe/pf: Allow to lock/unlock the PF Michal Wajdeczko
2025-10-28 22:19 ` ✓ CI.KUnit: success for " Patchwork
2025-10-28 22:57 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-29 1:31 ` [PATCH] " Matthew Brost
2025-10-29 8:14 ` Michal Wajdeczko [this message]
2025-10-29 11:02 ` Manszewski, Christoph
2025-10-29 1:34 ` Matthew Brost
2025-10-29 10:28 ` ✗ Xe.CI.Full: failure for " Patchwork