Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <thomas.hellstrom@linux.intel.com>
Subject: Re: [PATCH 2/2] drm/xe/pf: Expose access to the VF GGTT PTEs over debugfs
Date: Tue, 5 Nov 2024 09:26:59 -0800	[thread overview]
Message-ID: <ZypVY0DtjRgRWiL9@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <9928fbe6-c874-404c-82f0-3c0db4d6eea6@intel.com>

On Tue, Nov 05, 2024 at 05:41:40PM +0100, Michal Wajdeczko wrote:
> 
> 
> On 05.11.2024 02:14, Matthew Brost wrote:
> > On Sun, Nov 03, 2024 at 09:16:33PM +0100, Michal Wajdeczko wrote:
> >> For feature enabling and testing purposes, allow to capture and
> >> replace VF's GGTT PTEs data using debugfs blob file.
> >>
> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> ---
> >>  drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c | 62 +++++++++++++++++++++
> >>  1 file changed, 62 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> >> index 05df4ab3514b..69ba830d9e8d 100644
> >> --- a/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> >> +++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
> >> @@ -11,6 +11,7 @@
> >>  #include "xe_bo.h"
> >>  #include "xe_debugfs.h"
> >>  #include "xe_device.h"
> >> +#include "xe_ggtt.h"
> >>  #include "xe_gt.h"
> >>  #include "xe_gt_debugfs.h"
> >>  #include "xe_gt_sriov_pf_config.h"
> >> @@ -497,6 +498,64 @@ static const struct file_operations config_blob_ops = {
> >>  	.llseek		= default_llseek,
> >>  };
> >>  
> >> +/*
> >> + *      /sys/kernel/debug/dri/0/
> >> + *      ├── gt0
> >> + *      │   ├── vf1
> >> + *      │   │   ├── ggtt_raw
> >> + */
> >> +
> >> +static ssize_t ggtt_raw_read(struct file *file, char __user *buf,
> >> +			     size_t count, loff_t *pos)
> >> +{
> >> +	struct dentry *dent = file_dentry(file);
> >> +	struct dentry *parent = dent->d_parent;
> >> +	unsigned int vfid = extract_vfid(parent);
> >> +	struct xe_gt *gt = extract_gt(parent);
> >> +	struct xe_device *xe = gt_to_xe(gt);
> >> +	ssize_t ret;
> >> +
> >> +	xe_pm_runtime_get(xe);
> >> +	mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
> > 
> > + Thomas to confirm I'm making sense here.
> > 
> > So this relates to this patch [1] / Thomas comment [2].
> > 
> > You are adding memory allocations here under the
> > xe_gt_sriov_pf_master_mutex which renders [1] incomplete.
> 
> I was assuming that using GFP_NOWAIT and then on fail having a fallback
> to fixed 64B local chunk is fine, no?
> 

Yes. I realized after I typed this you use GFP_NOWAIT which cannot
trigger reclaim so this is indeed fine.

> > 
> > So you need to one of two things:
> > 
> > 1. Never do any memory allocations under xe_gt_sriov_pf_master_mutex. If
> > you choose this option taint this mutex with reclaim when loading the
> > PF. It is then safe to xe_gt_sriov_pf_master_mutex in suspend / resume /
> > reset flows.
> 
> well, due to lack of [1] there are still some allocations done during
> sending a VF config to the GuC, but hopefully we can mitigate that soon
> 

Yea I see one in pf_push_full_vf_config using GFP_KERNEL which would
problematic.

> but what I found recently is that due to recent GGTT refactoring, the
> xe_ggtt_node is now allocated (with GFP_NOFS) flag under that mutex,
> which may require another round of fixes
> 

Yep, if a xe_ggtt_node is allocated with GFP_NOFS that would also be
problematic. If you could preallocate the node outside of
xe_gt_sriov_pf_master_mutex that might work?

> > 
> > 2. Remove xe_gt_sriov_pf_master_mutex from suspend / resume / reset
> > flows.
> 
> reprovisioning (sending VFs configs to GuC) is only done as one of the
> final reset steps, and as long it's there it will require that mutex
> 
> alternate option would be to decouple reprovisioning to an async worker
> triggered from the reset, will take a look at this
> 

This might be the safest option as keeping memory allocations outside of
a large mutex can be difficut to maintain but if taint the mutex with
reclaim we'd presumably catch such issues immediately. No idea if an
async worker here would creates another set problems though.

Matt 

> > 
> > In addition to above, also never allocate memory in suspend / resume /
> > reset flows.
> > 
> > Not blocker here but just using this as an example to explain the
> > current SRIOV locking problems. Hope this helps.
> > 
> > Matt
> > 
> > [1] https://patchwork.freedesktop.org/patch/619024/?series=139801&rev=1
> > [2] https://lore.kernel.org/intel-xe/3e13401972fd49240f486fd7d47580e576794c78.camel@intel.com/
> > 
> >> +
> >> +	ret = xe_ggtt_node_read(gt->sriov.pf.vfs[vfid].config.ggtt_region,
> >> +				buf, count, pos);
> >> +
> >> +	mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
> >> +	xe_pm_runtime_put(xe);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +static ssize_t ggtt_raw_write(struct file *file, const char __user *buf,
> >> +			      size_t count, loff_t *pos)
> >> +{
> >> +	struct dentry *dent = file_dentry(file);
> >> +	struct dentry *parent = dent->d_parent;
> >> +	unsigned int vfid = extract_vfid(parent);
> >> +	struct xe_gt *gt = extract_gt(parent);
> >> +	struct xe_device *xe = gt_to_xe(gt);
> >> +	ssize_t ret;
> >> +
> >> +	xe_pm_runtime_get(xe);
> >> +	mutex_lock(xe_gt_sriov_pf_master_mutex(gt));
> >> +
> >> +	ret = xe_ggtt_node_write(gt->sriov.pf.vfs[vfid].config.ggtt_region,
> >> +				 buf, count, pos);
> >> +
> >> +	mutex_unlock(xe_gt_sriov_pf_master_mutex(gt));
> >> +	xe_pm_runtime_put(xe);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +static const struct file_operations ggtt_raw_ops = {
> >> +	.owner		= THIS_MODULE,
> >> +	.read		= ggtt_raw_read,
> >> +	.write		= ggtt_raw_write,
> >> +	.llseek		= default_llseek,
> >> +};
> >> +
> >>  /**
> >>   * xe_gt_sriov_pf_debugfs_register - Register SR-IOV PF specific entries in GT debugfs.
> >>   * @gt: the &xe_gt to register
> >> @@ -554,6 +613,9 @@ void xe_gt_sriov_pf_debugfs_register(struct xe_gt *gt, struct dentry *root)
> >>  			debugfs_create_file("config_blob",
> >>  					    IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV) ? 0600 : 0400,
> >>  					    vfdentry, NULL, &config_blob_ops);
> >> +			debugfs_create_file("ggtt_raw",
> >> +					    IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV) ? 0600 : 0400,
> >> +					    vfdentry, NULL, &ggtt_raw_ops);
> >>  		}
> >>  	}
> >>  }
> >> -- 
> >> 2.43.0
> >>
> 

  reply	other threads:[~2024-11-05 17:26 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-03 20:16 [PATCH 0/2] PF: Expose access to the VF GGTT PTEs over debugfs Michal Wajdeczko
2024-11-03 20:16 ` [PATCH 1/2] drm/xe: Add read/write debugfs helpers for GGTT node Michal Wajdeczko
2024-11-05  0:28   ` Lis, Tomasz
2024-11-05  0:49   ` Matthew Brost
2024-11-05 15:25     ` Michal Wajdeczko
2024-11-05 17:39       ` Matthew Brost
2024-11-05 18:57         ` Matthew Brost
2024-11-03 20:16 ` [PATCH 2/2] drm/xe/pf: Expose access to the VF GGTT PTEs over debugfs Michal Wajdeczko
2024-11-05  0:28   ` Lis, Tomasz
2024-11-05  1:14   ` Matthew Brost
2024-11-05 16:41     ` Michal Wajdeczko
2024-11-05 17:26       ` Matthew Brost [this message]
2024-11-03 20:21 ` ✓ CI.Patch_applied: success for PF: " Patchwork
2024-11-03 20:21 ` ✓ CI.checkpatch: " Patchwork
2024-11-03 20:22 ` ✓ CI.KUnit: " Patchwork
2024-11-03 20:34 ` ✓ CI.Build: " Patchwork
2024-11-03 20:36 ` ✓ CI.Hooks: " Patchwork
2024-11-03 20:38 ` ✓ CI.checksparse: " Patchwork
2024-11-03 20:57 ` ✓ CI.BAT: " Patchwork
2024-11-03 21:59 ` ✗ CI.FULL: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZypVY0DtjRgRWiL9@lstrano-desk.jf.intel.com \
    --to=matthew.brost@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=michal.wajdeczko@intel.com \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox