From: Daniel Vetter <daniel@ffwll.ch>
To: Matthew Auld <matthew.auld@intel.com>
Cc: dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
stable@vger.kernel.org, Chris Wilson <chris.p.wilson@intel.com>
Subject: Re: [Intel-gfx] [PATCH] drm/i915/gtt: drop the page table optimisation
Date: Tue, 13 Jul 2021 15:38:16 +0200 [thread overview]
Message-ID: <YO2XSKyCe5FIBwYS@phenom.ffwll.local> (raw)
In-Reply-To: <20210713130431.2392740-1-matthew.auld@intel.com>
On Tue, Jul 13, 2021 at 02:04:31PM +0100, Matthew Auld wrote:
> We skip filling out the pt with scratch entries if the va range covers
> the entire pt, since we later have to fill it with the PTEs for the
> object pages anyway. However this might leave open a small window where
> the PTEs don't point to anything valid for the HW to consume.
>
> When for example using 2M GTT pages this fill_px() showed up as being
> quite significant in perf measurements, and ends up being completely
> wasted since we ignore the pt and just use the pde directly.
>
> Anyway, currently we have our PTE construction split between alloc and
> insert, which is probably slightly iffy nowadays, since the alloc
> doesn't actually allocate anything anymore, instead it just sets up the
> page directories and points the PTEs at the scratch page. Later when we
> do the insert step we re-program the PTEs again. Better might be to
> squash the alloc and insert into a single step, then bringing back this
> optimisation(along with some others) should be possible.
>
> Fixes: 14826673247e ("drm/i915: Only initialize partially filled pagetables")
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: <stable@vger.kernel.org> # v4.15+
This is some impressively convoluted code, and I'm scared.
But as far as I managed to convince myself, your story here checks out.
Problem will be a bit that this code moved around a _lot_ so we'll need a
lot of dedicated backports :-(
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 3d02c726c746..6e0e52eeb87a 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -303,10 +303,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
> __i915_gem_object_pin_pages(pt->base);
> i915_gem_object_make_unshrinkable(pt->base);
>
> - if (lvl ||
> - gen8_pt_count(*start, end) < I915_PDES ||
> - intel_vgpu_active(vm->i915))
> - fill_px(pt, vm->scratch[lvl]->encode);
> + fill_px(pt, vm->scratch[lvl]->encode);
>
> spin_lock(&pd->lock);
> if (likely(!pd->entry[idx])) {
> --
> 2.26.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
WARNING: multiple messages have this Message-ID (diff)
From: Daniel Vetter <daniel@ffwll.ch>
To: Matthew Auld <matthew.auld@intel.com>
Cc: dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
stable@vger.kernel.org, Jon Bloomfield <jon.bloomfield@intel.com>,
Chris Wilson <chris.p.wilson@intel.com>
Subject: Re: [PATCH] drm/i915/gtt: drop the page table optimisation
Date: Tue, 13 Jul 2021 15:38:16 +0200 [thread overview]
Message-ID: <YO2XSKyCe5FIBwYS@phenom.ffwll.local> (raw)
In-Reply-To: <20210713130431.2392740-1-matthew.auld@intel.com>
On Tue, Jul 13, 2021 at 02:04:31PM +0100, Matthew Auld wrote:
> We skip filling out the pt with scratch entries if the va range covers
> the entire pt, since we later have to fill it with the PTEs for the
> object pages anyway. However this might leave open a small window where
> the PTEs don't point to anything valid for the HW to consume.
>
> When for example using 2M GTT pages this fill_px() showed up as being
> quite significant in perf measurements, and ends up being completely
> wasted since we ignore the pt and just use the pde directly.
>
> Anyway, currently we have our PTE construction split between alloc and
> insert, which is probably slightly iffy nowadays, since the alloc
> doesn't actually allocate anything anymore, instead it just sets up the
> page directories and points the PTEs at the scratch page. Later when we
> do the insert step we re-program the PTEs again. Better might be to
> squash the alloc and insert into a single step, then bringing back this
> optimisation(along with some others) should be possible.
>
> Fixes: 14826673247e ("drm/i915: Only initialize partially filled pagetables")
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: <stable@vger.kernel.org> # v4.15+
This is some impressively convoluted code, and I'm scared.
But as far as I managed to convince myself, your story here checks out.
Problem will be a bit that this code moved around a _lot_ so we'll need a
lot of dedicated backports :-(
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 3d02c726c746..6e0e52eeb87a 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -303,10 +303,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
> __i915_gem_object_pin_pages(pt->base);
> i915_gem_object_make_unshrinkable(pt->base);
>
> - if (lvl ||
> - gen8_pt_count(*start, end) < I915_PDES ||
> - intel_vgpu_active(vm->i915))
> - fill_px(pt, vm->scratch[lvl]->encode);
> + fill_px(pt, vm->scratch[lvl]->encode);
>
> spin_lock(&pd->lock);
> if (likely(!pd->entry[idx])) {
> --
> 2.26.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
WARNING: multiple messages have this Message-ID (diff)
From: Daniel Vetter <daniel@ffwll.ch>
To: Matthew Auld <matthew.auld@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
Jon Bloomfield <jon.bloomfield@intel.com>,
Chris Wilson <chris.p.wilson@intel.com>,
Daniel Vetter <daniel@ffwll.ch>,
stable@vger.kernel.org
Subject: Re: [PATCH] drm/i915/gtt: drop the page table optimisation
Date: Tue, 13 Jul 2021 15:38:16 +0200 [thread overview]
Message-ID: <YO2XSKyCe5FIBwYS@phenom.ffwll.local> (raw)
In-Reply-To: <20210713130431.2392740-1-matthew.auld@intel.com>
On Tue, Jul 13, 2021 at 02:04:31PM +0100, Matthew Auld wrote:
> We skip filling out the pt with scratch entries if the va range covers
> the entire pt, since we later have to fill it with the PTEs for the
> object pages anyway. However this might leave open a small window where
> the PTEs don't point to anything valid for the HW to consume.
>
> When for example using 2M GTT pages this fill_px() showed up as being
> quite significant in perf measurements, and ends up being completely
> wasted since we ignore the pt and just use the pde directly.
>
> Anyway, currently we have our PTE construction split between alloc and
> insert, which is probably slightly iffy nowadays, since the alloc
> doesn't actually allocate anything anymore, instead it just sets up the
> page directories and points the PTEs at the scratch page. Later when we
> do the insert step we re-program the PTEs again. Better might be to
> squash the alloc and insert into a single step, then bringing back this
> optimisation(along with some others) should be possible.
>
> Fixes: 14826673247e ("drm/i915: Only initialize partially filled pagetables")
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: <stable@vger.kernel.org> # v4.15+
This is some impressively convoluted code, and I'm scared.
But as far as I managed to convince myself, your story here checks out.
Problem will be a bit that this code moved around a _lot_ so we'll need a
lot of dedicated backports :-(
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 3d02c726c746..6e0e52eeb87a 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -303,10 +303,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
> __i915_gem_object_pin_pages(pt->base);
> i915_gem_object_make_unshrinkable(pt->base);
>
> - if (lvl ||
> - gen8_pt_count(*start, end) < I915_PDES ||
> - intel_vgpu_active(vm->i915))
> - fill_px(pt, vm->scratch[lvl]->encode);
> + fill_px(pt, vm->scratch[lvl]->encode);
>
> spin_lock(&pd->lock);
> if (likely(!pd->entry[idx])) {
> --
> 2.26.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
next prev parent reply other threads:[~2021-07-13 13:38 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-13 13:04 [Intel-gfx] [PATCH] drm/i915/gtt: drop the page table optimisation Matthew Auld
2021-07-13 13:04 ` Matthew Auld
2021-07-13 13:04 ` Matthew Auld
2021-07-13 13:38 ` Daniel Vetter [this message]
2021-07-13 13:38 ` Daniel Vetter
2021-07-13 13:38 ` Daniel Vetter
2021-07-13 13:55 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2021-07-13 17:29 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YO2XSKyCe5FIBwYS@phenom.ffwll.local \
--to=daniel@ffwll.ch \
--cc=chris.p.wilson@intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=matthew.auld@intel.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.