Intel-XE Archive on lore.kernel.org
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org
Subject: Re: [RFC PATCH 0/8] ULLS for kernel submission of migration jobs
Date: Tue, 17 Sep 2024 08:59:10 +0200	[thread overview]
Message-ID: <5ee3eec0b0c8585b0b278d9f2f779d299aa51e3a.camel@linux.intel.com> (raw)
In-Reply-To: <Zug40aMNkJ9ZibYC@DUT025-TGLU.fm.intel.com>

Hi, Matt

On Mon, 2024-09-16 at 13:55 +0000, Matthew Brost wrote:
> On Mon, Aug 12, 2024 at 05:26:26PM +0000, Matthew Brost wrote:
> > On Mon, Aug 12, 2024 at 10:53:01AM +0200, Thomas Hellström wrote:
> > > Hi, Matt,
> > > 
> > > On Sun, 2024-08-11 at 19:47 -0700, Matthew Brost wrote:
> > > > Ultra low latency for kernel submission of migration jobs.
> > > > 
> > > > The basic idea is that faults (CPU or GPU) typically depend on
> > > > migration jobs. Faults should be addressed as quickly as
> > > > possible, but context switches via GuC on hardware are slow. To
> > > > avoid context switches, perform ULLS in the kernel for migration
> > > > jobs on discrete faulting devices with an LR VM open.
> > > >
> > > > This is implemented by switching the migration layer to ULLS
> > > > mode upon opening an LR VM. In ULLS mode, migration jobs have a
> > > > preamble and postamble: the preamble clears the current
> > > > semaphore value, and the postamble waits for the next semaphore
> > > > value. Each job submission sets the current semaphore in memory,
> > > > bypassing the GuC. The net effect is that the migration
> > > > execution queue never gets switched off the hardware while an
> > > > LR VM is open.
> > > >
> > > > There may be concerns regarding power management, as the ring
> > > > program continuously runs on a copy engine, and a force wake
> > > > reference to a copy engine is held with an LR VM open.
> > > >
> > > > The implementation has been lightly tested but seems to be
> > > > working.
> > > >
> > > > This approach will likely be put on hold until SVM is
> > > > operational with benchmarks, but it is being posted early for
> > > > feedback and as a public checkpoint.
> > > > 
> > > > Matt
> > > 
> > > The main concern I have with this is that, at least according to
> > > upstream discussions, pagefaults are so slow anyway that a
> > > performant stack needs to try extremely hard to avoid them using
> > > manual prefaults; if we hit a GPU pagefault, we've lost anyway,
> > > and any migration latency optimization won't matter much.
> > > 
> > 
> > I agree that if pagefaults are getting hit all the time we are in
> > trouble wrt performance, but that doesn't mean we shouldn't try to
> > make servicing them as fast as possible when they do occur.
> > 
> 
> Okay, there is definitely something to this. I have an SVM test that
> serially bounces (i.e., each bounce results in a GuC context switch)
> a 2M allocation via fault about 1,000 times between the CPU and GPU.
> I applied this series and added a modparam to enable/disable ULLS to
> the tip of my latest working branch [3].
>
> Without ULLS enabled, the test on average took about 3 seconds on
> BMG. With ULLS enabled, it took about 2 seconds. I was expecting a
> small gain, but this is significant. In other similar test sections I
> have seen speeds nearing twice as fast. This seems to indicate that
> the series is definitely worth pursuing.
> 
> Also note that any operation using the migrate engine (e.g., clearing
> BOs, prefetches, eviction) will see latency gains.

I'm still concerned about adding this, and if we do, we should only use
it as a last resort, and only if we see significant performance
improvements in real-world applications. Historically, all the latency
optimizations were what screwed up the maintainability of the i915
driver, and I think we should be extremely careful so we don't end up
in the same situation. I also have concerns around power management.

Regardless, ULLS should only really improve things if we fail to
pipeline migrations on the HW in the non-ULLS case, assuming that we're
using a single exec-queue in both cases. Is that because we wait for
each migration to complete before allowing the next one? If so, is that
something we could look at?

Thanks,
Thomas



> 
> Matt
> 
> > > Also, for power management, LR VM open is a very simple strategy,
> > > which is good, but shouldn't it be possible to hook that up to LR
> > > job running, similar to vm->preempt.rebind_deactivated?
> > > 
> > 
> > That seems possible. In this scenario we'd hook the
> > xe_migrate_lr_vm_get / put calls [1] [2] and the runtime PM calls
> > into the LR VM activate / deactivate calls rather than the LR VM
> > open / close calls.
> > 
> > Matt
> > 
> > [1]
> > https://patchwork.freedesktop.org/patch/607842/?series=137128&rev=1
> > [2]
> > https://patchwork.freedesktop.org/patch/607841/?series=137128&rev=1
> > 
> > > /Thomas
> > > 
> > > 
> > > > 
> > > > Matthew Brost (8):
> > > >   drm/xe: Add xe_hw_engine_write_ring_tail
> > > >   drm/xe: Add ULLS support to LRC
> > > >   drm/xe: Add ULLS flags for jobs
> > > >   drm/xe: Add ULLS migration job support to migration layer
> > > >   drm/xe: Add MI_SEMAPHORE_WAIT instruction defs
> > > >   drm/xe: Add ULLS migration job support to ring ops
> > > >   drm/xe: Add ULLS migration job support to GuC submission
> > > >   drm/xe: Enable ULLS migration jobs when opening LR VM
> > > > 
> > > >  .../gpu/drm/xe/instructions/xe_mi_commands.h  |   6 +
> > > >  drivers/gpu/drm/xe/xe_guc_submit.c            |  26 +++-
> > > >  drivers/gpu/drm/xe/xe_hw_engine.c             |  10 ++
> > > >  drivers/gpu/drm/xe/xe_hw_engine.h             |   1 +
> > > >  drivers/gpu/drm/xe/xe_lrc.c                   |  49 +++++++
> > > >  drivers/gpu/drm/xe/xe_lrc.h                   |   3 +
> > > >  drivers/gpu/drm/xe/xe_lrc_types.h             |   2 +
> > > >  drivers/gpu/drm/xe/xe_migrate.c               | 130 +++++++++++++++++-
> > > >  drivers/gpu/drm/xe/xe_migrate.h               |   4 +
> > > >  drivers/gpu/drm/xe/xe_ring_ops.c              |  32 +++++
> > > >  drivers/gpu/drm/xe/xe_sched_job_types.h       |   3 +
> > > >  drivers/gpu/drm/xe/xe_vm.c                    |  10 ++
> > > >  12 files changed, 268 insertions(+), 8 deletions(-)
> > > > 
> > > 


Thread overview: 22+ messages
2024-08-12  2:47 [RFC PATCH 0/8] ULLS for kernel submission of migration jobs Matthew Brost
2024-08-12  2:47 ` [RFC PATCH 1/8] drm/xe: Add xe_hw_engine_write_ring_tail Matthew Brost
2024-08-12  2:47 ` [RFC PATCH 2/8] drm/xe: Add ULLS support to LRC Matthew Brost
2024-08-12  2:47 ` [RFC PATCH 3/8] drm/xe: Add ULLS flags for jobs Matthew Brost
2024-08-12  2:47 ` [RFC PATCH 4/8] drm/xe: Add ULLS migration job support to migration layer Matthew Brost
2024-08-12  2:47 ` [RFC PATCH 5/8] drm/xe: Add MI_SEMAPHORE_WAIT instruction defs Matthew Brost
2024-08-12  2:47 ` [RFC PATCH 6/8] drm/xe: Add ULLS migration job support to ring ops Matthew Brost
2024-08-12  2:47 ` [RFC PATCH 7/8] drm/xe: Add ULLS migration job support to GuC submission Matthew Brost
2024-08-12  2:47 ` [RFC PATCH 8/8] drm/xe: Enable ULLS migration jobs when opening LR VM Matthew Brost
2024-08-12  2:51 ` ✓ CI.Patch_applied: success for ULLS for kernel submission of migration jobs Patchwork
2024-08-12  2:52 ` ✗ CI.checkpatch: warning " Patchwork
2024-08-12  2:53 ` ✓ CI.KUnit: success " Patchwork
2024-08-12  3:05 ` ✓ CI.Build: " Patchwork
2024-08-12  3:07 ` ✓ CI.Hooks: " Patchwork
2024-08-12  3:08 ` ✓ CI.checksparse: " Patchwork
2024-08-12  3:28 ` ✓ CI.BAT: " Patchwork
2024-08-12  4:31 ` ✗ CI.FULL: failure " Patchwork
2024-08-12  8:53 ` [RFC PATCH 0/8] " Thomas Hellström
2024-08-12 17:26   ` Matthew Brost
2024-09-16 13:55     ` Matthew Brost
2024-09-17  6:59       ` Thomas Hellström [this message]
2024-09-17 14:39         ` Matthew Brost
