Intel-XE Archive on lore.kernel.org
From: "Hellstrom, Thomas" <thomas.hellstrom@intel.com>
To: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"tvrtko.ursulin@linux.intel.com" <tvrtko.ursulin@linux.intel.com>
Cc: "Auld, Matthew" <matthew.auld@intel.com>,
	"Santa, Carlos" <carlos.santa@intel.com>,
	"Vivi, Rodrigo" <rodrigo.vivi@intel.com>
Subject: Re: [Intel-xe] LLC configuration, mmap and bo cache management questions
Date: Wed, 6 Dec 2023 11:46:17 +0000
Message-ID: <36e0e059e0f74ca7c55b2647658456fabe26420d.camel@intel.com>
In-Reply-To: <eaa44445-21e2-4b9f-95a0-6b7395203900@linux.intel.com>

Hi, Tvrtko.

On Wed, 2023-12-06 at 10:58 +0000, Tvrtko Ursulin wrote:
> 
> Hi Thomas,
> 
> On 06/12/2023 08:26, Hellstrom, Thomas wrote:
> > On Tue, 2023-12-05 at 14:19 +0000, Tvrtko Ursulin wrote:
> > > 
> > > Hi,
> > > 
> > > We are working on adding xe support to ChromeOS minigbm and have a
> > > couple of questions.
> > > 
> > > If I follow things correctly, with xe the mmap caching mode is
> > > fixed to the object caching mode set at bo create. For framebuffers
> > > it will be WC, and for the rest userspace can choose WB or WC via
> > > drm_xe_gem_create->cpu_caching. (Unless discrete, where WB cannot
> > > be used at all.)
> > > 
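For concreteness, a minimal sketch of bo creation with an explicit
cpu_caching, against the xe uapi as proposed at the time; the placement
mask, ioctl name and exact field layout are assumptions here, and error
handling is elided:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include "xe_drm.h"

static int xe_bo_create_wb(int fd, uint64_t size, uint32_t sysmem_mask,
                           uint32_t *handle)
{
        struct drm_xe_gem_create create;

        memset(&create, 0, sizeof(create));
        create.size = size;
        /* Memory regions the bo may live in; system memory only, since
         * WB mmaps are not allowed for discrete VRAM. */
        create.placement = sysmem_mask;
        /* Fixes the CPU mmap caching mode for the bo's lifetime. */
        create.cpu_caching = DRM_XE_GEM_CPU_CACHING_WB;

        if (ioctl(fd, DRM_IOCTL_XE_GEM_CREATE, &create))
                return -1;

        *handle = create.handle;
        return 0;
}
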
> > > AFAICT minigbm basically cares about two transition points. Let's
> > > call them CPU access begin and end.
> > > 
> > > 1)
> > > When a bo is mmapped it wants to invalidate the cache, which looks
> > > to be about making sure all GPU writes have landed in the backing
> > > store. In the i915 world that translates to the set_domain ioctl.
> > > 
> > > What is the uapi for this with xe, or is it somehow guaranteed not
> > > to be needed?
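
For reference, the buffer-based i915 pattern mentioned above, roughly
what minigbm does today on "CPU access begin"; a sketch rather than a
drop-in implementation:

#include <stdint.h>
#include <sys/ioctl.h>
#include "i915_drm.h"

/* Wait for GPU writes to land and move the bo to the CPU domain. */
static int i915_cpu_access_begin(int fd, uint32_t handle)
{
        struct drm_i915_gem_set_domain sd = {
                .handle = handle,
                .read_domains = I915_GEM_DOMAIN_CPU,
                .write_domain = I915_GEM_DOMAIN_CPU,
        };

        return ioctl(fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &sd);
}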
> > 
> > Signalling a user-fence or dma-fence obtained as an out-fence from
> > an exec call will guarantee GPU caches are flushed. Currently I don't
> > think there is anything like gem wait in the uAPI, although Matt is
> > just about to add functionality to wait on all outstanding work on an
> > exec_queue.
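
As an illustration of that fence-based path, a sketch of waiting on an
exec out-fence through a drm syncobj using the stable libdrm API;
attaching the syncobj as an out-sync at exec time is assumed to happen
elsewhere:

#include <stdint.h>
#include <xf86drm.h>

/* Block until the exec's out-fence signals; once it does, GPU caches
 * for that work are guaranteed flushed. */
static int wait_exec_out_fence(int fd, uint32_t syncobj, int64_t timeout_ns)
{
        return drmSyncobjWait(fd, &syncobj, 1, timeout_ns,
                              DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, NULL);
}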
> 
> The problem I see is that there are no execs, and therefore no fences,
> in the minigbm ABI. It's just buffers, created or imported, CPU access
> and some other stuff.
> 
> And it is quite extensively used in the OS, so I assume it has to work
> (the invalidation/flushing was not put in there for no reason). In
> other words, where the i915 backend today does
> DRM_I915_GEM_SET_DOMAIN on "cpu access begin", which is buffer based,
> I am not clear how to implement that with xe.
> 
> For the outstanding work you mention, since you say it is about an
> exec_queue, I assume it again will not work purely with buffers? If so
> it probably won't be useful for minigbm.
> 
> Also if I look at all the other minigbm backends, I see a mix of
> behaviours:
> 
>   * msm and vc4 appear not to concern themselves with any of this.
> 
>   * rockchip appears to be doing full bounce buffering via memcpy on
> CPU access begin/end.
> 
>   * i915 and amdgpu respectively use
> DRM_I915_GEM_SET_DOMAIN/DRM_AMDGPU_GEM_WAIT_IDLE (also buffer based,
> not execution queue; see the sketch below). Amdgpu curiously does not
> do any flushing on CPU access end.
> 
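The amdgpu wait sketched here shows the buffer-based shape of that
ioctl; field semantics follow amdgpu_drm.h, with error handling
simplified:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include "amdgpu_drm.h"

/* Buffer-based wait: returns 0 once the bo is idle. */
static int amdgpu_cpu_access_begin(int fd, uint32_t handle)
{
        union drm_amdgpu_gem_wait_idle wait;

        memset(&wait, 0, sizeof(wait));
        wait.in.handle = handle;
        wait.in.timeout = ~0ull; /* effectively wait forever */

        if (ioctl(fd, DRM_IOCTL_AMDGPU_GEM_WAIT_IDLE, &wait))
                return -1;
        /* wait.out.status is 0 when idle, 1 when still busy. */
        return wait.out.status ? -1 : 0;
}
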
> Digging into git history, both DRM_I915_GEM_SET_DOMAIN on CPU access
> begin and clflushing on end were added to fix various CTS test
> failures. So I guess we could also wait and see what happens there: if
> those or similar tests start failing with xe too, then propose adding
> some new uapi. Or if manual testing starts reporting visual corruption
> in UI elements or such.

Indeed, it sounds like we'd need a DRM_XE_GEM_WAIT_IDLE, similar to
AMDGPU's, for this use-case. I'll bring that up for discussion.

> 
> > > 2)
> > > When a bo is unmapped, or CPU access is finished, it wants to
> > > flush the CPU caches. That is /almost/ completely a CPU operation,
> > > where it just needs to either clflush or flush the WC buffer
> > > respectively, were it not for the fact that clflush can be skipped
> > > on platforms with LLC.
> > > 
> > > I did not see an equivalent of I915_PARAM_HAS_LLC in xe? Did I
> > > miss it, or what is the plan for querying this detail?
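
To make that split concrete, a sketch of "CPU access end" on x86; the
has_llc flag and the helper itself are illustrative assumptions here,
not existing uapi:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>

#define CACHELINE 64

static void cpu_access_end(void *ptr, size_t size, bool wb, bool has_llc)
{
        if (!wb) {
                /* WC mapping: draining the write-combining buffers with
                 * a store fence is all that is needed. */
                _mm_sfence();
                return;
        }
        if (has_llc)
                return; /* LLC platforms: clflush can be skipped. */

        uintptr_t p = (uintptr_t)ptr & ~(uintptr_t)(CACHELINE - 1);
        uintptr_t end = (uintptr_t)ptr + size;

        for (; p < end; p += CACHELINE)
                _mm_clflush((void *)p);
        _mm_mfence(); /* order the flushes before subsequent GPU use */
}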
> > 
> > XeKMD is generally coherent, except if UMD selects a GPU PAT index
> > with limited coherency together with WB instead of WC memory. In
> > that case, UMD is responsible for doing the needed CLFLUSH-ing,
> > whereas KMD only ensures the initial clearing of the pages is
> > CLFLUSHed, for security reasons.
> > 
> > I'm not 100% sure whether UMD can actually select WB with a
> > limited-coherency PAT index in the initial uAPI revision, but
> > Matthew has received requests for that, so any additional input here
> > on performance implications is appreciated.
> > 
> > The thinking here is otherwise that GPU PAT indices with limited
> > coherency should be used together with WC memory, in the same
> > situations as VRAM/LMEM is used on DGFX.
> 
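To illustrate where a limited-coherency PAT index would be supplied, a
sketch against the vm_bind uapi as proposed at the time; the pat_index
field, the ioctl layout and the concrete index values are assumptions
here, queried in practice from per-platform tables:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include "xe_drm.h"

static int xe_bind_with_pat(int fd, uint32_t vm_id, uint32_t bo,
                            uint64_t addr, uint64_t range,
                            uint16_t pat_index)
{
        struct drm_xe_vm_bind bind;

        memset(&bind, 0, sizeof(bind));
        bind.vm_id = vm_id;
        bind.num_binds = 1;
        bind.bind.obj = bo;
        bind.bind.range = range;
        bind.bind.addr = addr;
        bind.bind.op = DRM_XE_VM_BIND_OP_MAP;
        /* A platform PAT index with limited coherency; when paired with
         * WB cpu_caching, UMD owns the clflushing. */
        bind.bind.pat_index = pat_index;

        return ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind);
}
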
> Hm okay, this would be the VM BIND side of things, which deals with
> GPU PATs. From the CPU side it is just CPU caching modes, implicitly
> selected via DRM_XE_GEM_CPU_CACHING_WB/WC.
> 
> The question about the analogue of I915_PARAM_HAS_LLC was AFAIU about
> a performance optimisation where UMD is deciding whether it is okay to
> skip issuing clflush for the mapped bo if DRM_XE_GEM_CPU_CACHING_WB
> was used. (If DRM_XE_GEM_CPU_CACHING_WC was used it obviously only
> needs to flush the write-combine buffer.)
> 
> Looking at what Mesa is doing, it appears it is not using
> I915_PARAM_HAS_LLC but has its own device info tables. So I guess the
> minigbm xe backend will have to do the same if xe does not provide an
> analogous query.

IMO clflushing in this case should never be needed (again, similar to
AMD). Whatever renders from / to those buffers should make sure they
are clflushed before or while accessing them. How is a rendering API
made aware of these bos? Are they imported using drm prime?
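
For context, the usual consumer-side prime import path goes through the
stable libdrm wrapper; a sketch, with error handling elided:

#include <stdint.h>
#include <xf86drm.h>

/* Convert an imported dma-buf fd into a driver-local GEM handle. */
static int import_dmabuf(int drm_fd, int dmabuf_fd, uint32_t *handle)
{
        return drmPrimeFDToHandle(drm_fd, dmabuf_fd, handle);
}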

/Thomas


> 
> Regards,
> 
> Tvrtko


