From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: linuxppc-dev@ozlabs.org, Thomas Hellstrom <thellstrom@vmware.com>,
dri-devel@lists.freedesktop.org,
Michel Danzer <daenzer@vmware.com>
Subject: Re: TTM placement & caching issue/questions
Date: Thu, 04 Sep 2014 19:43:43 +1000
Message-ID: <1409823823.4246.61.camel@pasglop>
In-Reply-To: <20140904093454.GG15520@phenom.ffwll.local>
On Thu, 2014-09-04 at 11:34 +0200, Daniel Vetter wrote:
> On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
> > Last time I tested, (and it seems like Michel is on the same track),
> > writing with the CPU to write-combined memory was substantially faster
> > than writing to cached memory, with the additional side-effect that CPU
> > caches are left unpolluted.
> >
> > Moreover (although only tested on Intel's embedded chipsets), texturing
> > from cpu-cache-coherent PCI memory was a real GPU performance hog
> > compared to texturing from non-snooped memory. Hence, whenever a buffer
> > could be classified as GPU-read-only (or almost at least), it should be
> > placed in write-combined memory.
>
> Just a quick comment since this explicitly referes to intel chips: On
> desktop/laptop chips with the big shared l3/l4 caches it's the other way
> round. Cached uploads are substantially faster than wc and not using
> coherent access is a severe perf hit for texturing. I guess the hw guys
> worked really hard to hide the snooping costs so that the gpu can benefit
> from the massive bandwidth these caches can provide.
This is similar to modern POWER chips as well. We have pretty big L3s
(though not technically shared, they are in a separate quadrant, and we
have a shared L4 in the memory buffer) and our fabric is generally
optimized for cachable/coherent access performance. In fact, we only
have so many credits for NC accesses on the bus...
What that tells me is that when setting up the desired cachability
attributes for the mapping of a memory object, we need to consider these
things:

 - The hard requirements of the HW (non-coherent GPUs require NC, AGP
   does in some cases, etc...), which I think are basically already
   handled using the placement attributes set by the GPU driver for the
   memory type.

 - The optimal attributes (and platform hard requirements) for fast
   memory accesses to an object by the processor. From what I read here,
   this can be NC+WC on older Intel, cachable on newer, etc...

 - The optimal attributes for fast GPU DMA accesses to the object in
   system memory. Here too, this is fairly platform/chipset dependent.
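
To make the three points above a bit more concrete, here is a rough
sketch of how they could combine: the hard HW requirement becomes a mask
of tolerated modes, and the platform supplies an ordered preference list
for the access pattern. All names here (cache_mode, pick_cache_mode) are
made up for illustration; this is not existing TTM or DRM code.

```c
#include <stddef.h>

/* Hypothetical caching modes, as bit flags so they can form a mask. */
enum cache_mode {
	CACHE_CACHED   = 1u << 0,	/* cachable, coherent */
	CACHE_WC       = 1u << 1,	/* non-cached, write-combined */
	CACHE_UNCACHED = 1u << 2,	/* non-cached, no write combining */
};

/*
 * hw_allowed: mask of modes the hardware tolerates (hard requirement).
 * pref:       modes in decreasing order of preference for this platform
 *             and access pattern (the "optimal attributes").
 *
 * Returns the first preferred mode the hardware allows, or 0 if none
 * of the preferred modes is allowed.
 */
static unsigned int pick_cache_mode(unsigned int hw_allowed,
				    const unsigned int *pref, size_t npref)
{
	size_t i;

	for (i = 0; i < npref; i++)
		if (pref[i] & hw_allowed)
			return pref[i];
	return 0;
}
```

With a preference list of { CACHE_CACHED, CACHE_WC, CACHE_UNCACHED }, a
non-coherent GPU that only allows WC or uncached mappings would end up
with WC, while a coherent one would get a cached mapping.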
Do we have flags in the DRM that tell us whether an object in memory is
more likely to be used by the GPU via DMA vs by the CPU via MMIO? On
powerpc (except in the old AGP case), I wouldn't care and would just
require cachable in both cases, but I can see the low latency crowd
wanting the former to be non-cachable, while the dumb GPUs like AST which
don't do DMA would benefit greatly from the latter...
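
As a sketch of what such a flag could look like, assuming a per-object
usage hint and a per-platform policy table. None of this exists in the
DRM as far as this mail knows; the enum, struct and policy names are
hypothetical, and the "low latency" entries are only an illustration of
the preference described above, not measured values.

```c
/* Hypothetical hint: who is expected to touch the object most. */
enum access_hint {
	HINT_GPU_DMA,		/* mostly accessed by the GPU via DMA */
	HINT_CPU_MMIO,		/* mostly accessed by the CPU via MMIO */
	HINT_COUNT
};

/* Hypothetical caching attributes for the CPU mapping. */
enum caching {
	CACHING_CACHED,
	CACHING_WC,
	CACHING_UNCACHED
};

/* Per-platform policy: preferred caching for each usage hint. */
struct platform_policy {
	const char *name;
	enum caching by_hint[HINT_COUNT];
};

/* powerpc (outside the old AGP case): cachable in both cases. */
static const struct platform_policy policy_powerpc = {
	.name = "powerpc",
	.by_hint = {
		[HINT_GPU_DMA]  = CACHING_CACHED,
		[HINT_CPU_MMIO] = CACHING_CACHED,
	},
};

/* Illustrative only: GPU-DMA objects non-cachable, CPU ones cachable. */
static const struct platform_policy policy_low_latency = {
	.name = "low-latency",
	.by_hint = {
		[HINT_GPU_DMA]  = CACHING_WC,
		[HINT_CPU_MMIO] = CACHING_CACHED,
	},
};

/* Look up the preferred caching for an object given its usage hint. */
static enum caching caching_for(const struct platform_policy *p,
				enum access_hint hint)
{
	return p->by_hint[hint];
}
```

The driver's hard requirements would still have to override whatever
such a table suggests.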
Cheers,
Ben.