From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@linux.ie>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 21/21] drm/i915: Introduce vmap (mapping of user pages into video memory) ioctl
Date: Mon, 18 Apr 2011 16:58:17 +0200 [thread overview]
Message-ID: <20110418145817.GA3420@viiv.ffwll.ch> (raw)
In-Reply-To: <1302945465-32115-22-git-send-email-chris@chris-wilson.co.uk>
Hi Chris,
First things first, vmap scares me. It's scariness is threefold:
- We haven't really used snooped memory seriously anywhere. And every time
we newly use hw features, stuff tends to blow up in hilarious ways.
- Using snooped memory is rather different than using uc gtt mappings and
has the potential for tons of funny api issues.
- With vmap gem is suddenly no longer in full control of its backing
storage - imo a rather fundamental change.
With that admission out of the way follows the hopefully more rational
part ;-) First I look a bit at the use case for vmap and the problems (as
I see them) it tries to solve. Then comes fear rationalization in the form
of issues I see concerning the above three points. At the end then some
ideas as how to procedd.
I haven't yet looked in-depth into the code, so there's a decent chance
I've got some misconceptions. Please correct me.
Usage
-----
... to tightly integrate gpu rendering with rendering on the cpu, in
whatever form. Your current code uses this for really fast X fallbacks
(optimized with damage tracking to avoid unnecessary syncing), but I think
it might also prove useful for vbo upload, especially with swtnl. In
short: transfers.
There are two ways to further differentiate this: One is whether the
driver can allocate the storage for the cpu renderer or whether that is
handed down to it as fixed (X shm). The other is whether the transfer is a
oneshot or can reasonably be expected to be reused a lot of times (and
hence setup time be amortized).
vmap is squarely designed to for case of externally fixed memory which
will be reused. It obviously subsumes the driver provided memory case (for
not-oneshot stuff). In your X rework you also use it for small oneshot
stuff by amalgating many smaller transfers into larger vmaps.
Other contenders in the transfer business are:
- uc gtt maps, which just suck (besides oneshot uploads of giant stuff)
- and pwrite/pread, which due to abi-design is unusable for 2d/tiled
objects.
Issues/questions: hw
-------------------------
- It newly uses a hw features. If that is not broken in certain special
ways itself, it will surely turn up other bugs by changing the timing of
all kind of things. It would be great to have a smoke (stress) test to
have at least a vague idea of how well it works on different machines.
- While strolling through docs, I've found the following gem (Public ilk
docs, Vol1Part2, Table 3-4 on page 53).
"System Memory Address. Accesses are snooped in processor cache,
allowing shared graphics/ processor access to (locked) cacheable
memory data."
With the range being "[0, 4GB]" Device range is unspecified, and might
be DevBW-DevILK. If that's true, it sounds like a showstopper for
vmapping random userspace address ranges.
(Aside: that comment about (locked) is interesting. Could that explain
why we sometimes miss seqno interrupts because we read a stale seqno
from the hw page due to lack of a asm "lock" prefix? That might be
necessary to avoid reading stale (virtual address indexed) L1 contents
which the gpu might not be able to snoop.)
- Another thing I've noticed: You're using MLC_LLC caching for vmaps. Is
that required to get coherent behaviour? If so we might need to change
the if (cache_level != NONE) return; to if (cache_level == MLC_LLC)
return; in clflush_object (for snb).
Issues/questions: api
---------------------
- One thing we already agreed on: the kernel won't magically redirect gtt
mappings to cpu mappings. Userspace has to adapt.
- Your userspace already tracks the damage in both the cpu and the gpu
portion of a pixmap. So I think it makes sense to fully offload
coherency management to userspace, including gpu synchronization.
Shooting down ptes would likely kill all benefits, anyway ;-)
- You're using set_domain_ioctl for synchronization. It seems to be enough
for uber-synchronous X. GL's a bit better and might benefit from
asynchronous execution fences (hello ttm). Nothing pressing, but
something to keep in mind.
Issues/questions: vmap specifically
-----------------------------------
I can't do fear rationalization here, it essentially boils down to me not
being an -mm dev.
The other thing I'm wondering is what other use cases than X shm accel
this has. Single-purpose api extensions at the kernel level cause a
somewhat irky feeling for me.
I've also thought a bit about backwards compat of the new userspace code.
But given that vmap is just a special scheme for transfers I don't see a
problem in hiding it behind a bit of abstraction.
Ideas for moving forward
------------------------
In conclusion I think the idea of vmap - to use snoopeable mem for tight
cpu/gpu integration - is awesome and your experimental branches serve a
great landmark for where to head to. But my gut feeeling also says that
it's a too big leap to mainline in just one step. Hence this proposal for
a tad bit more piecemeal approach. The goal being not to tackle
everything at once and to have the opportunity for some adjustment
mid-way.
First start with a new ioctl to enable snooped bos. We might prepare for a
kernel object cache and introduce a new gem_create ioctl, or just a
set_cacheable_ioctl. In short, just implement the "driver provided memory"
use case.
I think a smoke test should exist at merge time. Not just to have an idea
of which chipsets actually work, but also to have a clear example of the
intended use of snoopable bos wrt to synchronization. I.e. a simpel
example to think through the api implications.
Then convert over userspace. By looking at your snb branch, there's quite
some work to do. But there's also about 4-5 months until the 2.640
release, so should be plenty of time.
Using snoopable mem in less synchronous settings like mesa perhaps also
shows some api-hole that needs stuffing.
Then, when users adopt this in masses (probably somwhere in Q3) and we've
weeded out all the api kinks and worked around hw oddities, I think it's
time to re-evaluate the "externally provided memory" use-case and take a
look at vmap (the ioctl as proposed in your patch).
I have hopes that we might be able to subsume that use-case into the
single-shot use-case by beefing up pwrite/read with a blt variant that
does the right thing for 2d/tiled buffers (and also handles stride
mismatches). That feels a bit more generally useful, which is why I lean a
bit towards it. On the other hand we might turn vmap on it's head and
create an shm id out of a bo for X to use (if X shm turns out to be the
only user for such a thing).
Comments, ideas, flames highly welcome.
Yours, Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
next prev parent reply other threads:[~2011-04-18 14:59 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-04-16 9:17 i915 next, post-llc Chris Wilson
2011-04-16 9:17 ` [PATCH 01/21] drm/i915: Cache GT fifo count for SandyBridge Chris Wilson
2011-04-16 9:17 ` [PATCH 02/21] drm/i915: Refactor pwrite/pread to use single copy of get_user_pages Chris Wilson
2011-04-16 9:17 ` [PATCH 03/21] drm/i915: s/addr & ~PAGE_MASK/offset_in_page(addr)/ Chris Wilson
2011-04-16 9:17 ` [PATCH 04/21] drm/i915: Maintain fenced gpu access until we flush the fence Chris Wilson
2011-04-16 9:17 ` [PATCH 05/21] drm/i915: Invalidate fenced read domains upon flush Chris Wilson
2011-04-16 9:17 ` [PATCH 06/21] drm/i915: Pass the fence register number to be written Chris Wilson
2011-04-16 9:17 ` [PATCH 07/21] drm/i915: Track fence setup separately from fenced object lifetime Chris Wilson
2011-04-16 13:20 ` Daniel Vetter
2011-04-16 9:17 ` [PATCH 08/21] drm/i915: Only print out the actual number of fences for i915_error_state Chris Wilson
2011-04-16 9:17 ` [PATCH 09/21] drm/i915: Implement direct support for 24 bit LVDS pixel format Chris Wilson
2011-04-16 9:17 ` [PATCH 10/21] drm/i915: Implement manual override of LVDS single/dual channel mode Chris Wilson
2011-04-16 9:17 ` [PATCH 11/21] drm/i915/tv: Use a direct pointer for tv_mode Chris Wilson
2011-04-16 9:17 ` [PATCH 12/21] drm/i915: Replace ironlake_compute_wm0 with g4x_compute_wm0 Chris Wilson
2011-04-16 9:17 ` [PATCH 13/21] drm/i915/crt: Explicitly return false if connected to a digital monitor Chris Wilson
2011-04-16 9:17 ` [PATCH 14/21] drm/i915/i2c: Convert from using GMBUS1 + reg_offset idiom to reg + 0 Chris Wilson
2011-04-16 9:17 ` [PATCH 15/21] drm/i915/gmbus: Reset the controller on initialisation Chris Wilson
2011-04-16 9:17 ` [PATCH 16/21] drm/i915: Retire requests before disabling pagefaults Chris Wilson
2011-04-16 13:44 ` Daniel Vetter
2011-04-16 9:17 ` [PATCH 17/21] drm/i915: Repeat retiring of requests until the seqno is stable Chris Wilson
2011-04-16 13:45 ` Daniel Vetter
2011-04-16 9:17 ` [PATCH 18/21] drm/i915: Split out i915_gem_object_move_to_ring() from execbuffer Chris Wilson
2011-04-16 13:54 ` Daniel Vetter
2011-04-16 14:18 ` Chris Wilson
2011-04-16 14:24 ` Daniel Vetter
2011-04-16 9:17 ` [PATCH 19/21] drm/i915: Enable the use of GPU semaphores whilst page-flipping Chris Wilson
2011-04-16 13:58 ` Daniel Vetter
2011-04-16 14:20 ` Chris Wilson
2011-04-16 9:17 ` [PATCH 20/21] drm/i915: Use a slab for object allocation Chris Wilson
2011-04-16 14:07 ` Daniel Vetter
2011-04-16 9:17 ` [PATCH 21/21] drm/i915: Introduce vmap (mapping of user pages into video memory) ioctl Chris Wilson
2011-04-18 14:58 ` Daniel Vetter [this message]
2011-04-19 6:20 ` Chris Wilson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110418145817.GA3420@viiv.ffwll.ch \
--to=daniel@ffwll.ch \
--cc=airlied@linux.ie \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox