* [PATCH] drm/i915: Disable full ppgtt by default
@ 2014-03-06 11:14 Daniel Vetter
2014-03-06 18:01 ` Jesse Barnes
2014-03-06 18:17 ` Ben Widawsky
0 siblings, 2 replies; 7+ messages in thread
From: Daniel Vetter @ 2014-03-06 11:14 UTC (permalink / raw)
To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky
There are too many oustanding issues:
- Fence handling in the current code is broken. There's a patch series
from me, but it's blocked on and extended review (which includes
writing the testcases).
- IOMMU mapping handling is broken, we need to properly refcount it -
currently it gets destroyed when the first vma is unbound, so way
too early.
- There's a pending reset issue on snb. Since Mika's reset work and
full ppgtt have been pulled in in separate branches and ended up
intermittingly breaking each another it's unclear who's the exact
culprit here.
- We still have persistent evidince of crazy recursion bugs through
vma_unbind and ppgtt_relase, e.g.
https://bugs.freedesktop.org/show_bug.cgi?id=73383
This issue (and a few others meanwhile resolved) have blocked our
performance measuring/tuning group since 3 months.
- Secure batch dispatching is broken. This is blocking Brad Volkin's
command checker work since 3 months.
All these issues are confirmed to only happen when full ppgtt is
enabled, falling back to aliasing ppgtt resolves them. But even
aliasing ppgtt itself still has a regression:
- We currently unconditionally bind objects into the aliasing ppgtt,
which means all priviledged objects like ringbuffers are visible to
unpriviledged access again. On top of that this also breaks the
command checker for aliasing ppgtt, since it can't hide the
validated batch any more.
Furthermore topic/full-ppgtt has never been reviewed:
- Lifetime rules around vma unbinding/release are unclear, resulting
into this awesome hack called ppgtt_release. Which seems to take the
blame for most of the recursion fallout.
- Context/ring init works different on gpu reset than anywhere else.
Such differeneces have in the past always lead to really hard to
track down bugs.
- Aliasing ppgtt is treated in a bunch of places as a real address
space, but it isn't - the real address space is always the global
gtt in that case. This results in a bit a mess between contexts and
ppgtt object, further complication the context/ppgtt/vma lifetime
rules.
- We don't have any docs describing the overall concepts introduced
with full ppgtt. A short, concise overview describing vmas and some
of the strange bits around them (like the unbound vmas used by
execbuf, or the new binding rules) really is needed.
Note that a lot of the post topic/full-ppgtt merge fallout has already
been addressed, this entire list here of 10 issues really only contains
the still outstanding issues.
Finally the 3.15 merge window is approaching and I think we need to
use the remaining time to ensure that our fallback option of using
aliasing ppgtt is in solid shape. Hence I think it's time to throw the
switch. While at it demote the helper from static inline status
because really.
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
To preempt the inevitable flamewar:
- I'm not really up for another experiment to merge a working feature
as-is through a branch and then polish it later because
a) full ppgtt never really worked well enough to begin with and
b) the polishing (and bugfixing) didn't really happen sufficiently
quickly to avoid blocking tons of other people's work.
No I don't have a better idea than our current patch/people crunch
process to get big feature work in, but feature branches (at least
like it went down with full ppgtt) is definitely worse.
- Not moving the goalpost here. Most of these issues have been spotted
in review or caught with tests even before the plan to merge an
unreviewed full ppgtt topic branch was agreed on. But they all have
been shrugged off or ducttaped over the failing testcase without
fixing the underlying issue.
- I've had my fair share of flamewars in the past few months, not up
for another one just now. I've discussed this a bit with Ben in
private and he said that as long as I clearly state what's missing
and as long I only disable the default and don't rip out the code he
can live with this, though obviously he's not happy.
So if you disagree with my decision to disable full ppgtt for now or
disagree with my assessment that the feature branch experiment
failed and shouldn't be repeated without changes, then please follow
the established maintainer impeachement procedures and send a pull
request to Dave Airlie to sign yourself up for this grateful
position.
-Daniel
---
drivers/gpu/drm/i915/i915_drv.h | 22 +---------------------
drivers/gpu/drm/i915/i915_gem_gtt.c | 23 +++++++++++++++++++++++
2 files changed, 24 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dce09fcddc2c..843aaee8d80b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2377,27 +2377,7 @@ static inline void i915_gem_chipset_flush(struct drm_device *dev)
intel_gtt_chipset_flush();
}
int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt);
-static inline bool intel_enable_ppgtt(struct drm_device *dev, bool full)
-{
- if (i915.enable_ppgtt == 0 || !HAS_ALIASING_PPGTT(dev))
- return false;
-
- if (i915.enable_ppgtt == 1 && full)
- return false;
-
-#ifdef CONFIG_INTEL_IOMMU
- /* Disable ppgtt on SNB if VT-d is on. */
- if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
- DRM_INFO("Disabling PPGTT because VT-d is on\n");
- return false;
- }
-#endif
-
- if (full)
- return HAS_PPGTT(dev);
- else
- return HAS_ALIASING_PPGTT(dev);
-}
+bool intel_enable_ppgtt(struct drm_device *dev, bool full);
/* i915_gem_stolen.c */
int i915_gem_init_stolen(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a616cac0f161..3d8bd62a926c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -30,6 +30,29 @@
#include "i915_trace.h"
#include "intel_drv.h"
+bool intel_enable_ppgtt(struct drm_device *dev, bool full)
+{
+ if (i915.enable_ppgtt == 0 || !HAS_ALIASING_PPGTT(dev))
+ return false;
+
+ if (i915.enable_ppgtt == 1 && full)
+ return false;
+
+#ifdef CONFIG_INTEL_IOMMU
+ /* Disable ppgtt on SNB if VT-d is on. */
+ if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
+ DRM_INFO("Disabling PPGTT because VT-d is on\n");
+ return false;
+ }
+#endif
+
+ /* Full ppgtt disabled by default for now due to issues. */
+ if (full)
+ return false; /* HAS_PPGTT(dev) */
+ else
+ return HAS_ALIASING_PPGTT(dev);
+}
+
#define GEN6_PPGTT_PD_ENTRIES 512
#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
typedef uint64_t gen8_gtt_pte_t;
--
1.8.5.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915: Disable full ppgtt by default
2014-03-06 11:14 [PATCH] drm/i915: Disable full ppgtt by default Daniel Vetter
@ 2014-03-06 18:01 ` Jesse Barnes
2014-03-06 18:17 ` Ben Widawsky
1 sibling, 0 replies; 7+ messages in thread
From: Jesse Barnes @ 2014-03-06 18:01 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Intel Graphics Development, Ben Widawsky
On Thu, 6 Mar 2014 12:14:21 +0100
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> There are too many oustanding issues:
>
> - Fence handling in the current code is broken. There's a patch series
> from me, but it's blocked on and extended review (which includes
> writing the testcases).
>
> - IOMMU mapping handling is broken, we need to properly refcount it -
> currently it gets destroyed when the first vma is unbound, so way
> too early.
>
> - There's a pending reset issue on snb. Since Mika's reset work and
> full ppgtt have been pulled in in separate branches and ended up
> intermittingly breaking each another it's unclear who's the exact
> culprit here.
>
> - We still have persistent evidince of crazy recursion bugs through
> vma_unbind and ppgtt_relase, e.g.
>
> https://bugs.freedesktop.org/show_bug.cgi?id=73383
>
> This issue (and a few others meanwhile resolved) have blocked our
> performance measuring/tuning group since 3 months.
>
> - Secure batch dispatching is broken. This is blocking Brad Volkin's
> command checker work since 3 months.
Do we have bugs and/or tasks filed for all these, or just the one? We
need to make sure people are signed up to fix/review them, or it'll end
up staying disabled forever, and then we'll be stuck without a command
checker and some advanced features coming up...
--
Jesse Barnes, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915: Disable full ppgtt by default
2014-03-06 11:14 [PATCH] drm/i915: Disable full ppgtt by default Daniel Vetter
2014-03-06 18:01 ` Jesse Barnes
@ 2014-03-06 18:17 ` Ben Widawsky
2014-03-06 20:30 ` Daniel Vetter
1 sibling, 1 reply; 7+ messages in thread
From: Ben Widawsky @ 2014-03-06 18:17 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Intel Graphics Development
On Thu, Mar 06, 2014 at 12:14:21PM +0100, Daniel Vetter wrote:
> There are too many oustanding issues:
>
> - Fence handling in the current code is broken. There's a patch series
> from me, but it's blocked on and extended review (which includes
> writing the testcases).
>
> - IOMMU mapping handling is broken, we need to properly refcount it -
> currently it gets destroyed when the first vma is unbound, so way
> too early.
>
> - There's a pending reset issue on snb. Since Mika's reset work and
> full ppgtt have been pulled in in separate branches and ended up
> intermittingly breaking each another it's unclear who's the exact
> culprit here.
>
> - We still have persistent evidince of crazy recursion bugs through
> vma_unbind and ppgtt_relase, e.g.
>
> https://bugs.freedesktop.org/show_bug.cgi?id=73383
>
> This issue (and a few others meanwhile resolved) have blocked our
> performance measuring/tuning group since 3 months.
>
> - Secure batch dispatching is broken. This is blocking Brad Volkin's
> command checker work since 3 months.
>
> All these issues are confirmed to only happen when full ppgtt is
> enabled, falling back to aliasing ppgtt resolves them. But even
> aliasing ppgtt itself still has a regression:
>
> - We currently unconditionally bind objects into the aliasing ppgtt,
> which means all priviledged objects like ringbuffers are visible to
> unpriviledged access again. On top of that this also breaks the
> command checker for aliasing ppgtt, since it can't hide the
> validated batch any more.
>
> Furthermore topic/full-ppgtt has never been reviewed:
>
> - Lifetime rules around vma unbinding/release are unclear, resulting
> into this awesome hack called ppgtt_release. Which seems to take the
> blame for most of the recursion fallout.
>
> - Context/ring init works different on gpu reset than anywhere else.
> Such differeneces have in the past always lead to really hard to
> track down bugs.
>
> - Aliasing ppgtt is treated in a bunch of places as a real address
> space, but it isn't - the real address space is always the global
> gtt in that case. This results in a bit a mess between contexts and
> ppgtt object, further complication the context/ppgtt/vma lifetime
> rules.
>
> - We don't have any docs describing the overall concepts introduced
> with full ppgtt. A short, concise overview describing vmas and some
> of the strange bits around them (like the unbound vmas used by
> execbuf, or the new binding rules) really is needed.
>
> Note that a lot of the post topic/full-ppgtt merge fallout has already
> been addressed, this entire list here of 10 issues really only contains
> the still outstanding issues.
>
> Finally the 3.15 merge window is approaching and I think we need to
> use the remaining time to ensure that our fallback option of using
> aliasing ppgtt is in solid shape. Hence I think it's time to throw the
> switch. While at it demote the helper from static inline status
> because really.
>
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Cc: Dave Airlie <airlied@gmail.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[snip]
I want a concise list in the commit message so it's obvious as we fix
things if we've achieved the goal or not. If you want to have nice prose
describing the reason and/or your feelings, that's fine, but please put
it after the concise list.
I'll start what I want, and please fill in as needed. I believe this is
all 10 you mentioned.
* Fence handling broken: BUG #
* IOMMU Broken: BUG #
* "Reset issue": Bug #
* Secure dispatch: Failing testcase:
* Bug: https://bugs.freedesktop.org/show_bug.cgi?id=73383
* Documentation
Then there is fuzzy stuff that you "want" which need more clarification
on exactly what will satisfy you.
* Lifetime rules: No clear requirement from you.
* Context/ring init differences: What do you want?
* Aliasing PPGTT real address treatment: What do you want?
In my opinion, the last 3 are things you've imposed because of your
style as maintainer, whereas the first 7 are real issues that any sane
person would require before turning on.
Anyway, if you make the concise list like I want, at the top of the
commit, and you fill in the missing details, this is:
Acked-by: Ben Widawsky <ben@bwidawsk.net>
--
Ben Widawsky, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915: Disable full ppgtt by default
2014-03-06 18:17 ` Ben Widawsky
@ 2014-03-06 20:30 ` Daniel Vetter
2014-03-06 21:42 ` Daniel Vetter
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Daniel Vetter @ 2014-03-06 20:30 UTC (permalink / raw)
To: Ben Widawsky; +Cc: Daniel Vetter, Intel Graphics Development
On Thu, Mar 06, 2014 at 10:17:12AM -0800, Ben Widawsky wrote:
> On Thu, Mar 06, 2014 at 12:14:21PM +0100, Daniel Vetter wrote:
> > There are too many oustanding issues:
> >
> > - Fence handling in the current code is broken. There's a patch series
> > from me, but it's blocked on and extended review (which includes
> > writing the testcases).
> >
> > - IOMMU mapping handling is broken, we need to properly refcount it -
> > currently it gets destroyed when the first vma is unbound, so way
> > too early.
> >
> > - There's a pending reset issue on snb. Since Mika's reset work and
> > full ppgtt have been pulled in in separate branches and ended up
> > intermittingly breaking each another it's unclear who's the exact
> > culprit here.
> >
> > - We still have persistent evidince of crazy recursion bugs through
> > vma_unbind and ppgtt_relase, e.g.
> >
> > https://bugs.freedesktop.org/show_bug.cgi?id=73383
> >
> > This issue (and a few others meanwhile resolved) have blocked our
> > performance measuring/tuning group since 3 months.
> >
> > - Secure batch dispatching is broken. This is blocking Brad Volkin's
> > command checker work since 3 months.
> >
> > All these issues are confirmed to only happen when full ppgtt is
> > enabled, falling back to aliasing ppgtt resolves them. But even
> > aliasing ppgtt itself still has a regression:
> >
> > - We currently unconditionally bind objects into the aliasing ppgtt,
> > which means all priviledged objects like ringbuffers are visible to
> > unpriviledged access again. On top of that this also breaks the
> > command checker for aliasing ppgtt, since it can't hide the
> > validated batch any more.
> >
> > Furthermore topic/full-ppgtt has never been reviewed:
> >
> > - Lifetime rules around vma unbinding/release are unclear, resulting
> > into this awesome hack called ppgtt_release. Which seems to take the
> > blame for most of the recursion fallout.
> >
> > - Context/ring init works different on gpu reset than anywhere else.
> > Such differeneces have in the past always lead to really hard to
> > track down bugs.
> >
> > - Aliasing ppgtt is treated in a bunch of places as a real address
> > space, but it isn't - the real address space is always the global
> > gtt in that case. This results in a bit a mess between contexts and
> > ppgtt object, further complication the context/ppgtt/vma lifetime
> > rules.
> >
> > - We don't have any docs describing the overall concepts introduced
> > with full ppgtt. A short, concise overview describing vmas and some
> > of the strange bits around them (like the unbound vmas used by
> > execbuf, or the new binding rules) really is needed.
> >
> > Note that a lot of the post topic/full-ppgtt merge fallout has already
> > been addressed, this entire list here of 10 issues really only contains
> > the still outstanding issues.
> >
> > Finally the 3.15 merge window is approaching and I think we need to
> > use the remaining time to ensure that our fallback option of using
> > aliasing ppgtt is in solid shape. Hence I think it's time to throw the
> > switch. While at it demote the helper from static inline status
> > because really.
> >
> > Cc: Ben Widawsky <ben@bwidawsk.net>
> > Cc: Dave Airlie <airlied@gmail.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> [snip]
>
> I want a concise list in the commit message so it's obvious as we fix
> things if we've achieved the goal or not. If you want to have nice prose
> describing the reason and/or your feelings, that's fine, but please put
> it after the concise list.
>
> I'll start what I want, and please fill in as needed. I believe this is
> all 10 you mentioned.
> * Fence handling broken: BUG #
We have patches from me, and Paulo is signed up to do the review+igt
testcase on our review board.
> * IOMMU Broken: BUG #
No bug report thus far. I can create one if people want, but that's more
work than firing up my damn ivb, enabling dmar again and fixing it. Imo
the short description above should be good enough to understand the bug.
> * "Reset issue": Bug #
https://bugs.freedesktop.org/show_bug.cgi?id=74100
Mika is working on this afaik.
> * Secure dispatch: Failing testcase:
There's a Jira with full details: VIZ-3490
> * Bug: https://bugs.freedesktop.org/show_bug.cgi?id=73383
There's also been reports from Ville and iirc Chris also had pending
issues, at least compared to dinq. Note that plain igt doesn't seem to be
sufficient to provoke all these cases :(
> * Documentation
The above description is imo sufficient as a task description. There's
also jira for this with more details: VIZ-3468
> Then there is fuzzy stuff that you "want" which need more clarification
> on exactly what will satisfy you.
> * Lifetime rules: No clear requirement from you.
Whomever tracks down the recursion bugs will likely have to sort his out.
Essentially I want ppgtt_release gone, that entire function is just a
giant hack.
> * Context/ring init differences: What do you want?
http://lists.freedesktop.org/archives/intel-gfx/2013-November/036482.html
I really want this to be address before we start wreaking more havoc in
this area, i.e. the patch series currently pending internally.
> * Aliasing PPGTT real address treatment: What do you want?
i915_gem_context_free in i195_gem_context.c has a comment:
/* We refcount even the aliasing PPGTT to keep the code symmetric */
That comment is a lie and _not_ refcounting the aliasing ppgtt would allow
us to ditch a bunch of hard-to-understand (at least for me) cornercase in
the lifetime rules. At least that was the case when I've reviewed the
ppgtt branch 3 months ago. Given that we have issues around the lifetimes
of these suckers it can't help to have as much clarity as possible.
One really important thing you've dropped is fixing up the ppgtt binding
rules for aliasing ppgtt. That's a regression compared to 3.14 and will
render the cmd parser void with aliasing ppgtt. We need this asap both in
time for the 3.15 merge and to finally unblock Brad Volkin's command
parser.
I've signed myself up for this and started working on it.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915: Disable full ppgtt by default
2014-03-06 20:30 ` Daniel Vetter
@ 2014-03-06 21:42 ` Daniel Vetter
2014-03-06 21:55 ` Ben Widawsky
2014-03-18 10:19 ` Daniel Vetter
2 siblings, 0 replies; 7+ messages in thread
From: Daniel Vetter @ 2014-03-06 21:42 UTC (permalink / raw)
To: Ben Widawsky; +Cc: Daniel Vetter, Intel Graphics Development
On Thu, Mar 6, 2014 at 9:30 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> * Aliasing PPGTT real address treatment: What do you want?
>
> i915_gem_context_free in i195_gem_context.c has a comment:
>
> /* We refcount even the aliasing PPGTT to keep the code symmetric */
>
> That comment is a lie and _not_ refcounting the aliasing ppgtt would allow
> us to ditch a bunch of hard-to-understand (at least for me) cornercase in
> the lifetime rules. At least that was the case when I've reviewed the
> ppgtt branch 3 months ago. Given that we have issues around the lifetimes
> of these suckers it can't help to have as much clarity as possible.
Let me elaborate, since I really think this isn't a bikeshed:
If you look at the aliasing ppgtt from a logical perspective, it's not
a full ppgtt at all:
- The lifetime is fully linked to the global gtt. Refcounting it makes
zero sense.
- It doesn't have an independent drm_mm.
- Besides at driver load or after reset it never gets changed in the
hw, it's essentially a constant piece.
Imo the correct way to think about global gtt + aliasing ppgtt is as
one address space, with one set of ptes. Normal hw engineers would the
behaviour with 2 simple bits:
- one bit controls whether priviledged access is possible
- one bit controls whether unpriviledged access from userspace batch
buffers is possible.
The only crazy thing here is that we don't have two bits in the pte,
but instead two completely separate pagetable hirarchies. But
logically they work as if there's just one pagetable with 2 special
bits.
Hence why we also need the vma_bind/unbind vfuncs, they hide the
logical pagetable used by the core code from the actual reality of
things.
When I've reviewed the full ppgtt branch I've noticed 3 areas where
this was all mixed up:
- secure batch pinning - the current code now works for aliasing ppgtt
but is now broken on full ppgtt. But the discussions we've had last
autumn was all about this fine disdinction.
- All the confusion with the vma_bind flags, partially fixed with
Chris' binding rework, the remaining code.
- Finally the lifetime confusion I've pointed out above.
Since these are all variants on the underlying theme it's imo a real
design bug in the code and not just bikesheds, so I want it fixed. And
I prefer that you fix it up, to make sure we really understand this
all.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915: Disable full ppgtt by default
2014-03-06 20:30 ` Daniel Vetter
2014-03-06 21:42 ` Daniel Vetter
@ 2014-03-06 21:55 ` Ben Widawsky
2014-03-18 10:19 ` Daniel Vetter
2 siblings, 0 replies; 7+ messages in thread
From: Ben Widawsky @ 2014-03-06 21:55 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development
On Thu, Mar 06, 2014 at 09:30:01PM +0100, Daniel Vetter wrote:
> On Thu, Mar 06, 2014 at 10:17:12AM -0800, Ben Widawsky wrote:
> > On Thu, Mar 06, 2014 at 12:14:21PM +0100, Daniel Vetter wrote:
> > > There are too many oustanding issues:
> > >
> > > - Fence handling in the current code is broken. There's a patch series
> > > from me, but it's blocked on and extended review (which includes
> > > writing the testcases).
> > >
> > > - IOMMU mapping handling is broken, we need to properly refcount it -
> > > currently it gets destroyed when the first vma is unbound, so way
> > > too early.
> > >
> > > - There's a pending reset issue on snb. Since Mika's reset work and
> > > full ppgtt have been pulled in in separate branches and ended up
> > > intermittingly breaking each another it's unclear who's the exact
> > > culprit here.
> > >
> > > - We still have persistent evidince of crazy recursion bugs through
> > > vma_unbind and ppgtt_relase, e.g.
> > >
> > > https://bugs.freedesktop.org/show_bug.cgi?id=73383
> > >
> > > This issue (and a few others meanwhile resolved) have blocked our
> > > performance measuring/tuning group since 3 months.
> > >
> > > - Secure batch dispatching is broken. This is blocking Brad Volkin's
> > > command checker work since 3 months.
> > >
> > > All these issues are confirmed to only happen when full ppgtt is
> > > enabled, falling back to aliasing ppgtt resolves them. But even
> > > aliasing ppgtt itself still has a regression:
> > >
> > > - We currently unconditionally bind objects into the aliasing ppgtt,
> > > which means all priviledged objects like ringbuffers are visible to
> > > unpriviledged access again. On top of that this also breaks the
> > > command checker for aliasing ppgtt, since it can't hide the
> > > validated batch any more.
> > >
> > > Furthermore topic/full-ppgtt has never been reviewed:
> > >
> > > - Lifetime rules around vma unbinding/release are unclear, resulting
> > > into this awesome hack called ppgtt_release. Which seems to take the
> > > blame for most of the recursion fallout.
> > >
> > > - Context/ring init works different on gpu reset than anywhere else.
> > > Such differeneces have in the past always lead to really hard to
> > > track down bugs.
> > >
> > > - Aliasing ppgtt is treated in a bunch of places as a real address
> > > space, but it isn't - the real address space is always the global
> > > gtt in that case. This results in a bit a mess between contexts and
> > > ppgtt object, further complication the context/ppgtt/vma lifetime
> > > rules.
> > >
> > > - We don't have any docs describing the overall concepts introduced
> > > with full ppgtt. A short, concise overview describing vmas and some
> > > of the strange bits around them (like the unbound vmas used by
> > > execbuf, or the new binding rules) really is needed.
> > >
> > > Note that a lot of the post topic/full-ppgtt merge fallout has already
> > > been addressed, this entire list here of 10 issues really only contains
> > > the still outstanding issues.
> > >
> > > Finally the 3.15 merge window is approaching and I think we need to
> > > use the remaining time to ensure that our fallback option of using
> > > aliasing ppgtt is in solid shape. Hence I think it's time to throw the
> > > switch. While at it demote the helper from static inline status
> > > because really.
> > >
> > > Cc: Ben Widawsky <ben@bwidawsk.net>
> > > Cc: Dave Airlie <airlied@gmail.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >
> > [snip]
> >
> > I want a concise list in the commit message so it's obvious as we fix
> > things if we've achieved the goal or not. If you want to have nice prose
> > describing the reason and/or your feelings, that's fine, but please put
> > it after the concise list.
> >
> > I'll start what I want, and please fill in as needed. I believe this is
> > all 10 you mentioned.
> > * Fence handling broken: BUG #
>
> We have patches from me, and Paulo is signed up to do the review+igt
> testcase on our review board.
>
> > * IOMMU Broken: BUG #
>
> No bug report thus far. I can create one if people want, but that's more
> work than firing up my damn ivb, enabling dmar again and fixing it. Imo
> the short description above should be good enough to understand the bug.
>
> > * "Reset issue": Bug #
>
> https://bugs.freedesktop.org/show_bug.cgi?id=74100
>
> Mika is working on this afaik.
>
> > * Secure dispatch: Failing testcase:
>
> There's a Jira with full details: VIZ-3490
>
> > * Bug: https://bugs.freedesktop.org/show_bug.cgi?id=73383
>
> There's also been reports from Ville and iirc Chris also had pending
> issues, at least compared to dinq. Note that plain igt doesn't seem to be
> sufficient to provoke all these cases :(
>
> > * Documentation
>
> The above description is imo sufficient as a task description. There's
> also jira for this with more details: VIZ-3468
>
> > Then there is fuzzy stuff that you "want" which need more clarification
> > on exactly what will satisfy you.
> > * Lifetime rules: No clear requirement from you.
>
> Whomever tracks down the recursion bugs will likely have to sort his out.
> Essentially I want ppgtt_release gone, that entire function is just a
> giant hack.
I still don't see a concrete task here. Using the word, "essentially" is
not helpful for describing what your requirements are.
>
> > * Context/ring init differences: What do you want?
>
> http://lists.freedesktop.org/archives/intel-gfx/2013-November/036482.html
>
> I really want this to be address before we start wreaking more havoc in
> this area, i.e. the patch series currently pending internally.
>
I never agreed with you on this topic. I will try to find time to look
at internal patches that potentially fix it. But again, since I haven't
seen any real issue to fix, and I have had a very large deficit of time
, it would probably be great if someone else can help.
> > * Aliasing PPGTT real address treatment: What do you want?
>
> i915_gem_context_free in i195_gem_context.c has a comment:
>
> /* We refcount even the aliasing PPGTT to keep the code symmetric */
>
> That comment is a lie and _not_ refcounting the aliasing ppgtt would allow
> us to ditch a bunch of hard-to-understand (at least for me) cornercase in
> the lifetime rules. At least that was the case when I've reviewed the
> ppgtt branch 3 months ago. Given that we have issues around the lifetimes
> of these suckers it can't help to have as much clarity as possible.
>
> One really important thing you've dropped is fixing up the ppgtt binding
> rules for aliasing ppgtt. That's a regression compared to 3.14 and will
> render the cmd parser void with aliasing ppgtt. We need this asap both in
> time for the 3.15 merge and to finally unblock Brad Volkin's command
> parser.
>
> I've signed myself up for this and started working on it.
Well, that's great! It's been so long since I wrote any of the code, and
a good deal has changed since then. It seemed like a good idea at the
time.
> -Daniel
I think you've misunderstood what I was requesting. I just want a
concise list in the commit (preferably at the top) with the issues. My
last email had *most* of what I wanted, sans any existing BUG # or JIRA
tasks you might have excluded. If we know people are working on it,
putting that information would be equally helpful. This will make it
really easy to review the list at a given time and determine when we're
done. If you prefer to put it in a bugzilla and link it at the top of a
commit, I am fine with that as well. In the past, it has been very lossy
when I need to read through long emails to decipher your requests.
That's not blame or criticism, it's likely my fault. Having it in the
commit message allows easy references from future commits, ie. This
addresses issue #3 from commit FOO.
It also probably does make sense to convert any internal JIRA tasks to
FDO tasks, since these are now all public, and your publicly issuing
reasons for disabling the feature that people on the mailing list may
not have access to... However, I agree it's not worth the time to do
this.
--
Ben Widawsky, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] drm/i915: Disable full ppgtt by default
2014-03-06 20:30 ` Daniel Vetter
2014-03-06 21:42 ` Daniel Vetter
2014-03-06 21:55 ` Ben Widawsky
@ 2014-03-18 10:19 ` Daniel Vetter
2 siblings, 0 replies; 7+ messages in thread
From: Daniel Vetter @ 2014-03-18 10:19 UTC (permalink / raw)
To: Ben Widawsky; +Cc: Daniel Vetter, Intel Graphics Development
On Thu, Mar 06, 2014 at 09:30:01PM +0100, Daniel Vetter wrote:
> On Thu, Mar 06, 2014 at 10:17:12AM -0800, Ben Widawsky wrote:
> > On Thu, Mar 06, 2014 at 12:14:21PM +0100, Daniel Vetter wrote:
> > > There are too many oustanding issues:
> > >
> > > - Fence handling in the current code is broken. There's a patch series
> > > from me, but it's blocked on and extended review (which includes
> > > writing the testcases).
> > >
> > > - IOMMU mapping handling is broken, we need to properly refcount it -
> > > currently it gets destroyed when the first vma is unbound, so way
> > > too early.
> > >
> > > - There's a pending reset issue on snb. Since Mika's reset work and
> > > full ppgtt have been pulled in in separate branches and ended up
> > > intermittingly breaking each another it's unclear who's the exact
> > > culprit here.
> > >
> > > - We still have persistent evidince of crazy recursion bugs through
> > > vma_unbind and ppgtt_relase, e.g.
> > >
> > > https://bugs.freedesktop.org/show_bug.cgi?id=73383
> > >
> > > This issue (and a few others meanwhile resolved) have blocked our
> > > performance measuring/tuning group since 3 months.
> > >
> > > - Secure batch dispatching is broken. This is blocking Brad Volkin's
> > > command checker work since 3 months.
> > >
> > > All these issues are confirmed to only happen when full ppgtt is
> > > enabled, falling back to aliasing ppgtt resolves them. But even
> > > aliasing ppgtt itself still has a regression:
> > >
> > > - We currently unconditionally bind objects into the aliasing ppgtt,
> > > which means all priviledged objects like ringbuffers are visible to
> > > unpriviledged access again. On top of that this also breaks the
> > > command checker for aliasing ppgtt, since it can't hide the
> > > validated batch any more.
> > >
> > > Furthermore topic/full-ppgtt has never been reviewed:
> > >
> > > - Lifetime rules around vma unbinding/release are unclear, resulting
> > > into this awesome hack called ppgtt_release. Which seems to take the
> > > blame for most of the recursion fallout.
> > >
> > > - Context/ring init works different on gpu reset than anywhere else.
> > > Such differeneces have in the past always lead to really hard to
> > > track down bugs.
> > >
> > > - Aliasing ppgtt is treated in a bunch of places as a real address
> > > space, but it isn't - the real address space is always the global
> > > gtt in that case. This results in a bit a mess between contexts and
> > > ppgtt object, further complication the context/ppgtt/vma lifetime
> > > rules.
> > >
> > > - We don't have any docs describing the overall concepts introduced
> > > with full ppgtt. A short, concise overview describing vmas and some
> > > of the strange bits around them (like the unbound vmas used by
> > > execbuf, or the new binding rules) really is needed.
> > >
> > > Note that a lot of the post topic/full-ppgtt merge fallout has already
> > > been addressed, this entire list here of 10 issues really only contains
> > > the still outstanding issues.
> > >
> > > Finally the 3.15 merge window is approaching and I think we need to
> > > use the remaining time to ensure that our fallback option of using
> > > aliasing ppgtt is in solid shape. Hence I think it's time to throw the
> > > switch. While at it demote the helper from static inline status
> > > because really.
> > >
> > > Cc: Ben Widawsky <ben@bwidawsk.net>
> > > Cc: Dave Airlie <airlied@gmail.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >
> > [snip]
> >
> > I want a concise list in the commit message so it's obvious as we fix
> > things if we've achieved the goal or not. If you want to have nice prose
> > describing the reason and/or your feelings, that's fine, but please put
> > it after the concise list.
> >
> > I'll start what I want, and please fill in as needed. I believe this is
> > all 10 you mentioned.
> > * Fence handling broken: BUG #
>
> We have patches from me, and Paulo is signed up to do the review+igt
> testcase on our review board.
>
> > * IOMMU Broken: BUG #
>
> No bug report thus far. I can create one if people want, but that's more
> work than firing up my damn ivb, enabling dmar again and fixing it. Imo
> the short description above should be good enough to understand the bug.
>
> > * "Reset issue": Bug #
>
> https://bugs.freedesktop.org/show_bug.cgi?id=74100
>
> Mika is working on this afaik.
>
> > * Secure dispatch: Failing testcase:
>
> There's a Jira with full details: VIZ-3490
>
> > * Bug: https://bugs.freedesktop.org/show_bug.cgi?id=73383
>
> There's also been reports from Ville and iirc Chris also had pending
> issues, at least compared to dinq. Note that plain igt doesn't seem to be
> sufficient to provoke all these cases :(
>
> > * Documentation
>
> The above description is imo sufficient as a task description. There's
> also jira for this with more details: VIZ-3468
>
> > Then there is fuzzy stuff that you "want" which need more clarification
> > on exactly what will satisfy you.
> > * Lifetime rules: No clear requirement from you.
>
> Whomever tracks down the recursion bugs will likely have to sort his out.
> Essentially I want ppgtt_release gone, that entire function is just a
> giant hack.
>
> > * Context/ring init differences: What do you want?
>
> http://lists.freedesktop.org/archives/intel-gfx/2013-November/036482.html
>
> I really want this to be address before we start wreaking more havoc in
> this area, i.e. the patch series currently pending internally.
>
> > * Aliasing PPGTT real address treatment: What do you want?
>
> i915_gem_context_free in i195_gem_context.c has a comment:
>
> /* We refcount even the aliasing PPGTT to keep the code symmetric */
>
> That comment is a lie and _not_ refcounting the aliasing ppgtt would allow
> us to ditch a bunch of hard-to-understand (at least for me) cornercase in
> the lifetime rules. At least that was the case when I've reviewed the
> ppgtt branch 3 months ago. Given that we have issues around the lifetimes
> of these suckers it can't help to have as much clarity as possible.
>
> One really important thing you've dropped is fixing up the ppgtt binding
> rules for aliasing ppgtt. That's a regression compared to 3.14 and will
> render the cmd parser void with aliasing ppgtt. We need this asap both in
> time for the 3.15 merge and to finally unblock Brad Volkin's command
> parser.
>
> I've signed myself up for this and started working on it.
One more taks is required here which seems to have completely fallen
through the cracks:
- Rebase the drm_mm top-to-bottom allocator patches onto latest drm-next
and resubmit the i915 parts. I didn't merge this back in Dec due to
conflicts with other ongoing drm_mm work, but that has all settled now.
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-03-18 10:19 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-06 11:14 [PATCH] drm/i915: Disable full ppgtt by default Daniel Vetter
2014-03-06 18:01 ` Jesse Barnes
2014-03-06 18:17 ` Ben Widawsky
2014-03-06 20:30 ` Daniel Vetter
2014-03-06 21:42 ` Daniel Vetter
2014-03-06 21:55 ` Ben Widawsky
2014-03-18 10:19 ` Daniel Vetter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox