* [PATCH 00/12] [RFC] PPGTT prep patches part 1
@ 2013-04-24 6:15 Ben Widawsky
2013-04-24 6:15 ` [PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup Ben Widawsky
` (13 more replies)
0 siblings, 14 replies; 44+ messages in thread
From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw)
To: Intel GFX; +Cc: Ben Widawsky
First, I have not finished implementing PPGTT. These are all the patches
I have which I can dump now and not have serious regression risk (and I
think most of them are tolerable on their own). The point is to get
review, make my plans explicit, and get these patches merged.
The main accomplishment of this series is to provide more abstract
management of the PPGTT for when we have more than just the aliasing
PPGTT. That abstract way in which we allocate space for PPGTT PDEs (and
the use of the topdown allocator for that matter) seem entirely
unnecessary because of course nothing has functionally changed except
the ppgtt is not kzalloc'd, and that PDEs now serve as the guard page
now for gen6+ (y'all own me 4k).
I've seriously reworked Chris' top down allocator patch, so I'd really
like at least him to take a look at that one and make sure he still
agrees.
Since the earliest discussions of contexts in early 2011, we've
discussed the association between a context and the PPGTT. I've always
felt it made sense to have a very strong association between a context
and its address space. Knowing a bit about the future, I feel doubly
confident in saying so. Aside from the reworks in this series which
allows PPGTT PDE allocation to come from the drm_mm allocator, the rest
irrevocably ties the PPGTT to a context, and one cannot exist without
the other. Future patcher in the later series' will make things fail a
lot harder if one exists without the other.
Even those these patches don't yet do the following, I should explain
the next steps or else I feel reviewers' comments to the above will all
be, "why?"
1. Create a new context/ppgtt for every file_priv.
2. Instruct execbuffer/reserve code to use that file_priv context
3. Switch PP_DIR_BASE on context switch.
This will likely be where I post part 2 of the prep patches, and I do
not expect to merge those before the rest of the series is complete. At
that point flink, dma_buf, and pin interfaces won't work. It should
however prove a useful way to test the HW pretty well. It also shows how
I intend to have a per file context/address space regardless of whether
or not userspace opts in.
4. Track objects across address spaces
This should enable flink and dma_buf to work. I suspect a lot of
bikeshedding will take place on this patch(es) so I want to make it
distinct from the HW testing.
5. Figure out what do about killing relocations.
6. PPGTT page tables allocated on demand
For the curious reader, I've implemented 1, 2, and 4 several times over the
past couple of weeks. I was using MI_UPDATE_GTT for step 3 originally, and that
was tested as well, but I haven't yet done the PP_DIR_BASE switch since
implementing the drm_mm for PDEs.
I haven't run performance numbers on this series specifically, but I did
run performance numbers on the slightly more invasive patches which come
after (through step 1 above)
x11perf -aa10text
before: 1925000/s
after: 1941667/s (clearly not a good test)
xonotic
before: 140.6242469 fps
after: 138.5189648 fps
Ben Widawsky (11):
drm/i915: Assert mutex_is_locked on context lookup
drm/i915: BUG_ON bad PPGTT offset
drm/i915: make PDE|PTE platform specific
drm/i915: Extract PDE writes
drm/i915: Use drm_mm for PPGTT PDEs
drm/i915: Use PDEs as the guard page
drm/i915: Update context_fini
drm/i915: Split context enabling from init
drm/i915: destroy i915_gem_init_global_gtt
drm/i915: Embed PPGTT into the context
drm/i915: No contexts without ppgtt
Chris Wilson (1):
drm: Optionally create mm blocks from top-to-bottom
drivers/gpu/drm/drm_mm.c | 94 +++++++++++---------
drivers/gpu/drm/i915/i915_dma.c | 2 +-
drivers/gpu/drm/i915/i915_drv.h | 15 ++--
drivers/gpu/drm/i915/i915_gem.c | 49 +++++++++--
drivers/gpu/drm/i915/i915_gem_context.c | 33 ++++---
drivers/gpu/drm/i915/i915_gem_gtt.c | 147 ++++++++++++++++----------------
include/drm/drm_mm.h | 78 +++++++++++------
7 files changed, 251 insertions(+), 167 deletions(-)
--
1.8.2.1
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky @ 2013-04-24 6:15 ` Ben Widawsky 2013-05-02 20:27 ` Jesse Barnes 2013-04-24 6:15 ` [PATCH 02/12] drm/i915: BUG_ON bad PPGTT offset Ben Widawsky ` (12 subsequent siblings) 13 siblings, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky Because our context refcounting doesn't grab a ref at lookup time, it is unsafe to do so without the lock. NOTE: We don't have an easy way to put the assertion in the lookup function which is where this really belongs. Context switching is good enough because it actually asserts even more correctness by protecting the default_context. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_gem_context.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index a1e8ecb..411ace0 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -444,6 +444,8 @@ int i915_switch_context(struct intel_ring_buffer *ring, if (dev_priv->hw_contexts_disabled) return 0; + BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex)); + if (ring != &dev_priv->ring[RCS]) return 0; -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup 2013-04-24 6:15 ` [PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup Ben Widawsky @ 2013-05-02 20:27 ` Jesse Barnes 2013-05-06 9:40 ` Daniel Vetter 0 siblings, 1 reply; 44+ messages in thread From: Jesse Barnes @ 2013-05-02 20:27 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Tue, 23 Apr 2013 23:15:29 -0700 Ben Widawsky <ben@bwidawsk.net> wrote: > Because our context refcounting doesn't grab a ref at lookup time, it is > unsafe to do so without the lock. > > NOTE: We don't have an easy way to put the assertion in the lookup > function which is where this really belongs. Context switching is good > enough because it actually asserts even more correctness by protecting > the default_context. > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > --- > drivers/gpu/drm/i915/i915_gem_context.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > index a1e8ecb..411ace0 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -444,6 +444,8 @@ int i915_switch_context(struct intel_ring_buffer *ring, > if (dev_priv->hw_contexts_disabled) > return 0; > > + BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex)); > + > if (ring != &dev_priv->ring[RCS]) > return 0; > Simple enough. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> We usually do WARN_ONs for this stuff though, in case a user actually does hit it, it may not be fatal so why crash the machine? But that's a minor distinction since we shouldn't hit this except in development anyway. -- Jesse Barnes, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup 2013-05-02 20:27 ` Jesse Barnes @ 2013-05-06 9:40 ` Daniel Vetter 2013-05-06 9:44 ` Daniel Vetter 0 siblings, 1 reply; 44+ messages in thread From: Daniel Vetter @ 2013-05-06 9:40 UTC (permalink / raw) To: Jesse Barnes; +Cc: Ben Widawsky, Intel GFX On Thu, May 02, 2013 at 01:27:32PM -0700, Jesse Barnes wrote: > On Tue, 23 Apr 2013 23:15:29 -0700 > Ben Widawsky <ben@bwidawsk.net> wrote: > > > Because our context refcounting doesn't grab a ref at lookup time, it is > > unsafe to do so without the lock. > > > > NOTE: We don't have an easy way to put the assertion in the lookup > > function which is where this really belongs. Context switching is good > > enough because it actually asserts even more correctness by protecting > > the default_context. > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > --- > > drivers/gpu/drm/i915/i915_gem_context.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > > index a1e8ecb..411ace0 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.c > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > > @@ -444,6 +444,8 @@ int i915_switch_context(struct intel_ring_buffer *ring, > > if (dev_priv->hw_contexts_disabled) > > return 0; > > > > + BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex)); > > + > > if (ring != &dev_priv->ring[RCS]) > > return 0; > > > > Simple enough. > > Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> > > We usually do WARN_ONs for this stuff though, in case a user actually > does hit it, it may not be fatal so why crash the machine? > > But that's a minor distinction since we shouldn't hit this except in > development anyway. Well, since this is a patch for upstream the focus should very much be on supporting bug reporters and not developers. And for bug reporters a BUG is much more annoying than a WARN and greatly reduces the chances that we'll get a bug report. There are imo only very few cases where a BUG instead of a WARN is justified: - The kernel is _guaranteed_ to oops in the next few lines anyway, so a BUG_ON will help in readability of the backtrace. Note that checking 3 different things for non-NULL in the same BUG actually reduces OOPS readability (with an oops you can at least reconstruct the faulting address and so probably the pointer). Also, this means the BUG should have a neat description of what exactly blew up. The "a few lines" part is just a guideline with some big exceptions. A prime example is refcount over/underflows since those will blow up, but only sometimes later (and usually no one will have a clue why). - BUGs are justified if there's a potential security hole awaiting, e.g. when something in the userspace input validation has gone wrong. - I'm wary of special error handling for WARNs, but if an early return (with an error code if possible) transforms a BUG into a WARN I'm in. But trying to fix up e.g. modeset state is usually futile, since we'll end up with a black/fuzzy/corrupted screen most likely anyway. But the most important rule is: In case of doubt, just WARN, don't BUG. [I know, I violate it sometimes, too.] Patch applied with the s/BUG/WARN bikeshed. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup 2013-05-06 9:40 ` Daniel Vetter @ 2013-05-06 9:44 ` Daniel Vetter 2013-05-06 17:59 ` Ben Widawsky 0 siblings, 1 reply; 44+ messages in thread From: Daniel Vetter @ 2013-05-06 9:44 UTC (permalink / raw) To: Jesse Barnes; +Cc: Ben Widawsky, Intel GFX On Mon, May 06, 2013 at 11:40:06AM +0200, Daniel Vetter wrote: > On Thu, May 02, 2013 at 01:27:32PM -0700, Jesse Barnes wrote: > > On Tue, 23 Apr 2013 23:15:29 -0700 > > Ben Widawsky <ben@bwidawsk.net> wrote: > > > > > Because our context refcounting doesn't grab a ref at lookup time, it is > > > unsafe to do so without the lock. > > > > > > NOTE: We don't have an easy way to put the assertion in the lookup > > > function which is where this really belongs. Context switching is good > > > enough because it actually asserts even more correctness by protecting > > > the default_context. > > > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > > --- > > > drivers/gpu/drm/i915/i915_gem_context.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > > > index a1e8ecb..411ace0 100644 > > > --- a/drivers/gpu/drm/i915/i915_gem_context.c > > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > > > @@ -444,6 +444,8 @@ int i915_switch_context(struct intel_ring_buffer *ring, > > > if (dev_priv->hw_contexts_disabled) > > > return 0; > > > > > > + BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex)); > > > + > > > if (ring != &dev_priv->ring[RCS]) > > > return 0; > > > > > > > Simple enough. > > > > Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> > > > > We usually do WARN_ONs for this stuff though, in case a user actually > > does hit it, it may not be fatal so why crash the machine? > > > > But that's a minor distinction since we shouldn't hit this except in > > development anyway. > > Well, since this is a patch for upstream the focus should very much be on > supporting bug reporters and not developers. And for bug reporters a BUG > is much more annoying than a WARN and greatly reduces the chances that > we'll get a bug report. Some more details why a WARN massively beats a BUG for us: BUG kills the current process and ensures all locks are stuck. Usually that means X is dead and you can't vt-switch away to the console to take a quick look at dmesg. Now even when all rendering is down the toilet due to the follow-up damage after the WARN and the gpu a zombie, there's a non-zero chance that vt-switch (or sw rendering in X) will work long enough to grab log files and debugfs data. Hence the first rule to only use a BUG on if we have a guaranteed OOPS otherwise (which again will kill the process and make all locks stuck). -Daniel > > There are imo only very few cases where a BUG instead of a WARN is > justified: > - The kernel is _guaranteed_ to oops in the next few lines anyway, so a > BUG_ON will help in readability of the backtrace. Note that checking 3 > different things for non-NULL in the same BUG actually reduces OOPS > readability (with an oops you can at least reconstruct the faulting > address and so probably the pointer). Also, this means the BUG should > have a neat description of what exactly blew up. > > The "a few lines" part is just a guideline with some big exceptions. A > prime example is refcount over/underflows since those will blow up, but > only sometimes later (and usually no one will have a clue why). > > - BUGs are justified if there's a potential security hole awaiting, e.g. > when something in the userspace input validation has gone wrong. > > - I'm wary of special error handling for WARNs, but if an early return > (with an error code if possible) transforms a BUG into a WARN I'm in. > But trying to fix up e.g. modeset state is usually futile, since we'll > end up with a black/fuzzy/corrupted screen most likely anyway. > > But the most important rule is: In case of doubt, just WARN, don't BUG. > > [I know, I violate it sometimes, too.] > > Patch applied with the s/BUG/WARN bikeshed. > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup 2013-05-06 9:44 ` Daniel Vetter @ 2013-05-06 17:59 ` Ben Widawsky 2013-05-06 18:35 ` Daniel Vetter 0 siblings, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-05-06 17:59 UTC (permalink / raw) To: Daniel Vetter; +Cc: Intel GFX On Mon, May 06, 2013 at 11:44:22AM +0200, Daniel Vetter wrote: > On Mon, May 06, 2013 at 11:40:06AM +0200, Daniel Vetter wrote: > > On Thu, May 02, 2013 at 01:27:32PM -0700, Jesse Barnes wrote: > > > On Tue, 23 Apr 2013 23:15:29 -0700 > > > Ben Widawsky <ben@bwidawsk.net> wrote: > > > > > > > Because our context refcounting doesn't grab a ref at lookup time, it is > > > > unsafe to do so without the lock. > > > > > > > > NOTE: We don't have an easy way to put the assertion in the lookup > > > > function which is where this really belongs. Context switching is good > > > > enough because it actually asserts even more correctness by protecting > > > > the default_context. > > > > > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > > > --- > > > > drivers/gpu/drm/i915/i915_gem_context.c | 2 ++ > > > > 1 file changed, 2 insertions(+) > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > > > > index a1e8ecb..411ace0 100644 > > > > --- a/drivers/gpu/drm/i915/i915_gem_context.c > > > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > > > > @@ -444,6 +444,8 @@ int i915_switch_context(struct intel_ring_buffer *ring, > > > > if (dev_priv->hw_contexts_disabled) > > > > return 0; > > > > > > > > + BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex)); > > > > + > > > > if (ring != &dev_priv->ring[RCS]) > > > > return 0; > > > > > > > > > > Simple enough. > > > > > > Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> > > > > > > We usually do WARN_ONs for this stuff though, in case a user actually > > > does hit it, it may not be fatal so why crash the machine? > > > > > > But that's a minor distinction since we shouldn't hit this except in > > > development anyway. > > > > Well, since this is a patch for upstream the focus should very much be on > > supporting bug reporters and not developers. And for bug reporters a BUG > > is much more annoying than a WARN and greatly reduces the chances that > > we'll get a bug report. > > Some more details why a WARN massively beats a BUG for us: BUG kills the > current process and ensures all locks are stuck. Usually that means X is > dead and you can't vt-switch away to the console to take a quick look at > dmesg. > > Now even when all rendering is down the toilet due to the follow-up damage > after the WARN and the gpu a zombie, there's a non-zero chance that > vt-switch (or sw rendering in X) will work long enough to grab log files > and debugfs data. > > Hence the first rule to only use a BUG on if we have a guaranteed OOPS > otherwise (which again will kill the process and make all locks stuck). > -Daniel > > > > > There are imo only very few cases where a BUG instead of a WARN is > > justified: > > - The kernel is _guaranteed_ to oops in the next few lines anyway, so a > > BUG_ON will help in readability of the backtrace. Note that checking 3 > > different things for non-NULL in the same BUG actually reduces OOPS > > readability (with an oops you can at least reconstruct the faulting > > address and so probably the pointer). Also, this means the BUG should > > have a neat description of what exactly blew up. > > > > The "a few lines" part is just a guideline with some big exceptions. A > > prime example is refcount over/underflows since those will blow up, but > > only sometimes later (and usually no one will have a clue why). > > > > - BUGs are justified if there's a potential security hole awaiting, e.g. > > when something in the userspace input validation has gone wrong. > > > > - I'm wary of special error handling for WARNs, but if an early return > > (with an error code if possible) transforms a BUG into a WARN I'm in. > > But trying to fix up e.g. modeset state is usually futile, since we'll > > end up with a black/fuzzy/corrupted screen most likely anyway. > > > > But the most important rule is: In case of doubt, just WARN, don't BUG. > > > > [I know, I violate it sometimes, too.] > > > > Patch applied with the s/BUG/WARN bikeshed. > > -Daniel Thanks for applying the patch, it's certainly better than what we have currently. Why I wanted a BUG: When you get a ref to an object without holding a lock you get a potentially unsafe pointer (to which we will be writing). If the context object memory is freed, and we write to it, we have a potential to late scribble over <insert your file system of choice> memory. There is probably a similar security implication there as well. Many of us are used to, and capable of recovering from GPU hangs, but less of us like to deal with FS recovery. I actually believe all "get" code like this (backed with refcounts) should BUG and not WARN. -- Ben Widawsky, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup 2013-05-06 17:59 ` Ben Widawsky @ 2013-05-06 18:35 ` Daniel Vetter 0 siblings, 0 replies; 44+ messages in thread From: Daniel Vetter @ 2013-05-06 18:35 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Mon, May 6, 2013 at 7:59 PM, Ben Widawsky <ben@bwidawsk.net> wrote: > > Why I wanted a BUG: When you get a ref to an object without holding a > lock you get a potentially unsafe pointer (to which we will be writing). > If the context object memory is freed, and we write to it, we have a > potential to late scribble over <insert your file system of choice> > memory. There is probably a similar security implication there as well. > Many of us are used to, and capable of recovering from GPU hangs, but > less of us like to deal with FS recovery. > > I actually believe all "get" code like this (backed with refcounts) > should BUG and not WARN. Historically we've screwed up locking in the driver load/teardown, suspend/resume and panic paths. Blowing up with a BUG_ON in there tends to be pretty bad for debugging. And there's pretty much no chance of a hostile party being able to exploit a race. So _especially_ for locking checks imo only WARN is acceptable. Otoh if such a race is indeed expoitable from userspace and it escaped into a released kernel that means we have a pretty gapping hole in our test coverage. Fixing that sounds more fruitful to me than making bug reporter's life harder. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 02/12] drm/i915: BUG_ON bad PPGTT offset 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky 2013-04-24 6:15 ` [PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup Ben Widawsky @ 2013-04-24 6:15 ` Ben Widawsky 2013-05-02 20:28 ` Jesse Barnes 2013-04-24 6:15 ` [PATCH 03/12] drm/i915: make PDE|PTE platform specific Ben Widawsky ` (11 subsequent siblings) 13 siblings, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky Because PPGTT PDEs within the GTT are calculated in cachelines (HW guys consistency ftw) we do a divide which will wreak havoc if this is wrong, and I know that from experience). If/when we move to multiple PPGTTs this will have to become a WARN, and return an error. For now however it should always be considered fatal, and only a developer could hit it. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_gem_gtt.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 50df194..0503f09 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -85,6 +85,8 @@ static int gen6_ppgtt_enable(struct drm_device *dev) uint32_t pd_entry; int i; + BUG_ON(ppgtt->pd_offset & 0x3f); + pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + ppgtt->pd_offset / sizeof(gen6_gtt_pte_t); for (i = 0; i < ppgtt->num_pd_entries; i++) { -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 02/12] drm/i915: BUG_ON bad PPGTT offset 2013-04-24 6:15 ` [PATCH 02/12] drm/i915: BUG_ON bad PPGTT offset Ben Widawsky @ 2013-05-02 20:28 ` Jesse Barnes 2013-05-06 9:48 ` Daniel Vetter 0 siblings, 1 reply; 44+ messages in thread From: Jesse Barnes @ 2013-05-02 20:28 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Tue, 23 Apr 2013 23:15:30 -0700 Ben Widawsky <ben@bwidawsk.net> wrote: > Because PPGTT PDEs within the GTT are calculated in cachelines > (HW guys consistency ftw) we do a divide which will wreak havoc if this > is wrong, and I know that from experience). > > If/when we move to multiple PPGTTs this will have to become a WARN, and > return an error. For now however it should always be considered fatal, > and only a developer could hit it. > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > --- > drivers/gpu/drm/i915/i915_gem_gtt.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > index 50df194..0503f09 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > @@ -85,6 +85,8 @@ static int gen6_ppgtt_enable(struct drm_device *dev) > uint32_t pd_entry; > int i; > > + BUG_ON(ppgtt->pd_offset & 0x3f); > + > pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + > ppgtt->pd_offset / sizeof(gen6_gtt_pte_t); > for (i = 0; i < ppgtt->num_pd_entries; i++) { Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> -- Jesse Barnes, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 02/12] drm/i915: BUG_ON bad PPGTT offset 2013-05-02 20:28 ` Jesse Barnes @ 2013-05-06 9:48 ` Daniel Vetter 2013-05-06 18:03 ` Ben Widawsky 0 siblings, 1 reply; 44+ messages in thread From: Daniel Vetter @ 2013-05-06 9:48 UTC (permalink / raw) To: Jesse Barnes; +Cc: Ben Widawsky, Intel GFX On Thu, May 02, 2013 at 01:28:23PM -0700, Jesse Barnes wrote: > On Tue, 23 Apr 2013 23:15:30 -0700 > Ben Widawsky <ben@bwidawsk.net> wrote: > > > Because PPGTT PDEs within the GTT are calculated in cachelines > > (HW guys consistency ftw) we do a divide which will wreak havoc if this > > is wrong, and I know that from experience). > > > > If/when we move to multiple PPGTTs this will have to become a WARN, and > > return an error. For now however it should always be considered fatal, > > and only a developer could hit it. > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > --- > > drivers/gpu/drm/i915/i915_gem_gtt.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > > index 50df194..0503f09 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > > @@ -85,6 +85,8 @@ static int gen6_ppgtt_enable(struct drm_device *dev) > > uint32_t pd_entry; > > int i; > > > > + BUG_ON(ppgtt->pd_offset & 0x3f); > > + > > pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + > > ppgtt->pd_offset / sizeof(gen6_gtt_pte_t); > > for (i = 0; i < ppgtt->num_pd_entries; i++) { > > Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Queued for -next with the same s/BUG/WARN bikeshed, thanks for the patch. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 02/12] drm/i915: BUG_ON bad PPGTT offset 2013-05-06 9:48 ` Daniel Vetter @ 2013-05-06 18:03 ` Ben Widawsky 2013-05-06 18:37 ` Daniel Vetter 0 siblings, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-05-06 18:03 UTC (permalink / raw) To: Daniel Vetter; +Cc: Intel GFX On Mon, May 06, 2013 at 11:48:05AM +0200, Daniel Vetter wrote: > On Thu, May 02, 2013 at 01:28:23PM -0700, Jesse Barnes wrote: > > On Tue, 23 Apr 2013 23:15:30 -0700 > > Ben Widawsky <ben@bwidawsk.net> wrote: > > > > > Because PPGTT PDEs within the GTT are calculated in cachelines > > > (HW guys consistency ftw) we do a divide which will wreak havoc if this > > > is wrong, and I know that from experience). > > > > > > If/when we move to multiple PPGTTs this will have to become a WARN, and > > > return an error. For now however it should always be considered fatal, > > > and only a developer could hit it. > > > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > > --- > > > drivers/gpu/drm/i915/i915_gem_gtt.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > > > index 50df194..0503f09 100644 > > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > > > @@ -85,6 +85,8 @@ static int gen6_ppgtt_enable(struct drm_device *dev) > > > uint32_t pd_entry; > > > int i; > > > > > > + BUG_ON(ppgtt->pd_offset & 0x3f); > > > + > > > pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + > > > ppgtt->pd_offset / sizeof(gen6_gtt_pte_t); > > > for (i = 0; i < ppgtt->num_pd_entries; i++) { > > > > Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> > > Queued for -next with the same s/BUG/WARN bikeshed, thanks for the patch. > -Daniel If you're going to change to WARN, please fixup the patch to return early... if (WARN_ON(...)) return The GPU will end up hanging if we've messed this up... -- Ben Widawsky, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 02/12] drm/i915: BUG_ON bad PPGTT offset 2013-05-06 18:03 ` Ben Widawsky @ 2013-05-06 18:37 ` Daniel Vetter 2013-05-08 16:48 ` Ben Widawsky 0 siblings, 1 reply; 44+ messages in thread From: Daniel Vetter @ 2013-05-06 18:37 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Mon, May 6, 2013 at 8:03 PM, Ben Widawsky <ben@bwidawsk.net> wrote: > On Mon, May 06, 2013 at 11:48:05AM +0200, Daniel Vetter wrote: >> On Thu, May 02, 2013 at 01:28:23PM -0700, Jesse Barnes wrote: >> > On Tue, 23 Apr 2013 23:15:30 -0700 >> > Ben Widawsky <ben@bwidawsk.net> wrote: >> > >> > > Because PPGTT PDEs within the GTT are calculated in cachelines >> > > (HW guys consistency ftw) we do a divide which will wreak havoc if this >> > > is wrong, and I know that from experience). >> > > >> > > If/when we move to multiple PPGTTs this will have to become a WARN, and >> > > return an error. For now however it should always be considered fatal, >> > > and only a developer could hit it. >> > > >> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> >> > > --- >> > > drivers/gpu/drm/i915/i915_gem_gtt.c | 2 ++ >> > > 1 file changed, 2 insertions(+) >> > > >> > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c >> > > index 50df194..0503f09 100644 >> > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c >> > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c >> > > @@ -85,6 +85,8 @@ static int gen6_ppgtt_enable(struct drm_device *dev) >> > > uint32_t pd_entry; >> > > int i; >> > > >> > > + BUG_ON(ppgtt->pd_offset & 0x3f); >> > > + >> > > pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + >> > > ppgtt->pd_offset / sizeof(gen6_gtt_pte_t); >> > > for (i = 0; i < ppgtt->num_pd_entries; i++) { >> > >> > Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> >> >> Queued for -next with the same s/BUG/WARN bikeshed, thanks for the patch. >> -Daniel > > If you're going to change to WARN, please fixup the patch to return > early... if (WARN_ON(...)) return > > The GPU will end up hanging if we've messed this up... I don't care about the gpu, only that the logs hit the disc. And I don't see how an early return can safe anything in here, since something went horribly wrong already anyway. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 02/12] drm/i915: BUG_ON bad PPGTT offset 2013-05-06 18:37 ` Daniel Vetter @ 2013-05-08 16:48 ` Ben Widawsky 2013-05-08 17:55 ` Daniel Vetter 0 siblings, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-05-08 16:48 UTC (permalink / raw) To: Daniel Vetter; +Cc: Intel GFX On Mon, May 06, 2013 at 08:37:37PM +0200, Daniel Vetter wrote: > On Mon, May 6, 2013 at 8:03 PM, Ben Widawsky <ben@bwidawsk.net> wrote: > > On Mon, May 06, 2013 at 11:48:05AM +0200, Daniel Vetter wrote: > >> On Thu, May 02, 2013 at 01:28:23PM -0700, Jesse Barnes wrote: > >> > On Tue, 23 Apr 2013 23:15:30 -0700 > >> > Ben Widawsky <ben@bwidawsk.net> wrote: > >> > > >> > > Because PPGTT PDEs within the GTT are calculated in cachelines > >> > > (HW guys consistency ftw) we do a divide which will wreak havoc if this > >> > > is wrong, and I know that from experience). > >> > > > >> > > If/when we move to multiple PPGTTs this will have to become a WARN, and > >> > > return an error. For now however it should always be considered fatal, > >> > > and only a developer could hit it. > >> > > > >> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > >> > > --- > >> > > drivers/gpu/drm/i915/i915_gem_gtt.c | 2 ++ > >> > > 1 file changed, 2 insertions(+) > >> > > > >> > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > >> > > index 50df194..0503f09 100644 > >> > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > >> > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > >> > > @@ -85,6 +85,8 @@ static int gen6_ppgtt_enable(struct drm_device *dev) > >> > > uint32_t pd_entry; > >> > > int i; > >> > > > >> > > + BUG_ON(ppgtt->pd_offset & 0x3f); > >> > > + > >> > > pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + > >> > > ppgtt->pd_offset / sizeof(gen6_gtt_pte_t); > >> > > for (i = 0; i < ppgtt->num_pd_entries; i++) { > >> > > >> > Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> > >> > >> Queued for -next with the same s/BUG/WARN bikeshed, thanks for the patch. > >> -Daniel > > > > If you're going to change to WARN, please fixup the patch to return > > early... if (WARN_ON(...)) return > > > > The GPU will end up hanging if we've messed this up... > > I don't care about the gpu, only that the logs hit the disc. And I > don't see how an early return can safe anything in here, since > something went horribly wrong already anyway. > -Daniel Seriously? Feels a bit like this has become an argument for arguments sake, but: I'd much rather have an assertion fail than a gpu error state from same random point in time later. > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- Ben Widawsky, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 02/12] drm/i915: BUG_ON bad PPGTT offset 2013-05-08 16:48 ` Ben Widawsky @ 2013-05-08 17:55 ` Daniel Vetter 0 siblings, 0 replies; 44+ messages in thread From: Daniel Vetter @ 2013-05-08 17:55 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Wed, May 8, 2013 at 6:48 PM, Ben Widawsky <ben@bwidawsk.net> wrote: >> >> Queued for -next with the same s/BUG/WARN bikeshed, thanks for the patch. >> >> -Daniel >> > >> > If you're going to change to WARN, please fixup the patch to return >> > early... if (WARN_ON(...)) return >> > >> > The GPU will end up hanging if we've messed this up... >> >> I don't care about the gpu, only that the logs hit the disc. And I >> don't see how an early return can safe anything in here, since >> something went horribly wrong already anyway. >> -Daniel > > Seriously? Feels a bit like this has become an argument for arguments > sake, but: I'd much rather have an assertion fail than a gpu error state > from same random point in time later. You still get the assertion fail in dmesg with the WARN. The difference to a BUG is that we don't kill X, which means you can still vt-switch and get out of the graphical console mode and restore fbcon. For normal users that's imo much better (and the gpu crapping out later on, well meh). -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 03/12] drm/i915: make PDE|PTE platform specific 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky 2013-04-24 6:15 ` [PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup Ben Widawsky 2013-04-24 6:15 ` [PATCH 02/12] drm/i915: BUG_ON bad PPGTT offset Ben Widawsky @ 2013-04-24 6:15 ` Ben Widawsky 2013-05-02 21:26 ` Jesse Barnes 2013-04-24 6:15 ` [PATCH 04/12] drm/i915: Extract PDE writes Ben Widawsky ` (10 subsequent siblings) 13 siblings, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky Accomplish this be removing the PDE count define which is (and has always been) part of the PPGTT structure anyway. With the addition of the gen specific init function, we can nicely tuck away the magic number in there. In this vain, make the PTE define less of a magic number. The remaining code in the global gtt setup is a bit messy, but an upcoming patch will clean that one up. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_drv.h | 2 -- drivers/gpu/drm/i915/i915_gem_gtt.c | 10 +++++----- 2 files changed, 5 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index d5dcf7f..0a9c7cd 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -431,8 +431,6 @@ struct i915_gtt { }; #define gtt_total_entries(gtt) ((gtt).total >> PAGE_SHIFT) -#define I915_PPGTT_PD_ENTRIES 512 -#define I915_PPGTT_PT_ENTRIES 1024 struct i915_hw_ppgtt { struct drm_device *dev; unsigned num_pd_entries; diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 0503f09..11a50cf 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -29,6 +29,7 @@ #include "intel_drv.h" typedef uint32_t gen6_gtt_pte_t; +#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t)) /* PPGTT stuff */ #define GEN6_GTT_ADDR_ENCODE(addr) ((addr) | (((addr) >> 28) & 0xff0)) @@ -235,10 +236,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) /* ppgtt PDEs reside in the global gtt pagetable, which has 512*1024 * entries. For aliasing ppgtt support we just steal them at the end for * now. */ - first_pd_entry_in_global_pt = - gtt_total_entries(dev_priv->gtt) - I915_PPGTT_PD_ENTRIES; + first_pd_entry_in_global_pt = gtt_total_entries(dev_priv->gtt) - 512; - ppgtt->num_pd_entries = I915_PPGTT_PD_ENTRIES; + ppgtt->num_pd_entries = 512; ppgtt->enable = gen6_ppgtt_enable; ppgtt->clear_range = gen6_ppgtt_clear_range; ppgtt->insert_entries = gen6_ppgtt_insert_entries; @@ -646,7 +646,7 @@ void i915_gem_init_global_gtt(struct drm_device *dev) if (INTEL_INFO(dev)->gen <= 7) { /* PPGTT pdes are stolen from global gtt ptes, so shrink the * aperture accordingly when using aliasing ppgtt. */ - gtt_size -= I915_PPGTT_PD_ENTRIES*PAGE_SIZE; + gtt_size -= 512 * PAGE_SIZE; } i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); @@ -657,7 +657,7 @@ void i915_gem_init_global_gtt(struct drm_device *dev) DRM_ERROR("Aliased PPGTT setup failed %d\n", ret); drm_mm_takedown(&dev_priv->mm.gtt_space); - gtt_size += I915_PPGTT_PD_ENTRIES*PAGE_SIZE; + gtt_size += 512 * PAGE_SIZE; } i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); } -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 03/12] drm/i915: make PDE|PTE platform specific 2013-04-24 6:15 ` [PATCH 03/12] drm/i915: make PDE|PTE platform specific Ben Widawsky @ 2013-05-02 21:26 ` Jesse Barnes 2013-05-02 22:49 ` Ben Widawsky 2013-05-06 9:47 ` Daniel Vetter 0 siblings, 2 replies; 44+ messages in thread From: Jesse Barnes @ 2013-05-02 21:26 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Tue, 23 Apr 2013 23:15:31 -0700 Ben Widawsky <ben@bwidawsk.net> wrote: > Accomplish this be removing the PDE count define which is (and has > always been) part of the PPGTT structure anyway. With the addition of > the gen specific init function, we can nicely tuck away the magic number > in there. > > In this vain, make the PTE define less of a magic number. vein > > The remaining code in the global gtt setup is a bit messy, but an > upcoming patch will clean that one up. > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > --- > drivers/gpu/drm/i915/i915_drv.h | 2 -- > drivers/gpu/drm/i915/i915_gem_gtt.c | 10 +++++----- > 2 files changed, 5 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index d5dcf7f..0a9c7cd 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -431,8 +431,6 @@ struct i915_gtt { > }; > #define gtt_total_entries(gtt) ((gtt).total >> PAGE_SHIFT) > > -#define I915_PPGTT_PD_ENTRIES 512 > -#define I915_PPGTT_PT_ENTRIES 1024 > struct i915_hw_ppgtt { > struct drm_device *dev; > unsigned num_pd_entries; > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > index 0503f09..11a50cf 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > @@ -29,6 +29,7 @@ > #include "intel_drv.h" > > typedef uint32_t gen6_gtt_pte_t; > +#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t)) > > /* PPGTT stuff */ > #define GEN6_GTT_ADDR_ENCODE(addr) ((addr) | (((addr) >> 28) & 0xff0)) > @@ -235,10 +236,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) > /* ppgtt PDEs reside in the global gtt pagetable, which has 512*1024 > * entries. For aliasing ppgtt support we just steal them at the end for > * now. */ > - first_pd_entry_in_global_pt = > - gtt_total_entries(dev_priv->gtt) - I915_PPGTT_PD_ENTRIES; > + first_pd_entry_in_global_pt = gtt_total_entries(dev_priv->gtt) - 512; > > - ppgtt->num_pd_entries = I915_PPGTT_PD_ENTRIES; > + ppgtt->num_pd_entries = 512; > ppgtt->enable = gen6_ppgtt_enable; > ppgtt->clear_range = gen6_ppgtt_clear_range; > ppgtt->insert_entries = gen6_ppgtt_insert_entries; > @@ -646,7 +646,7 @@ void i915_gem_init_global_gtt(struct drm_device *dev) > if (INTEL_INFO(dev)->gen <= 7) { > /* PPGTT pdes are stolen from global gtt ptes, so shrink the > * aperture accordingly when using aliasing ppgtt. */ > - gtt_size -= I915_PPGTT_PD_ENTRIES*PAGE_SIZE; > + gtt_size -= 512 * PAGE_SIZE; > } > > i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); > @@ -657,7 +657,7 @@ void i915_gem_init_global_gtt(struct drm_device *dev) > > DRM_ERROR("Aliased PPGTT setup failed %d\n", ret); > drm_mm_takedown(&dev_priv->mm.gtt_space); > - gtt_size += I915_PPGTT_PD_ENTRIES*PAGE_SIZE; > + gtt_size += 512 * PAGE_SIZE; > } > i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); > } I thought we could have 1024 PDEs? Either way, this just drops the macro, which is fine, unless we want to increase our PDE count. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> -- Jesse Barnes, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 03/12] drm/i915: make PDE|PTE platform specific 2013-05-02 21:26 ` Jesse Barnes @ 2013-05-02 22:49 ` Ben Widawsky 2013-05-02 22:55 ` Jesse Barnes 2013-05-06 9:47 ` Daniel Vetter 1 sibling, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-05-02 22:49 UTC (permalink / raw) To: Jesse Barnes; +Cc: Intel GFX On Thu, May 02, 2013 at 02:26:17PM -0700, Jesse Barnes wrote: > On Tue, 23 Apr 2013 23:15:31 -0700 > Ben Widawsky <ben@bwidawsk.net> wrote: > > > Accomplish this be removing the PDE count define which is (and has > > always been) part of the PPGTT structure anyway. With the addition of > > the gen specific init function, we can nicely tuck away the magic number > > in there. > > > > In this vain, make the PTE define less of a magic number. > > vein > > > > > The remaining code in the global gtt setup is a bit messy, but an > > upcoming patch will clean that one up. > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > --- > > drivers/gpu/drm/i915/i915_drv.h | 2 -- > > drivers/gpu/drm/i915/i915_gem_gtt.c | 10 +++++----- > > 2 files changed, 5 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > > index d5dcf7f..0a9c7cd 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -431,8 +431,6 @@ struct i915_gtt { > > }; > > #define gtt_total_entries(gtt) ((gtt).total >> PAGE_SHIFT) > > > > -#define I915_PPGTT_PD_ENTRIES 512 > > -#define I915_PPGTT_PT_ENTRIES 1024 > > struct i915_hw_ppgtt { > > struct drm_device *dev; > > unsigned num_pd_entries; > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > > index 0503f09..11a50cf 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > > @@ -29,6 +29,7 @@ > > #include "intel_drv.h" > > > > typedef uint32_t gen6_gtt_pte_t; > > +#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t)) > > > > /* PPGTT stuff */ > > #define GEN6_GTT_ADDR_ENCODE(addr) ((addr) | (((addr) >> 28) & 0xff0)) > > @@ -235,10 +236,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) > > /* ppgtt PDEs reside in the global gtt pagetable, which has 512*1024 > > * entries. For aliasing ppgtt support we just steal them at the end for > > * now. */ > > - first_pd_entry_in_global_pt = > > - gtt_total_entries(dev_priv->gtt) - I915_PPGTT_PD_ENTRIES; > > + first_pd_entry_in_global_pt = gtt_total_entries(dev_priv->gtt) - 512; > > > > - ppgtt->num_pd_entries = I915_PPGTT_PD_ENTRIES; > > + ppgtt->num_pd_entries = 512; > > ppgtt->enable = gen6_ppgtt_enable; > > ppgtt->clear_range = gen6_ppgtt_clear_range; > > ppgtt->insert_entries = gen6_ppgtt_insert_entries; > > @@ -646,7 +646,7 @@ void i915_gem_init_global_gtt(struct drm_device *dev) > > if (INTEL_INFO(dev)->gen <= 7) { > > /* PPGTT pdes are stolen from global gtt ptes, so shrink the > > * aperture accordingly when using aliasing ppgtt. */ > > - gtt_size -= I915_PPGTT_PD_ENTRIES*PAGE_SIZE; > > + gtt_size -= 512 * PAGE_SIZE; > > } > > > > i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); > > @@ -657,7 +657,7 @@ void i915_gem_init_global_gtt(struct drm_device *dev) > > > > DRM_ERROR("Aliased PPGTT setup failed %d\n", ret); > > drm_mm_takedown(&dev_priv->mm.gtt_space); > > - gtt_size += I915_PPGTT_PD_ENTRIES*PAGE_SIZE; > > + gtt_size += 512 * PAGE_SIZE; > > } > > i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); > > } > > I thought we could have 1024 PDEs? Either way, this just drops the > macro, which is fine, unless we want to increase our PDE count. > > Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Thanks for the review. I think the docs must be wrong if they say this. 512PDEs are all you need to map a 2GB address space (which is the max), and furthermore, the DCLV which is used to indicate the size of the PDEs can only allow up to 2GB. > > -- > Jesse Barnes, Intel Open Source Technology Center -- Ben Widawsky, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 03/12] drm/i915: make PDE|PTE platform specific 2013-05-02 22:49 ` Ben Widawsky @ 2013-05-02 22:55 ` Jesse Barnes 0 siblings, 0 replies; 44+ messages in thread From: Jesse Barnes @ 2013-05-02 22:55 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Thu, 2 May 2013 15:49:17 -0700 Ben Widawsky <ben@bwidawsk.net> wrote: > On Thu, May 02, 2013 at 02:26:17PM -0700, Jesse Barnes wrote: > > On Tue, 23 Apr 2013 23:15:31 -0700 > > Ben Widawsky <ben@bwidawsk.net> wrote: > > > > > Accomplish this be removing the PDE count define which is (and has > > > always been) part of the PPGTT structure anyway. With the addition of > > > the gen specific init function, we can nicely tuck away the magic number > > > in there. > > > > > > In this vain, make the PTE define less of a magic number. > > > > vein > > > > > > > > The remaining code in the global gtt setup is a bit messy, but an > > > upcoming patch will clean that one up. > > > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > > --- > > > drivers/gpu/drm/i915/i915_drv.h | 2 -- > > > drivers/gpu/drm/i915/i915_gem_gtt.c | 10 +++++----- > > > 2 files changed, 5 insertions(+), 7 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > > > index d5dcf7f..0a9c7cd 100644 > > > --- a/drivers/gpu/drm/i915/i915_drv.h > > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > > @@ -431,8 +431,6 @@ struct i915_gtt { > > > }; > > > #define gtt_total_entries(gtt) ((gtt).total >> PAGE_SHIFT) > > > > > > -#define I915_PPGTT_PD_ENTRIES 512 > > > -#define I915_PPGTT_PT_ENTRIES 1024 > > > struct i915_hw_ppgtt { > > > struct drm_device *dev; > > > unsigned num_pd_entries; > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > > > index 0503f09..11a50cf 100644 > > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > > > @@ -29,6 +29,7 @@ > > > #include "intel_drv.h" > > > > > > typedef uint32_t gen6_gtt_pte_t; > > > +#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t)) > > > > > > /* PPGTT stuff */ > > > #define GEN6_GTT_ADDR_ENCODE(addr) ((addr) | (((addr) >> 28) & 0xff0)) > > > @@ -235,10 +236,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) > > > /* ppgtt PDEs reside in the global gtt pagetable, which has 512*1024 > > > * entries. For aliasing ppgtt support we just steal them at the end for > > > * now. */ > > > - first_pd_entry_in_global_pt = > > > - gtt_total_entries(dev_priv->gtt) - I915_PPGTT_PD_ENTRIES; > > > + first_pd_entry_in_global_pt = gtt_total_entries(dev_priv->gtt) - 512; > > > > > > - ppgtt->num_pd_entries = I915_PPGTT_PD_ENTRIES; > > > + ppgtt->num_pd_entries = 512; > > > ppgtt->enable = gen6_ppgtt_enable; > > > ppgtt->clear_range = gen6_ppgtt_clear_range; > > > ppgtt->insert_entries = gen6_ppgtt_insert_entries; > > > @@ -646,7 +646,7 @@ void i915_gem_init_global_gtt(struct drm_device *dev) > > > if (INTEL_INFO(dev)->gen <= 7) { > > > /* PPGTT pdes are stolen from global gtt ptes, so shrink the > > > * aperture accordingly when using aliasing ppgtt. */ > > > - gtt_size -= I915_PPGTT_PD_ENTRIES*PAGE_SIZE; > > > + gtt_size -= 512 * PAGE_SIZE; > > > } > > > > > > i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); > > > @@ -657,7 +657,7 @@ void i915_gem_init_global_gtt(struct drm_device *dev) > > > > > > DRM_ERROR("Aliased PPGTT setup failed %d\n", ret); > > > drm_mm_takedown(&dev_priv->mm.gtt_space); > > > - gtt_size += I915_PPGTT_PD_ENTRIES*PAGE_SIZE; > > > + gtt_size += 512 * PAGE_SIZE; > > > } > > > i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); > > > } > > > > I thought we could have 1024 PDEs? Either way, this just drops the > > macro, which is fine, unless we want to increase our PDE count. > > > > Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> > > Thanks for the review. > > I think the docs must be wrong if they say this. 512PDEs are all you > need to map a 2GB address space (which is the max), and furthermore, the > DCLV which is used to indicate the size of the PDEs can only allow up to 2GB. Agreed, one of the diagrams indicates a 4k set of PDEs, but the DCLV reg says 32 * 16 entries. So the 512 is fine. -- Jesse Barnes, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 03/12] drm/i915: make PDE|PTE platform specific 2013-05-02 21:26 ` Jesse Barnes 2013-05-02 22:49 ` Ben Widawsky @ 2013-05-06 9:47 ` Daniel Vetter 2013-05-08 16:49 ` Ben Widawsky 1 sibling, 1 reply; 44+ messages in thread From: Daniel Vetter @ 2013-05-06 9:47 UTC (permalink / raw) To: Jesse Barnes; +Cc: Ben Widawsky, Intel GFX On Thu, May 02, 2013 at 02:26:17PM -0700, Jesse Barnes wrote: > On Tue, 23 Apr 2013 23:15:31 -0700 > Ben Widawsky <ben@bwidawsk.net> wrote: > > > Accomplish this be removing the PDE count define which is (and has > > always been) part of the PPGTT structure anyway. With the addition of > > the gen specific init function, we can nicely tuck away the magic number > > in there. > > > > In this vain, make the PTE define less of a magic number. > > vein > > > > > The remaining code in the global gtt setup is a bit messy, but an > > upcoming patch will clean that one up. > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > --- > > drivers/gpu/drm/i915/i915_drv.h | 2 -- > > drivers/gpu/drm/i915/i915_gem_gtt.c | 10 +++++----- > > 2 files changed, 5 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > > index d5dcf7f..0a9c7cd 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -431,8 +431,6 @@ struct i915_gtt { > > }; > > #define gtt_total_entries(gtt) ((gtt).total >> PAGE_SHIFT) > > > > -#define I915_PPGTT_PD_ENTRIES 512 > > -#define I915_PPGTT_PT_ENTRIES 1024 > > struct i915_hw_ppgtt { > > struct drm_device *dev; > > unsigned num_pd_entries; > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > > index 0503f09..11a50cf 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > > @@ -29,6 +29,7 @@ > > #include "intel_drv.h" > > > > typedef uint32_t gen6_gtt_pte_t; > > +#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t)) > > > > /* PPGTT stuff */ > > #define GEN6_GTT_ADDR_ENCODE(addr) ((addr) | (((addr) >> 28) & 0xff0)) > > @@ -235,10 +236,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) > > /* ppgtt PDEs reside in the global gtt pagetable, which has 512*1024 > > * entries. For aliasing ppgtt support we just steal them at the end for > > * now. */ > > - first_pd_entry_in_global_pt = > > - gtt_total_entries(dev_priv->gtt) - I915_PPGTT_PD_ENTRIES; > > + first_pd_entry_in_global_pt = gtt_total_entries(dev_priv->gtt) - 512; > > > > - ppgtt->num_pd_entries = I915_PPGTT_PD_ENTRIES; > > + ppgtt->num_pd_entries = 512; > > ppgtt->enable = gen6_ppgtt_enable; > > ppgtt->clear_range = gen6_ppgtt_clear_range; > > ppgtt->insert_entries = gen6_ppgtt_insert_entries; > > @@ -646,7 +646,7 @@ void i915_gem_init_global_gtt(struct drm_device *dev) > > if (INTEL_INFO(dev)->gen <= 7) { > > /* PPGTT pdes are stolen from global gtt ptes, so shrink the > > * aperture accordingly when using aliasing ppgtt. */ > > - gtt_size -= I915_PPGTT_PD_ENTRIES*PAGE_SIZE; > > + gtt_size -= 512 * PAGE_SIZE; > > } > > > > i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); > > @@ -657,7 +657,7 @@ void i915_gem_init_global_gtt(struct drm_device *dev) > > > > DRM_ERROR("Aliased PPGTT setup failed %d\n", ret); > > drm_mm_takedown(&dev_priv->mm.gtt_space); > > - gtt_size += I915_PPGTT_PD_ENTRIES*PAGE_SIZE; > > + gtt_size += 512 * PAGE_SIZE; > > } > > i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); > > } > > I thought we could have 1024 PDEs? Either way, this just drops the > macro, which is fine, unless we want to increase our PDE count. > > Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> I've written this code, so I know what the magic 512 is all about. But since Jesse is already confused about them I don't like removing the #define that much. Heck, we even tend to use PAGE_SIZE instead of 4096. What about a GEN6_ prefix instead? Or if you want to make this dynamic (iirc we can allocated PDEs in groups of 32) call it _MAX or so? I like the PT entries part though. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 03/12] drm/i915: make PDE|PTE platform specific 2013-05-06 9:47 ` Daniel Vetter @ 2013-05-08 16:49 ` Ben Widawsky 2013-05-08 17:52 ` Daniel Vetter 0 siblings, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-05-08 16:49 UTC (permalink / raw) To: Daniel Vetter; +Cc: Intel GFX On Mon, May 06, 2013 at 11:47:42AM +0200, Daniel Vetter wrote: > On Thu, May 02, 2013 at 02:26:17PM -0700, Jesse Barnes wrote: > > On Tue, 23 Apr 2013 23:15:31 -0700 > > Ben Widawsky <ben@bwidawsk.net> wrote: > > > > > Accomplish this be removing the PDE count define which is (and has > > > always been) part of the PPGTT structure anyway. With the addition of > > > the gen specific init function, we can nicely tuck away the magic number > > > in there. > > > > > > In this vain, make the PTE define less of a magic number. > > > > vein > > > > > > > > The remaining code in the global gtt setup is a bit messy, but an > > > upcoming patch will clean that one up. > > > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > > --- > > > drivers/gpu/drm/i915/i915_drv.h | 2 -- > > > drivers/gpu/drm/i915/i915_gem_gtt.c | 10 +++++----- > > > 2 files changed, 5 insertions(+), 7 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > > > index d5dcf7f..0a9c7cd 100644 > > > --- a/drivers/gpu/drm/i915/i915_drv.h > > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > > @@ -431,8 +431,6 @@ struct i915_gtt { > > > }; > > > #define gtt_total_entries(gtt) ((gtt).total >> PAGE_SHIFT) > > > > > > -#define I915_PPGTT_PD_ENTRIES 512 > > > -#define I915_PPGTT_PT_ENTRIES 1024 > > > struct i915_hw_ppgtt { > > > struct drm_device *dev; > > > unsigned num_pd_entries; > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > > > index 0503f09..11a50cf 100644 > > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > > > @@ -29,6 +29,7 @@ > > > #include "intel_drv.h" > > > > > > typedef uint32_t gen6_gtt_pte_t; > > > +#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t)) > > > > > > /* PPGTT stuff */ > > > #define GEN6_GTT_ADDR_ENCODE(addr) ((addr) | (((addr) >> 28) & 0xff0)) > > > @@ -235,10 +236,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) > > > /* ppgtt PDEs reside in the global gtt pagetable, which has 512*1024 > > > * entries. For aliasing ppgtt support we just steal them at the end for > > > * now. */ > > > - first_pd_entry_in_global_pt = > > > - gtt_total_entries(dev_priv->gtt) - I915_PPGTT_PD_ENTRIES; > > > + first_pd_entry_in_global_pt = gtt_total_entries(dev_priv->gtt) - 512; > > > > > > - ppgtt->num_pd_entries = I915_PPGTT_PD_ENTRIES; > > > + ppgtt->num_pd_entries = 512; > > > ppgtt->enable = gen6_ppgtt_enable; > > > ppgtt->clear_range = gen6_ppgtt_clear_range; > > > ppgtt->insert_entries = gen6_ppgtt_insert_entries; > > > @@ -646,7 +646,7 @@ void i915_gem_init_global_gtt(struct drm_device *dev) > > > if (INTEL_INFO(dev)->gen <= 7) { > > > /* PPGTT pdes are stolen from global gtt ptes, so shrink the > > > * aperture accordingly when using aliasing ppgtt. */ > > > - gtt_size -= I915_PPGTT_PD_ENTRIES*PAGE_SIZE; > > > + gtt_size -= 512 * PAGE_SIZE; > > > } > > > > > > i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); > > > @@ -657,7 +657,7 @@ void i915_gem_init_global_gtt(struct drm_device *dev) > > > > > > DRM_ERROR("Aliased PPGTT setup failed %d\n", ret); > > > drm_mm_takedown(&dev_priv->mm.gtt_space); > > > - gtt_size += I915_PPGTT_PD_ENTRIES*PAGE_SIZE; > > > + gtt_size += 512 * PAGE_SIZE; > > > } > > > i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); > > > } > > > > I thought we could have 1024 PDEs? Either way, this just drops the > > macro, which is fine, unless we want to increase our PDE count. > > > > Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> > > I've written this code, so I know what the magic 512 is all about. But > since Jesse is already confused about them I don't like removing the > #define that much. Heck, we even tend to use PAGE_SIZE instead of 4096. > > What about a GEN6_ prefix instead? Or if you want to make this dynamic > (iirc we can allocated PDEs in groups of 32) call it _MAX or so? > > I like the PT entries part though. > -Daniel If I just bring back #define I915_PPGTT_PD_ENTRIES 512, does this make you sufficient happy to accept the patch? > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- Ben Widawsky, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 03/12] drm/i915: make PDE|PTE platform specific 2013-05-08 16:49 ` Ben Widawsky @ 2013-05-08 17:52 ` Daniel Vetter 0 siblings, 0 replies; 44+ messages in thread From: Daniel Vetter @ 2013-05-08 17:52 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Wed, May 8, 2013 at 6:49 PM, Ben Widawsky <ben@bwidawsk.net> wrote: >> I've written this code, so I know what the magic 512 is all about. But >> since Jesse is already confused about them I don't like removing the >> #define that much. Heck, we even tend to use PAGE_SIZE instead of 4096. >> >> What about a GEN6_ prefix instead? Or if you want to make this dynamic >> (iirc we can allocated PDEs in groups of 32) call it _MAX or so? >> >> I like the PT entries part though. >> -Daniel > > If I just bring back #define I915_PPGTT_PD_ENTRIES 512, does this make > you sufficient happy to accept the patch? Yeah, maybe call it GEN6_ instead of I915_ while at it ... -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 04/12] drm/i915: Extract PDE writes 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky ` (2 preceding siblings ...) 2013-04-24 6:15 ` [PATCH 03/12] drm/i915: make PDE|PTE platform specific Ben Widawsky @ 2013-04-24 6:15 ` Ben Widawsky 2013-05-02 21:27 ` Jesse Barnes 2013-04-24 6:15 ` [PATCH 05/12] drm: Optionally create mm blocks from top-to-bottom Ben Widawsky ` (9 subsequent siblings) 13 siblings, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky It also makes some sense IMO to have these two functions separate irrespective of the number of callers. Only the single caller for now, but that will change as we add more PPGTTs. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 11a50cf..975adaa 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -76,18 +76,13 @@ static inline gen6_gtt_pte_t gen6_pte_encode(struct drm_device *dev, return pte; } -static int gen6_ppgtt_enable(struct drm_device *dev) +static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt) { - drm_i915_private_t *dev_priv = dev->dev_private; - uint32_t pd_offset; - struct intel_ring_buffer *ring; - struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt; + struct drm_i915_private *dev_priv = ppgtt->dev->dev_private; gen6_gtt_pte_t __iomem *pd_addr; uint32_t pd_entry; int i; - BUG_ON(ppgtt->pd_offset & 0x3f); - pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + ppgtt->pd_offset / sizeof(gen6_gtt_pte_t); for (i = 0; i < ppgtt->num_pd_entries; i++) { @@ -100,6 +95,19 @@ static int gen6_ppgtt_enable(struct drm_device *dev) writel(pd_entry, pd_addr + i); } readl(pd_addr); +} + +static int gen6_ppgtt_enable(struct drm_device *dev) +{ + drm_i915_private_t *dev_priv = dev->dev_private; + uint32_t pd_offset; + struct intel_ring_buffer *ring; + struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt; + int i; + + BUG_ON(ppgtt->pd_offset & 0x3f); + + gen6_write_pdes(ppgtt); pd_offset = ppgtt->pd_offset; pd_offset /= 64; /* in cachelines, */ -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 04/12] drm/i915: Extract PDE writes 2013-04-24 6:15 ` [PATCH 04/12] drm/i915: Extract PDE writes Ben Widawsky @ 2013-05-02 21:27 ` Jesse Barnes 2013-05-06 9:50 ` Daniel Vetter 0 siblings, 1 reply; 44+ messages in thread From: Jesse Barnes @ 2013-05-02 21:27 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Tue, 23 Apr 2013 23:15:32 -0700 Ben Widawsky <ben@bwidawsk.net> wrote: > It also makes some sense IMO to have these two functions separate > irrespective of the number of callers. > > Only the single caller for now, but that will change as we add more > PPGTTs. > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > --- > drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +++++++++++++++------- > 1 file changed, 15 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > index 11a50cf..975adaa 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > @@ -76,18 +76,13 @@ static inline gen6_gtt_pte_t gen6_pte_encode(struct drm_device *dev, > return pte; > } > > -static int gen6_ppgtt_enable(struct drm_device *dev) > +static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt) > { > - drm_i915_private_t *dev_priv = dev->dev_private; > - uint32_t pd_offset; > - struct intel_ring_buffer *ring; > - struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt; > + struct drm_i915_private *dev_priv = ppgtt->dev->dev_private; > gen6_gtt_pte_t __iomem *pd_addr; > uint32_t pd_entry; > int i; > > - BUG_ON(ppgtt->pd_offset & 0x3f); > - > pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + > ppgtt->pd_offset / sizeof(gen6_gtt_pte_t); > for (i = 0; i < ppgtt->num_pd_entries; i++) { > @@ -100,6 +95,19 @@ static int gen6_ppgtt_enable(struct drm_device *dev) > writel(pd_entry, pd_addr + i); > } > readl(pd_addr); > +} > + > +static int gen6_ppgtt_enable(struct drm_device *dev) > +{ > + drm_i915_private_t *dev_priv = dev->dev_private; > + uint32_t pd_offset; > + struct intel_ring_buffer *ring; > + struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt; > + int i; > + > + BUG_ON(ppgtt->pd_offset & 0x3f); > + > + gen6_write_pdes(ppgtt); > > pd_offset = ppgtt->pd_offset; > pd_offset /= 64; /* in cachelines, */ Yep, looks nicer this way. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> -- Jesse Barnes, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 04/12] drm/i915: Extract PDE writes 2013-05-02 21:27 ` Jesse Barnes @ 2013-05-06 9:50 ` Daniel Vetter 0 siblings, 0 replies; 44+ messages in thread From: Daniel Vetter @ 2013-05-06 9:50 UTC (permalink / raw) To: Jesse Barnes; +Cc: Ben Widawsky, Intel GFX On Thu, May 02, 2013 at 02:27:41PM -0700, Jesse Barnes wrote: > On Tue, 23 Apr 2013 23:15:32 -0700 > Ben Widawsky <ben@bwidawsk.net> wrote: > > > It also makes some sense IMO to have these two functions separate > > irrespective of the number of callers. > > > > Only the single caller for now, but that will change as we add more > > PPGTTs. > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > --- > > drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +++++++++++++++------- > > 1 file changed, 15 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > > index 11a50cf..975adaa 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > > @@ -76,18 +76,13 @@ static inline gen6_gtt_pte_t gen6_pte_encode(struct drm_device *dev, > > return pte; > > } > > > > -static int gen6_ppgtt_enable(struct drm_device *dev) > > +static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt) > > { > > - drm_i915_private_t *dev_priv = dev->dev_private; > > - uint32_t pd_offset; > > - struct intel_ring_buffer *ring; > > - struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt; > > + struct drm_i915_private *dev_priv = ppgtt->dev->dev_private; > > gen6_gtt_pte_t __iomem *pd_addr; > > uint32_t pd_entry; > > int i; > > > > - BUG_ON(ppgtt->pd_offset & 0x3f); > > - > > pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + > > ppgtt->pd_offset / sizeof(gen6_gtt_pte_t); > > for (i = 0; i < ppgtt->num_pd_entries; i++) { > > @@ -100,6 +95,19 @@ static int gen6_ppgtt_enable(struct drm_device *dev) > > writel(pd_entry, pd_addr + i); > > } > > readl(pd_addr); > > +} > > + > > +static int gen6_ppgtt_enable(struct drm_device *dev) > > +{ > > + drm_i915_private_t *dev_priv = dev->dev_private; > > + uint32_t pd_offset; > > + struct intel_ring_buffer *ring; > > + struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt; > > + int i; > > + > > + BUG_ON(ppgtt->pd_offset & 0x3f); > > + > > + gen6_write_pdes(ppgtt); > > > > pd_offset = ppgtt->pd_offset; > > pd_offset /= 64; /* in cachelines, */ > > Yep, looks nicer this way. > > Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Queued for -next, thanks for the patch. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 05/12] drm: Optionally create mm blocks from top-to-bottom 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky ` (3 preceding siblings ...) 2013-04-24 6:15 ` [PATCH 04/12] drm/i915: Extract PDE writes Ben Widawsky @ 2013-04-24 6:15 ` Ben Widawsky 2013-04-24 6:15 ` [PATCH 06/12] drm/i915: Use drm_mm for PPGTT PDEs Ben Widawsky ` (8 subsequent siblings) 13 siblings, 0 replies; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky From: Chris Wilson <chris@chris-wilson.co.uk> Clients like i915 needs to segregate cache domains within the GTT which can lead to small amounts of fragmentation. By allocating the uncached buffers from the bottom and the cacheable buffers from the top, we can reduce the amount of wasted space and also optimize allocation of the mappable portion of the GTT to only those buffers that require CPU access through the GTT. v2 by Ben: Update callers in i915_gem_object_bind_to_gtt() Turn search flags and allocation flags into separate enums Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/drm_mm.c | 94 +++++++++++++++++++++++------------------ drivers/gpu/drm/i915/i915_gem.c | 8 +++- include/drm/drm_mm.h | 75 ++++++++++++++++++++------------ 3 files changed, 106 insertions(+), 71 deletions(-) diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index db1e2d6..6f3bb07 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -49,7 +49,7 @@ #define MM_UNUSED_TARGET 4 -static struct drm_mm_node *drm_mm_kmalloc(struct drm_mm *mm, int atomic) +static struct drm_mm_node *drm_mm_kmalloc(struct drm_mm *mm, bool atomic) { struct drm_mm_node *child; @@ -105,7 +105,8 @@ EXPORT_SYMBOL(drm_mm_pre_get); static void drm_mm_insert_helper(struct drm_mm_node *hole_node, struct drm_mm_node *node, unsigned long size, unsigned alignment, - unsigned long color) + unsigned long color, + enum drm_mm_allocator_flags flags) { struct drm_mm *mm = hole_node->mm; unsigned long hole_start = drm_mm_hole_node_start(hole_node); @@ -118,12 +119,22 @@ static void drm_mm_insert_helper(struct drm_mm_node *hole_node, if (mm->color_adjust) mm->color_adjust(hole_node, color, &adj_start, &adj_end); + if (flags & DRM_MM_CREATE_TOP) + adj_start = adj_end - size; + if (alignment) { unsigned tmp = adj_start % alignment; - if (tmp) - adj_start += alignment - tmp; + if (tmp) { + if (flags & DRM_MM_CREATE_TOP) + adj_start -= tmp; + else + adj_start += alignment - tmp; + } } + BUG_ON(adj_start < hole_start); + BUG_ON(adj_end > hole_end); + if (adj_start == hole_start) { hole_node->hole_follows = 0; list_del(&hole_node->hole_stack); @@ -150,7 +161,7 @@ static void drm_mm_insert_helper(struct drm_mm_node *hole_node, struct drm_mm_node *drm_mm_create_block(struct drm_mm *mm, unsigned long start, unsigned long size, - bool atomic) + enum drm_mm_allocator_flags flags) { struct drm_mm_node *hole, *node; unsigned long end = start + size; @@ -161,7 +172,7 @@ struct drm_mm_node *drm_mm_create_block(struct drm_mm *mm, if (hole_start > start || hole_end < end) continue; - node = drm_mm_kmalloc(mm, atomic); + node = drm_mm_kmalloc(mm, flags & DRM_MM_CREATE_ATOMIC); if (unlikely(node == NULL)) return NULL; @@ -196,15 +207,15 @@ struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *hole_node, unsigned long size, unsigned alignment, unsigned long color, - int atomic) + enum drm_mm_allocator_flags flags) { struct drm_mm_node *node; - node = drm_mm_kmalloc(hole_node->mm, atomic); + node = drm_mm_kmalloc(hole_node->mm, flags & DRM_MM_CREATE_ATOMIC); if (unlikely(node == NULL)) return NULL; - drm_mm_insert_helper(hole_node, node, size, alignment, color); + drm_mm_insert_helper(hole_node, node, size, alignment, color, flags); return node; } @@ -217,32 +228,28 @@ EXPORT_SYMBOL(drm_mm_get_block_generic); */ int drm_mm_insert_node_generic(struct drm_mm *mm, struct drm_mm_node *node, unsigned long size, unsigned alignment, - unsigned long color) + unsigned long color, + enum drm_mm_allocator_flags aflags, + enum drm_mm_search_flags sflags) { struct drm_mm_node *hole_node; hole_node = drm_mm_search_free_generic(mm, size, alignment, - color, 0); + color, sflags); if (!hole_node) return -ENOSPC; - drm_mm_insert_helper(hole_node, node, size, alignment, color); + drm_mm_insert_helper(hole_node, node, size, alignment, color, aflags); return 0; } EXPORT_SYMBOL(drm_mm_insert_node_generic); -int drm_mm_insert_node(struct drm_mm *mm, struct drm_mm_node *node, - unsigned long size, unsigned alignment) -{ - return drm_mm_insert_node_generic(mm, node, size, alignment, 0); -} -EXPORT_SYMBOL(drm_mm_insert_node); - static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node, struct drm_mm_node *node, unsigned long size, unsigned alignment, unsigned long color, - unsigned long start, unsigned long end) + unsigned long start, unsigned long end, + enum drm_mm_search_flags flags) { struct drm_mm *mm = hole_node->mm; unsigned long hole_start = drm_mm_hole_node_start(hole_node); @@ -257,13 +264,20 @@ static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node, if (adj_end > end) adj_end = end; + if (flags & DRM_MM_CREATE_TOP) + adj_start = adj_end - size; + if (mm->color_adjust) mm->color_adjust(hole_node, color, &adj_start, &adj_end); if (alignment) { unsigned tmp = adj_start % alignment; - if (tmp) - adj_start += alignment - tmp; + if (tmp) { + if (flags & DRM_MM_CREATE_TOP) + adj_start -= tmp; + else + adj_start += alignment - tmp; + } } if (adj_start == hole_start) { @@ -280,6 +294,8 @@ static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node, INIT_LIST_HEAD(&node->hole_stack); list_add(&node->node_list, &hole_node->node_list); + BUG_ON(node->start < start); + BUG_ON(node->start < adj_start); BUG_ON(node->start + node->size > adj_end); BUG_ON(node->start + node->size > end); @@ -296,16 +312,16 @@ struct drm_mm_node *drm_mm_get_block_range_generic(struct drm_mm_node *hole_node unsigned long color, unsigned long start, unsigned long end, - int atomic) + enum drm_mm_allocator_flags flags) { struct drm_mm_node *node; - node = drm_mm_kmalloc(hole_node->mm, atomic); + node = drm_mm_kmalloc(hole_node->mm, flags & DRM_MM_CREATE_ATOMIC); if (unlikely(node == NULL)) return NULL; drm_mm_insert_helper_range(hole_node, node, size, alignment, color, - start, end); + start, end, flags); return node; } @@ -318,31 +334,25 @@ EXPORT_SYMBOL(drm_mm_get_block_range_generic); */ int drm_mm_insert_node_in_range_generic(struct drm_mm *mm, struct drm_mm_node *node, unsigned long size, unsigned alignment, unsigned long color, - unsigned long start, unsigned long end) + unsigned long start, unsigned long end, + enum drm_mm_allocator_flags aflags, + enum drm_mm_search_flags sflags) { struct drm_mm_node *hole_node; hole_node = drm_mm_search_free_in_range_generic(mm, size, alignment, color, - start, end, 0); + start, end, sflags); if (!hole_node) return -ENOSPC; drm_mm_insert_helper_range(hole_node, node, size, alignment, color, - start, end); + start, end, aflags); return 0; } EXPORT_SYMBOL(drm_mm_insert_node_in_range_generic); -int drm_mm_insert_node_in_range(struct drm_mm *mm, struct drm_mm_node *node, - unsigned long size, unsigned alignment, - unsigned long start, unsigned long end) -{ - return drm_mm_insert_node_in_range_generic(mm, node, size, alignment, 0, start, end); -} -EXPORT_SYMBOL(drm_mm_insert_node_in_range); - /** * Remove a memory node from the allocator. */ @@ -418,7 +428,7 @@ struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm, unsigned long size, unsigned alignment, unsigned long color, - bool best_match) + enum drm_mm_search_flags flags) { struct drm_mm_node *entry; struct drm_mm_node *best; @@ -431,7 +441,7 @@ struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm, best = NULL; best_size = ~0UL; - drm_mm_for_each_hole(entry, mm, adj_start, adj_end) { + __drm_mm_for_each_hole(entry, mm, adj_start, adj_end, flags & DRM_MM_SEARCH_BELOW) { if (mm->color_adjust) { mm->color_adjust(entry, color, &adj_start, &adj_end); if (adj_end <= adj_start) @@ -441,7 +451,7 @@ struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm, if (!check_free_hole(adj_start, adj_end, size, alignment)) continue; - if (!best_match) + if ((flags & DRM_MM_SEARCH_BEST) == 0) return entry; if (entry->size < best_size) { @@ -460,7 +470,7 @@ struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_mm *mm, unsigned long color, unsigned long start, unsigned long end, - bool best_match) + enum drm_mm_search_flags flags) { struct drm_mm_node *entry; struct drm_mm_node *best; @@ -473,7 +483,7 @@ struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_mm *mm, best = NULL; best_size = ~0UL; - drm_mm_for_each_hole(entry, mm, adj_start, adj_end) { + __drm_mm_for_each_hole(entry, mm, adj_start, adj_end, flags & DRM_MM_SEARCH_BELOW) { if (adj_start < start) adj_start = start; if (adj_end > end) @@ -488,7 +498,7 @@ struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_mm *mm, if (!check_free_hole(adj_start, adj_end, size, alignment)) continue; - if (!best_match) + if ((flags & DRM_MM_SEARCH_BEST) == 0) return entry; if (entry->size < best_size) { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 6be940e..af56802 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2985,10 +2985,14 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj, if (map_and_fenceable) ret = drm_mm_insert_node_in_range_generic(&dev_priv->mm.gtt_space, node, size, alignment, obj->cache_level, - 0, dev_priv->gtt.mappable_end); + 0, dev_priv->gtt.mappable_end, + DRM_MM_CREATE_DEFAULT, + DRM_MM_SEARCH_DEFAULT); else ret = drm_mm_insert_node_generic(&dev_priv->mm.gtt_space, node, - size, alignment, obj->cache_level); + size, alignment, obj->cache_level, + DRM_MM_SEARCH_DEFAULT, + DRM_MM_CREATE_DEFAULT); if (ret) { ret = i915_gem_evict_something(dev, size, alignment, obj->cache_level, diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h index 88591ef..752e846 100644 --- a/include/drm/drm_mm.h +++ b/include/drm/drm_mm.h @@ -41,6 +41,18 @@ #include <linux/seq_file.h> #endif +enum drm_mm_allocator_flags { + DRM_MM_CREATE_DEFAULT = 0, + DRM_MM_CREATE_ATOMIC = 1<<0, + DRM_MM_CREATE_TOP = 1<<1, +}; + +enum drm_mm_search_flags { + DRM_MM_SEARCH_DEFAULT = 0, + DRM_MM_SEARCH_BEST = 1<<0, + DRM_MM_SEARCH_BELOW = 1<<1, +}; + struct drm_mm_node { struct list_head node_list; struct list_head hole_stack; @@ -135,18 +147,26 @@ static inline unsigned long drm_mm_hole_node_end(struct drm_mm_node *hole_node) 1 : 0; \ entry = list_entry(entry->hole_stack.next, struct drm_mm_node, hole_stack)) +#define __drm_mm_for_each_hole(entry, mm, hole_start, hole_end, backwards) \ + for (entry = list_entry((backwards) ? (mm)->hole_stack.prev : (mm)->hole_stack.next, struct drm_mm_node, hole_stack); \ + &entry->hole_stack != &(mm)->hole_stack ? \ + hole_start = drm_mm_hole_node_start(entry), \ + hole_end = drm_mm_hole_node_end(entry), \ + 1 : 0; \ + entry = list_entry((backwards) ? entry->hole_stack.prev : entry->hole_stack.next, struct drm_mm_node, hole_stack)) + /* * Basic range manager support (drm_mm.c) */ extern struct drm_mm_node *drm_mm_create_block(struct drm_mm *mm, unsigned long start, unsigned long size, - bool atomic); + enum drm_mm_allocator_flags flags); extern struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *node, unsigned long size, unsigned alignment, unsigned long color, - int atomic); + enum drm_mm_allocator_flags flags); extern struct drm_mm_node *drm_mm_get_block_range_generic( struct drm_mm_node *node, unsigned long size, @@ -154,7 +174,8 @@ extern struct drm_mm_node *drm_mm_get_block_range_generic( unsigned long color, unsigned long start, unsigned long end, - int atomic); + enum drm_mm_allocator_flags flags); + static inline struct drm_mm_node *drm_mm_get_block(struct drm_mm_node *parent, unsigned long size, unsigned alignment) @@ -165,7 +186,7 @@ static inline struct drm_mm_node *drm_mm_get_block_atomic(struct drm_mm_node *pa unsigned long size, unsigned alignment) { - return drm_mm_get_block_generic(parent, size, alignment, 0, 1); + return drm_mm_get_block_generic(parent, size, alignment, 0, DRM_MM_CREATE_ATOMIC); } static inline struct drm_mm_node *drm_mm_get_block_range( struct drm_mm_node *parent, @@ -196,39 +217,38 @@ static inline struct drm_mm_node *drm_mm_get_block_atomic_range( unsigned long end) { return drm_mm_get_block_range_generic(parent, size, alignment, 0, - start, end, 1); + start, end, DRM_MM_CREATE_ATOMIC); } -extern int drm_mm_insert_node(struct drm_mm *mm, - struct drm_mm_node *node, - unsigned long size, - unsigned alignment); -extern int drm_mm_insert_node_in_range(struct drm_mm *mm, - struct drm_mm_node *node, - unsigned long size, - unsigned alignment, - unsigned long start, - unsigned long end); extern int drm_mm_insert_node_generic(struct drm_mm *mm, struct drm_mm_node *node, unsigned long size, unsigned alignment, - unsigned long color); + unsigned long color, + enum drm_mm_allocator_flags aflags, + enum drm_mm_search_flags sflags); +#define drm_mm_insert_node(mm, node, size, alignment) \ + drm_mm_insert_node_generic(mm, node, size, alignment, 0, 0) extern int drm_mm_insert_node_in_range_generic(struct drm_mm *mm, struct drm_mm_node *node, unsigned long size, unsigned alignment, unsigned long color, unsigned long start, - unsigned long end); + unsigned long end, + enum drm_mm_allocator_flags aflags, + enum drm_mm_search_flags sflags); +#define drm_mm_insert_node_in_range(mm, node, size, alignment, start, end) \ + drm_mm_insert_node_in_range_generic(mm, node, size, alignment, 0, start, end, 0) extern void drm_mm_put_block(struct drm_mm_node *cur); extern void drm_mm_remove_node(struct drm_mm_node *node); extern void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new); + extern struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm, unsigned long size, unsigned alignment, unsigned long color, - bool best_match); + enum drm_mm_search_flags flags); extern struct drm_mm_node *drm_mm_search_free_in_range_generic( const struct drm_mm *mm, unsigned long size, @@ -236,13 +256,14 @@ extern struct drm_mm_node *drm_mm_search_free_in_range_generic( unsigned long color, unsigned long start, unsigned long end, - bool best_match); + enum drm_mm_search_flags flags); + static inline struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm, unsigned long size, unsigned alignment, - bool best_match) + enum drm_mm_search_flags flags) { - return drm_mm_search_free_generic(mm,size, alignment, 0, best_match); + return drm_mm_search_free_generic(mm,size, alignment, 0, flags); } static inline struct drm_mm_node *drm_mm_search_free_in_range( const struct drm_mm *mm, @@ -250,18 +271,18 @@ static inline struct drm_mm_node *drm_mm_search_free_in_range( unsigned alignment, unsigned long start, unsigned long end, - bool best_match) + enum drm_mm_search_flags flags) { return drm_mm_search_free_in_range_generic(mm, size, alignment, 0, - start, end, best_match); + start, end, flags); } static inline struct drm_mm_node *drm_mm_search_free_color(const struct drm_mm *mm, unsigned long size, unsigned alignment, unsigned long color, - bool best_match) + enum drm_mm_search_flags flags) { - return drm_mm_search_free_generic(mm,size, alignment, color, best_match); + return drm_mm_search_free_generic(mm,size, alignment, color, flags); } static inline struct drm_mm_node *drm_mm_search_free_in_range_color( const struct drm_mm *mm, @@ -270,10 +291,10 @@ static inline struct drm_mm_node *drm_mm_search_free_in_range_color( unsigned long color, unsigned long start, unsigned long end, - bool best_match) + enum drm_mm_search_flags flags) { return drm_mm_search_free_in_range_generic(mm, size, alignment, color, - start, end, best_match); + start, end, flags); } extern int drm_mm_init(struct drm_mm *mm, unsigned long start, -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 06/12] drm/i915: Use drm_mm for PPGTT PDEs 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky ` (4 preceding siblings ...) 2013-04-24 6:15 ` [PATCH 05/12] drm: Optionally create mm blocks from top-to-bottom Ben Widawsky @ 2013-04-24 6:15 ` Ben Widawsky 2013-05-02 21:42 ` Jesse Barnes 2013-04-24 6:15 ` [PATCH 07/12] drm/i915: Use PDEs as the guard page Ben Widawsky ` (7 subsequent siblings) 13 siblings, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky I think this is a nice generalization on it's own, but it's primarily prep work for my PPGTT support. Does this bother anyone? The only down side I can see is we waste 2k of cpu unmappable space (unless we have something else that is <= 2k and doesn't need to be page aligned). v2: Align PDEs to 64b in GTT Allocate the node dynamically so we can use drm_mm_put_block Now tested on IGT Allocate node at the top to avoid fragmentation (Chris) v3: Use Chris' top down allocator Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_gem_gtt.c | 54 ++++++++++++++++++++++++++----------- include/drm/drm_mm.h | 3 +++ 3 files changed, 42 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 0a9c7cd..9ab6254 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -432,6 +432,7 @@ struct i915_gtt { #define gtt_total_entries(gtt) ((gtt).total >> PAGE_SHIFT) struct i915_hw_ppgtt { + struct drm_mm_node *node; struct drm_device *dev; unsigned num_pd_entries; struct page **pt_pages; diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 975adaa..b825d7b 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -219,6 +219,8 @@ static void gen6_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt) { int i; + drm_mm_put_block(ppgtt->node); + if (ppgtt->pt_dma_addr) { for (i = 0; i < ppgtt->num_pd_entries; i++) pci_unmap_page(ppgtt->dev->pdev, @@ -235,16 +237,27 @@ static void gen6_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt) static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) { +#define GEN6_PD_ALIGN (PAGE_SIZE * 16) +#define GEN6_PD_SIZE (512 * PAGE_SIZE) struct drm_device *dev = ppgtt->dev; struct drm_i915_private *dev_priv = dev->dev_private; - unsigned first_pd_entry_in_global_pt; int i; int ret = -ENOMEM; - /* ppgtt PDEs reside in the global gtt pagetable, which has 512*1024 - * entries. For aliasing ppgtt support we just steal them at the end for - * now. */ - first_pd_entry_in_global_pt = gtt_total_entries(dev_priv->gtt) - 512; + /* PPGTT PDEs reside in the GGTT stolen space, and consists of 512 + * entries. The allocator works in address space sizes, so it's + * multiplied by page size. We allocate at the top of the GTT to avoid + * fragmentation. + */ + BUG_ON(!drm_mm_initialized(&dev_priv->mm.gtt_space)); + ret = drm_mm_insert_node_in_range_generic(&dev_priv->mm.gtt_space, + ppgtt->node, GEN6_PD_SIZE, + GEN6_PD_ALIGN, 0, + dev_priv->gtt.mappable_end, + dev_priv->gtt.total - PAGE_SIZE, + DRM_MM_TOPDOWN); + if (ret) + return ret; ppgtt->num_pd_entries = 512; ppgtt->enable = gen6_ppgtt_enable; @@ -253,8 +266,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) ppgtt->cleanup = gen6_ppgtt_cleanup; ppgtt->pt_pages = kzalloc(sizeof(struct page *)*ppgtt->num_pd_entries, GFP_KERNEL); - if (!ppgtt->pt_pages) + if (!ppgtt->pt_pages) { + drm_mm_put_block(ppgtt->node); return -ENOMEM; + } for (i = 0; i < ppgtt->num_pd_entries; i++) { ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL); @@ -284,7 +299,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) ppgtt->clear_range(ppgtt, 0, ppgtt->num_pd_entries*I915_PPGTT_PT_ENTRIES); - ppgtt->pd_offset = first_pd_entry_in_global_pt * sizeof(gen6_gtt_pte_t); + DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n", + ppgtt->node->size >> 20, + ppgtt->node->start / PAGE_SIZE); + ppgtt->pd_offset = ppgtt->node->start / PAGE_SIZE * sizeof(gen6_gtt_pte_t); return 0; @@ -315,6 +333,12 @@ static int i915_gem_init_aliasing_ppgtt(struct drm_device *dev) if (!ppgtt) return -ENOMEM; + ppgtt->node = kzalloc(sizeof(*ppgtt->node), GFP_KERNEL); + if (!ppgtt->node) { + kfree(ppgtt); + return -ENOMEM; + } + ppgtt->dev = dev; ppgtt->scratch_page_dma_addr = dev_priv->gtt.scratch_page_dma; @@ -323,10 +347,12 @@ static int i915_gem_init_aliasing_ppgtt(struct drm_device *dev) else BUG(); - if (ret) + if (ret) { + drm_mm_put_block(ppgtt->node); kfree(ppgtt); - else + } else { dev_priv->mm.aliasing_ppgtt = ppgtt; + } return ret; } @@ -407,6 +433,9 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev) dev_priv->gtt.gtt_clear_range(dev, dev_priv->gtt.start / PAGE_SIZE, dev_priv->gtt.total / PAGE_SIZE); + if (dev_priv->mm.aliasing_ppgtt) + gen6_write_pdes(dev_priv->mm.aliasing_ppgtt); + list_for_each_entry(obj, &dev_priv->mm.bound_list, gtt_list) { i915_gem_clflush_object(obj); i915_gem_gtt_bind_object(obj, obj->cache_level); @@ -651,12 +680,6 @@ void i915_gem_init_global_gtt(struct drm_device *dev) if (intel_enable_ppgtt(dev) && HAS_ALIASING_PPGTT(dev)) { int ret; - if (INTEL_INFO(dev)->gen <= 7) { - /* PPGTT pdes are stolen from global gtt ptes, so shrink the - * aperture accordingly when using aliasing ppgtt. */ - gtt_size -= 512 * PAGE_SIZE; - } - i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); ret = i915_gem_init_aliasing_ppgtt(dev); @@ -665,7 +688,6 @@ void i915_gem_init_global_gtt(struct drm_device *dev) DRM_ERROR("Aliased PPGTT setup failed %d\n", ret); drm_mm_takedown(&dev_priv->mm.gtt_space); - gtt_size += 512 * PAGE_SIZE; } i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); } diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h index 752e846..ce7b13b 100644 --- a/include/drm/drm_mm.h +++ b/include/drm/drm_mm.h @@ -53,6 +53,9 @@ enum drm_mm_search_flags { DRM_MM_SEARCH_BELOW = 1<<1, }; +#define DRM_MM_BOTTOMUP DRM_MM_CREATE_DEFAULT, DRM_MM_SEARCH_DEFAULT +#define DRM_MM_TOPDOWN DRM_MM_CREATE_TOP, DRM_MM_SEARCH_BELOW + struct drm_mm_node { struct list_head node_list; struct list_head hole_stack; -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 06/12] drm/i915: Use drm_mm for PPGTT PDEs 2013-04-24 6:15 ` [PATCH 06/12] drm/i915: Use drm_mm for PPGTT PDEs Ben Widawsky @ 2013-05-02 21:42 ` Jesse Barnes 0 siblings, 0 replies; 44+ messages in thread From: Jesse Barnes @ 2013-05-02 21:42 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Tue, 23 Apr 2013 23:15:34 -0700 Ben Widawsky <ben@bwidawsk.net> wrote: > I think this is a nice generalization on it's own, but it's primarily > prep work for my PPGTT support. Does this bother anyone? > > The only down side I can see is we waste 2k of cpu unmappable space > (unless we have something else that is <= 2k and doesn't need to be page > aligned). > > v2: Align PDEs to 64b in GTT > Allocate the node dynamically so we can use drm_mm_put_block > Now tested on IGT > Allocate node at the top to avoid fragmentation (Chris) > > v3: Use Chris' top down allocator > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > --- > drivers/gpu/drm/i915/i915_drv.h | 1 + > drivers/gpu/drm/i915/i915_gem_gtt.c | 54 ++++++++++++++++++++++++++----------- > include/drm/drm_mm.h | 3 +++ > 3 files changed, 42 insertions(+), 16 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 0a9c7cd..9ab6254 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -432,6 +432,7 @@ struct i915_gtt { > #define gtt_total_entries(gtt) ((gtt).total >> PAGE_SHIFT) > > struct i915_hw_ppgtt { > + struct drm_mm_node *node; > struct drm_device *dev; > unsigned num_pd_entries; > struct page **pt_pages; > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > index 975adaa..b825d7b 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > @@ -219,6 +219,8 @@ static void gen6_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt) > { > int i; > > + drm_mm_put_block(ppgtt->node); > + > if (ppgtt->pt_dma_addr) { > for (i = 0; i < ppgtt->num_pd_entries; i++) > pci_unmap_page(ppgtt->dev->pdev, > @@ -235,16 +237,27 @@ static void gen6_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt) > > static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) > { > +#define GEN6_PD_ALIGN (PAGE_SIZE * 16) > +#define GEN6_PD_SIZE (512 * PAGE_SIZE) > struct drm_device *dev = ppgtt->dev; > struct drm_i915_private *dev_priv = dev->dev_private; > - unsigned first_pd_entry_in_global_pt; > int i; > int ret = -ENOMEM; > > - /* ppgtt PDEs reside in the global gtt pagetable, which has 512*1024 > - * entries. For aliasing ppgtt support we just steal them at the end for > - * now. */ > - first_pd_entry_in_global_pt = gtt_total_entries(dev_priv->gtt) - 512; > + /* PPGTT PDEs reside in the GGTT stolen space, and consists of 512 > + * entries. The allocator works in address space sizes, so it's > + * multiplied by page size. We allocate at the top of the GTT to avoid > + * fragmentation. > + */ > + BUG_ON(!drm_mm_initialized(&dev_priv->mm.gtt_space)); > + ret = drm_mm_insert_node_in_range_generic(&dev_priv->mm.gtt_space, > + ppgtt->node, GEN6_PD_SIZE, > + GEN6_PD_ALIGN, 0, > + dev_priv->gtt.mappable_end, > + dev_priv->gtt.total - PAGE_SIZE, > + DRM_MM_TOPDOWN); > + if (ret) > + return ret; I guess I won't see why you're doing this until later; presumably it makes PPGTT management easier in subsequent patches? > > ppgtt->num_pd_entries = 512; > ppgtt->enable = gen6_ppgtt_enable; > @@ -253,8 +266,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) > ppgtt->cleanup = gen6_ppgtt_cleanup; > ppgtt->pt_pages = kzalloc(sizeof(struct page *)*ppgtt->num_pd_entries, > GFP_KERNEL); > - if (!ppgtt->pt_pages) > + if (!ppgtt->pt_pages) { > + drm_mm_put_block(ppgtt->node); > return -ENOMEM; > + } > > for (i = 0; i < ppgtt->num_pd_entries; i++) { > ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL); > @@ -284,7 +299,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) > ppgtt->clear_range(ppgtt, 0, > ppgtt->num_pd_entries*I915_PPGTT_PT_ENTRIES); > > - ppgtt->pd_offset = first_pd_entry_in_global_pt * sizeof(gen6_gtt_pte_t); > + DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n", > + ppgtt->node->size >> 20, > + ppgtt->node->start / PAGE_SIZE); > + ppgtt->pd_offset = ppgtt->node->start / PAGE_SIZE * sizeof(gen6_gtt_pte_t); > > return 0; > > @@ -315,6 +333,12 @@ static int i915_gem_init_aliasing_ppgtt(struct drm_device *dev) > if (!ppgtt) > return -ENOMEM; > > + ppgtt->node = kzalloc(sizeof(*ppgtt->node), GFP_KERNEL); > + if (!ppgtt->node) { > + kfree(ppgtt); > + return -ENOMEM; > + } > + Why not just put the drm_mm node directly into the hw_ppgtt struct instead of allocating it? Every context/ppgtt instance will need one, right? > ppgtt->dev = dev; > ppgtt->scratch_page_dma_addr = dev_priv->gtt.scratch_page_dma; > > @@ -323,10 +347,12 @@ static int i915_gem_init_aliasing_ppgtt(struct drm_device *dev) > else > BUG(); > > - if (ret) > + if (ret) { > + drm_mm_put_block(ppgtt->node); > kfree(ppgtt); > - else > + } else { > dev_priv->mm.aliasing_ppgtt = ppgtt; > + } > > return ret; > } > @@ -407,6 +433,9 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev) > dev_priv->gtt.gtt_clear_range(dev, dev_priv->gtt.start / PAGE_SIZE, > dev_priv->gtt.total / PAGE_SIZE); > > + if (dev_priv->mm.aliasing_ppgtt) > + gen6_write_pdes(dev_priv->mm.aliasing_ppgtt); > + > list_for_each_entry(obj, &dev_priv->mm.bound_list, gtt_list) { > i915_gem_clflush_object(obj); > i915_gem_gtt_bind_object(obj, obj->cache_level); > @@ -651,12 +680,6 @@ void i915_gem_init_global_gtt(struct drm_device *dev) > if (intel_enable_ppgtt(dev) && HAS_ALIASING_PPGTT(dev)) { > int ret; > > - if (INTEL_INFO(dev)->gen <= 7) { > - /* PPGTT pdes are stolen from global gtt ptes, so shrink the > - * aperture accordingly when using aliasing ppgtt. */ > - gtt_size -= 512 * PAGE_SIZE; > - } > - > i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); > > ret = i915_gem_init_aliasing_ppgtt(dev); > @@ -665,7 +688,6 @@ void i915_gem_init_global_gtt(struct drm_device *dev) > > DRM_ERROR("Aliased PPGTT setup failed %d\n", ret); > drm_mm_takedown(&dev_priv->mm.gtt_space); > - gtt_size += 512 * PAGE_SIZE; > } > i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); > } > diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h > index 752e846..ce7b13b 100644 > --- a/include/drm/drm_mm.h > +++ b/include/drm/drm_mm.h > @@ -53,6 +53,9 @@ enum drm_mm_search_flags { > DRM_MM_SEARCH_BELOW = 1<<1, > }; > > +#define DRM_MM_BOTTOMUP DRM_MM_CREATE_DEFAULT, DRM_MM_SEARCH_DEFAULT > +#define DRM_MM_TOPDOWN DRM_MM_CREATE_TOP, DRM_MM_SEARCH_BELOW > + > struct drm_mm_node { > struct list_head node_list; > struct list_head hole_stack; These last few hunks seem like they belong in different patches? With the above addressed: Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> -- Jesse Barnes, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 07/12] drm/i915: Use PDEs as the guard page 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky ` (5 preceding siblings ...) 2013-04-24 6:15 ` [PATCH 06/12] drm/i915: Use drm_mm for PPGTT PDEs Ben Widawsky @ 2013-04-24 6:15 ` Ben Widawsky 2013-04-24 6:15 ` [PATCH 08/12] drm/i915: Update context_fini Ben Widawsky ` (6 subsequent siblings) 13 siblings, 0 replies; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky Scary alert. AFAICT, we simply do not need the guard page if we have the PDEs at the top since all prefetching is CS related, and it should always be safe to prefetch into a PDE (provided the PDE is valid). The PDE fetching itself should not be subject to the prefetching problem, though without full PPGTT support, we may never know. Potentially this is achievable even without using the PPGTT reworks I've done prior to this, but I figure there is no point in rocking the boat without the PPGTT generalizations I've submitted. There is a very simple bool left to help test if things break and we fear guard page related stuff. Of course the commit is easily revertable as well. One reason why it's desirable to do this is with the drm_mm_node allocation I've done with the PDEs, we fragment a bit of space finding the the first properly aligned address (if my math is right, we get 11 spare pages fragmented at the top of our address space). This problem didn't exist with the original implementation that stole the PDEs out from under the drm_mm. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_drv.h | 3 +- drivers/gpu/drm/i915/i915_gem.c | 2 +- drivers/gpu/drm/i915/i915_gem_gtt.c | 94 +++++++++++++++++++++---------------- 3 files changed, 56 insertions(+), 43 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 9ab6254..9717c69 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1723,7 +1723,8 @@ void i915_gem_gtt_unbind_object(struct drm_i915_gem_object *obj); void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj); void i915_gem_init_global_gtt(struct drm_device *dev); void i915_gem_setup_global_gtt(struct drm_device *dev, unsigned long start, - unsigned long mappable_end, unsigned long end); + unsigned long mappable_end, unsigned long end, + bool guard_page); int i915_gem_gtt_init(struct drm_device *dev); static inline void i915_gem_chipset_flush(struct drm_device *dev) { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index af56802..9c5eaf0 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -161,7 +161,7 @@ i915_gem_init_ioctl(struct drm_device *dev, void *data, mutex_lock(&dev->struct_mutex); i915_gem_setup_global_gtt(dev, args->gtt_start, args->gtt_end, - args->gtt_end); + args->gtt_end, true); dev_priv->gtt.mappable_end = args->gtt_end; mutex_unlock(&dev->struct_mutex); diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index b825d7b..39ac37d 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -254,7 +254,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) ppgtt->node, GEN6_PD_SIZE, GEN6_PD_ALIGN, 0, dev_priv->gtt.mappable_end, - dev_priv->gtt.total - PAGE_SIZE, + dev_priv->gtt.total, DRM_MM_TOPDOWN); if (ret) return ret; @@ -323,21 +323,14 @@ err_pt_alloc: return ret; } -static int i915_gem_init_aliasing_ppgtt(struct drm_device *dev) +int i915_gem_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt) { struct drm_i915_private *dev_priv = dev->dev_private; - struct i915_hw_ppgtt *ppgtt; int ret; - ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL); - if (!ppgtt) - return -ENOMEM; - ppgtt->node = kzalloc(sizeof(*ppgtt->node), GFP_KERNEL); - if (!ppgtt->node) { - kfree(ppgtt); + if (!ppgtt->node) return -ENOMEM; - } ppgtt->dev = dev; ppgtt->scratch_page_dma_addr = dev_priv->gtt.scratch_page_dma; @@ -347,12 +340,8 @@ static int i915_gem_init_aliasing_ppgtt(struct drm_device *dev) else BUG(); - if (ret) { + if (ret) drm_mm_put_block(ppgtt->node); - kfree(ppgtt); - } else { - dev_priv->mm.aliasing_ppgtt = ppgtt; - } return ret; } @@ -599,10 +588,26 @@ static void i915_gtt_color_adjust(struct drm_mm_node *node, *end -= 4096; } } + +static bool intel_enable_ppgtt(struct drm_device *dev) +{ + if (i915_enable_ppgtt >= 0) + return i915_enable_ppgtt; + +#ifdef CONFIG_INTEL_IOMMU + /* Disable ppgtt on SNB if VT-d is on. */ + if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) + return false; +#endif + + return true; +} + void i915_gem_setup_global_gtt(struct drm_device *dev, unsigned long start, unsigned long mappable_end, - unsigned long end) + unsigned long end, + bool guard_page) { /* Let GEM Manage all of the aperture. * @@ -620,8 +625,12 @@ void i915_gem_setup_global_gtt(struct drm_device *dev, BUG_ON(mappable_end > end); - /* Subtract the guard page ... */ - drm_mm_init(&dev_priv->mm.gtt_space, start, end - start - PAGE_SIZE); + if (!guard_page) + drm_mm_init(&dev_priv->mm.gtt_space, start, end - start); + else + drm_mm_init(&dev_priv->mm.gtt_space, start, + end - start - PAGE_SIZE); /* Guard page */ + if (!HAS_LLC(dev)) dev_priv->mm.gtt_space.color_adjust = i915_gtt_color_adjust; @@ -650,46 +659,49 @@ void i915_gem_setup_global_gtt(struct drm_device *dev, (hole_end-hole_start) / PAGE_SIZE); } - /* And finally clear the reserved guard page */ - dev_priv->gtt.gtt_clear_range(dev, end / PAGE_SIZE - 1, 1); -} - -static bool -intel_enable_ppgtt(struct drm_device *dev) -{ - if (i915_enable_ppgtt >= 0) - return i915_enable_ppgtt; - -#ifdef CONFIG_INTEL_IOMMU - /* Disable ppgtt on SNB if VT-d is on. */ - if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) - return false; -#endif - - return true; + /* And finally clear the reserved guard page (if exists) */ + if (guard_page) + dev_priv->gtt.gtt_clear_range(dev, end / PAGE_SIZE - 1, 1); } void i915_gem_init_global_gtt(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; unsigned long gtt_size, mappable_size; + int ret = 0; gtt_size = dev_priv->gtt.total; mappable_size = dev_priv->gtt.mappable_end; if (intel_enable_ppgtt(dev) && HAS_ALIASING_PPGTT(dev)) { - int ret; + struct i915_hw_ppgtt *ppgtt; - i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); + i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size, false); - ret = i915_gem_init_aliasing_ppgtt(dev); - if (!ret) - return; + ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL); + if (!ppgtt) { + ret = -ENOMEM; + goto ggtt_only; + } + + ret = i915_gem_ppgtt_init(dev, ppgtt); + if (ret) + goto ggtt_only; + dev_priv->mm.aliasing_ppgtt = ppgtt; + return; + } + +/* XXX: We need to takedown the drm_mm and have this fall back because of the + * conditional use of the guard page. + * TODO: It's a bit hackish and could use cleanup. + */ +ggtt_only: + if (ret) { DRM_ERROR("Aliased PPGTT setup failed %d\n", ret); drm_mm_takedown(&dev_priv->mm.gtt_space); } - i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size); + i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size, true); } static int setup_scratch_page(struct drm_device *dev) -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 08/12] drm/i915: Update context_fini 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky ` (6 preceding siblings ...) 2013-04-24 6:15 ` [PATCH 07/12] drm/i915: Use PDEs as the guard page Ben Widawsky @ 2013-04-24 6:15 ` Ben Widawsky 2013-04-24 15:11 ` Mika Kuoppala 2013-04-24 6:15 ` [PATCH 09/12] drm/i915: Split context enabling from init Ben Widawsky ` (5 subsequent siblings) 13 siblings, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky Make resets optional for fini. fini is only ever called in module_unload. It was therefore a good assumption that the GPU reset (see the comment in the function) was what we wanted. With an upcoming patch, we're going to add a few more callsites, one of which is GPU reset, where doing the extra reset isn't usual. On that same note, with more callers it makes sense to make the default context state consistent with the actual state (via NULLing the pointer). Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_dma.c | 2 +- drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_gem_context.c | 7 +++++-- 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 3b315ba..a141b8a 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -1764,7 +1764,7 @@ int i915_driver_unload(struct drm_device *dev) mutex_lock(&dev->struct_mutex); i915_gem_free_all_phys_object(dev); i915_gem_cleanup_ringbuffer(dev); - i915_gem_context_fini(dev); + i915_gem_context_fini(dev, true); mutex_unlock(&dev->struct_mutex); i915_gem_cleanup_aliasing_ppgtt(dev); i915_gem_cleanup_stolen(dev); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 9717c69..618845e 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1698,7 +1698,7 @@ struct dma_buf *i915_gem_prime_export(struct drm_device *dev, /* i915_gem_context.c */ void i915_gem_context_init(struct drm_device *dev); -void i915_gem_context_fini(struct drm_device *dev); +void i915_gem_context_fini(struct drm_device *dev, bool reset); void i915_gem_context_close(struct drm_device *dev, struct drm_file *file); int i915_switch_context(struct intel_ring_buffer *ring, struct drm_file *file, int to_id); diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 411ace0..6030d83 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -259,7 +259,7 @@ void i915_gem_context_init(struct drm_device *dev) DRM_DEBUG_DRIVER("HW context support initialized\n"); } -void i915_gem_context_fini(struct drm_device *dev) +void i915_gem_context_fini(struct drm_device *dev, bool reset) { struct drm_i915_private *dev_priv = dev->dev_private; @@ -269,11 +269,14 @@ void i915_gem_context_fini(struct drm_device *dev) /* The only known way to stop the gpu from accessing the hw context is * to reset it. Do this as the very last operation to avoid confusing * other code, leading to spurious errors. */ - intel_gpu_reset(dev); + if (reset) + intel_gpu_reset(dev); i915_gem_object_unpin(dev_priv->ring[RCS].default_context->obj); do_destroy(dev_priv->ring[RCS].default_context); + + dev_priv->ring[RCS].default_context = NULL; } static int context_idr_cleanup(int id, void *p, void *data) -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 08/12] drm/i915: Update context_fini 2013-04-24 6:15 ` [PATCH 08/12] drm/i915: Update context_fini Ben Widawsky @ 2013-04-24 15:11 ` Mika Kuoppala 2013-04-25 4:11 ` Ben Widawsky 0 siblings, 1 reply; 44+ messages in thread From: Mika Kuoppala @ 2013-04-24 15:11 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky Ben Widawsky <ben@bwidawsk.net> writes: > Make resets optional for fini. fini is only ever called in > module_unload. It was therefore a good assumption that the GPU reset > (see the comment in the function) was what we wanted. With an upcoming > patch, we're going to add a few more callsites, one of which is GPU > reset, where doing the extra reset isn't usual. > > On that same note, with more callers it makes sense to make the default > context state consistent with the actual state (via NULLing the > pointer). > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > --- > drivers/gpu/drm/i915/i915_dma.c | 2 +- > drivers/gpu/drm/i915/i915_drv.h | 2 +- > drivers/gpu/drm/i915/i915_gem_context.c | 7 +++++-- > 3 files changed, 7 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c > index 3b315ba..a141b8a 100644 > --- a/drivers/gpu/drm/i915/i915_dma.c > +++ b/drivers/gpu/drm/i915/i915_dma.c > @@ -1764,7 +1764,7 @@ int i915_driver_unload(struct drm_device *dev) > mutex_lock(&dev->struct_mutex); > i915_gem_free_all_phys_object(dev); > i915_gem_cleanup_ringbuffer(dev); > - i915_gem_context_fini(dev); > + i915_gem_context_fini(dev, true); > mutex_unlock(&dev->struct_mutex); > i915_gem_cleanup_aliasing_ppgtt(dev); > i915_gem_cleanup_stolen(dev); > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 9717c69..618845e 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -1698,7 +1698,7 @@ struct dma_buf *i915_gem_prime_export(struct drm_device *dev, > > /* i915_gem_context.c */ > void i915_gem_context_init(struct drm_device *dev); > -void i915_gem_context_fini(struct drm_device *dev); > +void i915_gem_context_fini(struct drm_device *dev, bool reset); > void i915_gem_context_close(struct drm_device *dev, struct drm_file *file); > int i915_switch_context(struct intel_ring_buffer *ring, > struct drm_file *file, int to_id); > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > index 411ace0..6030d83 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -259,7 +259,7 @@ void i915_gem_context_init(struct drm_device *dev) > DRM_DEBUG_DRIVER("HW context support initialized\n"); > } > > -void i915_gem_context_fini(struct drm_device *dev) > +void i915_gem_context_fini(struct drm_device *dev, bool reset) > { > struct drm_i915_private *dev_priv = dev->dev_private; > > @@ -269,11 +269,14 @@ void i915_gem_context_fini(struct drm_device *dev) > /* The only known way to stop the gpu from accessing the hw context is > * to reset it. Do this as the very last operation to avoid confusing > * other code, leading to spurious errors. */ Should we switch to default context in here to be sure that the running context will get unreferenced correctly?... > - intel_gpu_reset(dev); > + if (reset) > + intel_gpu_reset(dev); > > i915_gem_object_unpin(dev_priv->ring[RCS].default_context->obj); > ...and unreference the default context here so that it will get freed on module unload. --Mika > do_destroy(dev_priv->ring[RCS].default_context); > + > + dev_priv->ring[RCS].default_context = NULL; > } > > static int context_idr_cleanup(int id, void *p, void *data) > -- > 1.8.2.1 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 08/12] drm/i915: Update context_fini 2013-04-24 15:11 ` Mika Kuoppala @ 2013-04-25 4:11 ` Ben Widawsky 2013-04-25 5:17 ` Ben Widawsky 2013-04-25 15:01 ` Mika Kuoppala 0 siblings, 2 replies; 44+ messages in thread From: Ben Widawsky @ 2013-04-25 4:11 UTC (permalink / raw) To: Mika Kuoppala; +Cc: Intel GFX On Wed, Apr 24, 2013 at 06:11:40PM +0300, Mika Kuoppala wrote: > Ben Widawsky <ben@bwidawsk.net> writes: > > > Make resets optional for fini. fini is only ever called in > > module_unload. It was therefore a good assumption that the GPU reset > > (see the comment in the function) was what we wanted. With an upcoming > > patch, we're going to add a few more callsites, one of which is GPU > > reset, where doing the extra reset isn't usual. > > > > On that same note, with more callers it makes sense to make the default > > context state consistent with the actual state (via NULLing the > > pointer). > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > --- > > drivers/gpu/drm/i915/i915_dma.c | 2 +- > > drivers/gpu/drm/i915/i915_drv.h | 2 +- > > drivers/gpu/drm/i915/i915_gem_context.c | 7 +++++-- > > 3 files changed, 7 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c > > index 3b315ba..a141b8a 100644 > > --- a/drivers/gpu/drm/i915/i915_dma.c > > +++ b/drivers/gpu/drm/i915/i915_dma.c > > @@ -1764,7 +1764,7 @@ int i915_driver_unload(struct drm_device *dev) > > mutex_lock(&dev->struct_mutex); > > i915_gem_free_all_phys_object(dev); > > i915_gem_cleanup_ringbuffer(dev); > > - i915_gem_context_fini(dev); > > + i915_gem_context_fini(dev, true); > > mutex_unlock(&dev->struct_mutex); > > i915_gem_cleanup_aliasing_ppgtt(dev); > > i915_gem_cleanup_stolen(dev); > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > > index 9717c69..618845e 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -1698,7 +1698,7 @@ struct dma_buf *i915_gem_prime_export(struct drm_device *dev, > > > > /* i915_gem_context.c */ > > void i915_gem_context_init(struct drm_device *dev); > > -void i915_gem_context_fini(struct drm_device *dev); > > +void i915_gem_context_fini(struct drm_device *dev, bool reset); > > void i915_gem_context_close(struct drm_device *dev, struct drm_file *file); > > int i915_switch_context(struct intel_ring_buffer *ring, > > struct drm_file *file, int to_id); > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > > index 411ace0..6030d83 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.c > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > > @@ -259,7 +259,7 @@ void i915_gem_context_init(struct drm_device *dev) > > DRM_DEBUG_DRIVER("HW context support initialized\n"); > > } > > > > -void i915_gem_context_fini(struct drm_device *dev) > > +void i915_gem_context_fini(struct drm_device *dev, bool reset) > > { > > struct drm_i915_private *dev_priv = dev->dev_private; > > > > @@ -269,11 +269,14 @@ void i915_gem_context_fini(struct drm_device *dev) > > /* The only known way to stop the gpu from accessing the hw context is > > * to reset it. Do this as the very last operation to avoid confusing > > * other code, leading to spurious errors. */ > > Should we switch to default context in here to be sure that > the running context will get unreferenced correctly?... I'm confused if you're talking about the object refcount, or the context refcount. The latter isn't yet introduced in this part of the series. But maybe I've missed your point, so let's discuss... There are 3 ways we can end up at fini(): 1. failed during some part of driver initialization 2. unload Conceptually, 1 and 2 are the same, the GPU is idle, (explicitly so in the latter case) and so therefore we know the default context must be the last context context[1]. We know the retire worker has either been called, or we've not yet submitted work, and we should therefore be able to assert the object refcount is exactly 1. Given our discussion and the new reset parameter in fini, I can assert last_context_obj == default_context->obj in the !reset case, and that the refcount is 1. 3. reset In this case, it's a good question. Normally reset is called on a hang, and we can't expect to be able to switch. I think reset_ring_lists does everything I need, and then it follows into case 1 and 2. If I have missed something though, could you please explain it a bit further fo rme? > > > - intel_gpu_reset(dev); + if (reset) + > > intel_gpu_reset(dev); > > > > i915_gem_object_unpin(dev_priv->ring[RCS].default_context->obj); > > > > ...and unreference the default context here so that it will get freed > on module unload. Can you explain what I'm missing. Doesn't do_destroy just below take care of it? > > --Mika For the context reference counting case, I think the questions are still applicable, and the answers are similar. You deref in reset_ring_lists, init fail and unload are the same... If you want me to use the context_unref interface, I can, but I would want to make unref return the val of kref_put() and do while(!context_unref(...)) just to be safe. You can take a look here to see how that's progressing: http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=ppgtt-ctx It's quite volatile though, so be warned. > > > do_destroy(dev_priv->ring[RCS].default_context); > > + > > + dev_priv->ring[RCS].default_context = NULL; > > } > > > > static int context_idr_cleanup(int id, void *p, void *data) > > -- > > 1.8.2.1 > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ben Widawsky, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 08/12] drm/i915: Update context_fini 2013-04-25 4:11 ` Ben Widawsky @ 2013-04-25 5:17 ` Ben Widawsky 2013-04-25 15:01 ` Mika Kuoppala 1 sibling, 0 replies; 44+ messages in thread From: Ben Widawsky @ 2013-04-25 5:17 UTC (permalink / raw) To: Mika Kuoppala; +Cc: Intel GFX On Wed, Apr 24, 2013 at 09:11:02PM -0700, Ben Widawsky wrote: > On Wed, Apr 24, 2013 at 06:11:40PM +0300, Mika Kuoppala wrote: > > Ben Widawsky <ben@bwidawsk.net> writes: > > > > > Make resets optional for fini. fini is only ever called in > > > module_unload. It was therefore a good assumption that the GPU reset > > > (see the comment in the function) was what we wanted. With an upcoming > > > patch, we're going to add a few more callsites, one of which is GPU > > > reset, where doing the extra reset isn't usual. > > > > > > On that same note, with more callers it makes sense to make the default > > > context state consistent with the actual state (via NULLing the > > > pointer). > > > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > > --- > > > drivers/gpu/drm/i915/i915_dma.c | 2 +- > > > drivers/gpu/drm/i915/i915_drv.h | 2 +- > > > drivers/gpu/drm/i915/i915_gem_context.c | 7 +++++-- > > > 3 files changed, 7 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c > > > index 3b315ba..a141b8a 100644 > > > --- a/drivers/gpu/drm/i915/i915_dma.c > > > +++ b/drivers/gpu/drm/i915/i915_dma.c > > > @@ -1764,7 +1764,7 @@ int i915_driver_unload(struct drm_device *dev) > > > mutex_lock(&dev->struct_mutex); > > > i915_gem_free_all_phys_object(dev); > > > i915_gem_cleanup_ringbuffer(dev); > > > - i915_gem_context_fini(dev); > > > + i915_gem_context_fini(dev, true); > > > mutex_unlock(&dev->struct_mutex); > > > i915_gem_cleanup_aliasing_ppgtt(dev); > > > i915_gem_cleanup_stolen(dev); > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > > > index 9717c69..618845e 100644 > > > --- a/drivers/gpu/drm/i915/i915_drv.h > > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > > @@ -1698,7 +1698,7 @@ struct dma_buf *i915_gem_prime_export(struct drm_device *dev, > > > > > > /* i915_gem_context.c */ > > > void i915_gem_context_init(struct drm_device *dev); > > > -void i915_gem_context_fini(struct drm_device *dev); > > > +void i915_gem_context_fini(struct drm_device *dev, bool reset); > > > void i915_gem_context_close(struct drm_device *dev, struct drm_file *file); > > > int i915_switch_context(struct intel_ring_buffer *ring, > > > struct drm_file *file, int to_id); > > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > > > index 411ace0..6030d83 100644 > > > --- a/drivers/gpu/drm/i915/i915_gem_context.c > > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > > > @@ -259,7 +259,7 @@ void i915_gem_context_init(struct drm_device *dev) > > > DRM_DEBUG_DRIVER("HW context support initialized\n"); > > > } > > > > > > -void i915_gem_context_fini(struct drm_device *dev) > > > +void i915_gem_context_fini(struct drm_device *dev, bool reset) > > > { > > > struct drm_i915_private *dev_priv = dev->dev_private; > > > > > > @@ -269,11 +269,14 @@ void i915_gem_context_fini(struct drm_device *dev) > > > /* The only known way to stop the gpu from accessing the hw context is > > > * to reset it. Do this as the very last operation to avoid confusing > > > * other code, leading to spurious errors. */ > > > > Should we switch to default context in here to be sure that > > the running context will get unreferenced correctly?... > > I'm confused if you're talking about the object refcount, or the context > refcount. The latter isn't yet introduced in this part of the series. > But maybe I've missed your point, so let's discuss... > > There are 3 ways we can end up at fini(): > 1. failed during some part of driver initialization > 2. unload > Conceptually, 1 and 2 are the same, the GPU is idle, (explicitly so in > the latter case) and so therefore we know the default context must be > the last context context[1]. We know the retire worker has either been > called, or we've not yet submitted work, and we should therefore be > able to assert the object refcount is exactly 1. > > Given our discussion and the new reset parameter in fini, I can assert > last_context_obj == default_context->obj in the !reset case, and that > the refcount is 1. I should add to this statement: Getting the object reference count isn't possible, and it'd involve changing drm_gem_object_unreference to return the value of kref_put() which I'd prefer not to d. Also the other assertion requires some extra artificial code to actually assert last_context_obj == default_context->obj because the init case will only work after enable(). > > 3. reset > In this case, it's a good question. Normally reset is called on a > hang, and we can't expect to be able to switch. I think > reset_ring_lists does everything I need, and then it follows into case > 1 and 2. If I have missed something though, could you please explain > it a bit further fo rme? > > > > > > - intel_gpu_reset(dev); + if (reset) + > > > intel_gpu_reset(dev); > > > > > > i915_gem_object_unpin(dev_priv->ring[RCS].default_context->obj); > > > > > > > ...and unreference the default context here so that it will get freed > > on module unload. > > Can you explain what I'm missing. Doesn't do_destroy just below take > care of it? > > > > > > --Mika > > For the context reference counting case, I think the questions are still > applicable, and the answers are similar. You deref in reset_ring_lists, > init fail and unload are the same... If you want me to use the > context_unref interface, I can, but I would want to make unref return > the val of kref_put() and do while(!context_unref(...)) just to be safe. > > > You can take a look here to see how that's progressing: > http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=ppgtt-ctx > > It's quite volatile though, so be warned. > > > > > > > do_destroy(dev_priv->ring[RCS].default_context); > > > + > > > + dev_priv->ring[RCS].default_context = NULL; > > > } > > > > > > static int context_idr_cleanup(int id, void *p, void *data) > > > -- > > > 1.8.2.1 > > > > > > _______________________________________________ > > > Intel-gfx mailing list > > > Intel-gfx@lists.freedesktop.org > > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Ben Widawsky, Intel Open Source Technology Center -- Ben Widawsky, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 08/12] drm/i915: Update context_fini 2013-04-25 4:11 ` Ben Widawsky 2013-04-25 5:17 ` Ben Widawsky @ 2013-04-25 15:01 ` Mika Kuoppala 2013-04-25 17:22 ` Ben Widawsky 1 sibling, 1 reply; 44+ messages in thread From: Mika Kuoppala @ 2013-04-25 15:01 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX Ben Widawsky <ben@bwidawsk.net> writes: > On Wed, Apr 24, 2013 at 06:11:40PM +0300, Mika Kuoppala wrote: >> Ben Widawsky <ben@bwidawsk.net> writes: >> >> > Make resets optional for fini. fini is only ever called in >> > module_unload. It was therefore a good assumption that the GPU reset >> > (see the comment in the function) was what we wanted. With an upcoming >> > patch, we're going to add a few more callsites, one of which is GPU >> > reset, where doing the extra reset isn't usual. >> > >> > On that same note, with more callers it makes sense to make the default >> > context state consistent with the actual state (via NULLing the >> > pointer). >> > >> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> >> > --- >> > drivers/gpu/drm/i915/i915_dma.c | 2 +- >> > drivers/gpu/drm/i915/i915_drv.h | 2 +- >> > drivers/gpu/drm/i915/i915_gem_context.c | 7 +++++-- >> > 3 files changed, 7 insertions(+), 4 deletions(-) >> > >> > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c >> > index 3b315ba..a141b8a 100644 >> > --- a/drivers/gpu/drm/i915/i915_dma.c >> > +++ b/drivers/gpu/drm/i915/i915_dma.c >> > @@ -1764,7 +1764,7 @@ int i915_driver_unload(struct drm_device *dev) >> > mutex_lock(&dev->struct_mutex); >> > i915_gem_free_all_phys_object(dev); >> > i915_gem_cleanup_ringbuffer(dev); >> > - i915_gem_context_fini(dev); >> > + i915_gem_context_fini(dev, true); >> > mutex_unlock(&dev->struct_mutex); >> > i915_gem_cleanup_aliasing_ppgtt(dev); >> > i915_gem_cleanup_stolen(dev); >> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h >> > index 9717c69..618845e 100644 >> > --- a/drivers/gpu/drm/i915/i915_drv.h >> > +++ b/drivers/gpu/drm/i915/i915_drv.h >> > @@ -1698,7 +1698,7 @@ struct dma_buf *i915_gem_prime_export(struct drm_device *dev, >> > >> > /* i915_gem_context.c */ >> > void i915_gem_context_init(struct drm_device *dev); >> > -void i915_gem_context_fini(struct drm_device *dev); >> > +void i915_gem_context_fini(struct drm_device *dev, bool reset); >> > void i915_gem_context_close(struct drm_device *dev, struct drm_file *file); >> > int i915_switch_context(struct intel_ring_buffer *ring, >> > struct drm_file *file, int to_id); >> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c >> > index 411ace0..6030d83 100644 >> > --- a/drivers/gpu/drm/i915/i915_gem_context.c >> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c >> > @@ -259,7 +259,7 @@ void i915_gem_context_init(struct drm_device *dev) >> > DRM_DEBUG_DRIVER("HW context support initialized\n"); >> > } >> > >> > -void i915_gem_context_fini(struct drm_device *dev) >> > +void i915_gem_context_fini(struct drm_device *dev, bool reset) >> > { >> > struct drm_i915_private *dev_priv = dev->dev_private; >> > >> > @@ -269,11 +269,14 @@ void i915_gem_context_fini(struct drm_device *dev) >> > /* The only known way to stop the gpu from accessing the hw context is >> > * to reset it. Do this as the very last operation to avoid confusing >> > * other code, leading to spurious errors. */ >> >> Should we switch to default context in here to be sure that >> the running context will get unreferenced correctly?... > > I'm confused if you're talking about the object refcount, or the context > refcount. The latter isn't yet introduced in this part of the series. > But maybe I've missed your point, so let's discuss... I should have been more specific. Object refcount it is. > There are 3 ways we can end up at fini(): > 1. failed during some part of driver initialization > 2. unload > Conceptually, 1 and 2 are the same, the GPU is idle, (explicitly so in > the latter case) and so therefore we know the default context must be > the last context context[1]. We know the retire worker has either been > called, or we've not yet submitted work, and we should therefore be > able to assert the object refcount is exactly 1. > > Given our discussion and the new reset parameter in fini, I can assert > last_context_obj == default_context->obj in the !reset case, and that > the refcount is 1. Yes. I forgot the gpu_idle which indeed switches to default context. So no need to switch. > 3. reset > In this case, it's a good question. Normally reset is called on a > hang, and we can't expect to be able to switch. I think > reset_ring_lists does everything I need, and then it follows into case > 1 and 2. If I have missed something though, could you please explain > it a bit further fo rme? My only concern was that the drivers bookkeepping of what context it has currently switched to the hw, is out of sync after reset. And as a result first call to i915_switch_context might skip the switch even if it was needed. >> >> > - intel_gpu_reset(dev); + if (reset) + >> > intel_gpu_reset(dev); >> > >> > i915_gem_object_unpin(dev_priv->ring[RCS].default_context->obj); >> > >> >> ...and unreference the default context here so that it will get freed >> on module unload. > > Can you explain what I'm missing. Doesn't do_destroy just below take > care of it? do_destroy is enough for all the other contexts that were created by the ioctl. However the default context is different as we are owning it, adding to the refcount. So we need to unreference it also once before calling do destroy. This is what i think is the cause as i see default context leaking with the current codebase, on module unload. Adding + drm_gem_object_unreference(&dev_priv->ring[RCS].default_context->obj->base); after the unpin makes the leak go away. -Mika >> > > >> --Mika > > For the context reference counting case, I think the questions are still > applicable, and the answers are similar. You deref in reset_ring_lists, > init fail and unload are the same... If you want me to use the > context_unref interface, I can, but I would want to make unref return > the val of kref_put() and do while(!context_unref(...)) just to be safe. > > > You can take a look here to see how that's progressing: > http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=ppgtt-ctx > > It's quite volatile though, so be warned. > > >> >> > do_destroy(dev_priv->ring[RCS].default_context); >> > + >> > + dev_priv->ring[RCS].default_context = NULL; >> > } >> > >> > static int context_idr_cleanup(int id, void *p, void *data) >> > -- >> > 1.8.2.1 >> > >> > _______________________________________________ >> > Intel-gfx mailing list >> > Intel-gfx@lists.freedesktop.org >> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Ben Widawsky, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 08/12] drm/i915: Update context_fini 2013-04-25 15:01 ` Mika Kuoppala @ 2013-04-25 17:22 ` Ben Widawsky 0 siblings, 0 replies; 44+ messages in thread From: Ben Widawsky @ 2013-04-25 17:22 UTC (permalink / raw) To: Mika Kuoppala; +Cc: Intel GFX On Thu, Apr 25, 2013 at 06:01:56PM +0300, Mika Kuoppala wrote: > Ben Widawsky <ben@bwidawsk.net> writes: > > > On Wed, Apr 24, 2013 at 06:11:40PM +0300, Mika Kuoppala wrote: > >> Ben Widawsky <ben@bwidawsk.net> writes: > >> > >> > Make resets optional for fini. fini is only ever called in > >> > module_unload. It was therefore a good assumption that the GPU reset > >> > (see the comment in the function) was what we wanted. With an upcoming > >> > patch, we're going to add a few more callsites, one of which is GPU > >> > reset, where doing the extra reset isn't usual. > >> > > >> > On that same note, with more callers it makes sense to make the default > >> > context state consistent with the actual state (via NULLing the > >> > pointer). > >> > > >> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > >> > --- > >> > drivers/gpu/drm/i915/i915_dma.c | 2 +- > >> > drivers/gpu/drm/i915/i915_drv.h | 2 +- > >> > drivers/gpu/drm/i915/i915_gem_context.c | 7 +++++-- > >> > 3 files changed, 7 insertions(+), 4 deletions(-) > >> > > >> > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c > >> > index 3b315ba..a141b8a 100644 > >> > --- a/drivers/gpu/drm/i915/i915_dma.c > >> > +++ b/drivers/gpu/drm/i915/i915_dma.c > >> > @@ -1764,7 +1764,7 @@ int i915_driver_unload(struct drm_device *dev) > >> > mutex_lock(&dev->struct_mutex); > >> > i915_gem_free_all_phys_object(dev); > >> > i915_gem_cleanup_ringbuffer(dev); > >> > - i915_gem_context_fini(dev); > >> > + i915_gem_context_fini(dev, true); > >> > mutex_unlock(&dev->struct_mutex); > >> > i915_gem_cleanup_aliasing_ppgtt(dev); > >> > i915_gem_cleanup_stolen(dev); > >> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > >> > index 9717c69..618845e 100644 > >> > --- a/drivers/gpu/drm/i915/i915_drv.h > >> > +++ b/drivers/gpu/drm/i915/i915_drv.h > >> > @@ -1698,7 +1698,7 @@ struct dma_buf *i915_gem_prime_export(struct drm_device *dev, > >> > > >> > /* i915_gem_context.c */ > >> > void i915_gem_context_init(struct drm_device *dev); > >> > -void i915_gem_context_fini(struct drm_device *dev); > >> > +void i915_gem_context_fini(struct drm_device *dev, bool reset); > >> > void i915_gem_context_close(struct drm_device *dev, struct drm_file *file); > >> > int i915_switch_context(struct intel_ring_buffer *ring, > >> > struct drm_file *file, int to_id); > >> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > >> > index 411ace0..6030d83 100644 > >> > --- a/drivers/gpu/drm/i915/i915_gem_context.c > >> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > >> > @@ -259,7 +259,7 @@ void i915_gem_context_init(struct drm_device *dev) > >> > DRM_DEBUG_DRIVER("HW context support initialized\n"); > >> > } > >> > > >> > -void i915_gem_context_fini(struct drm_device *dev) > >> > +void i915_gem_context_fini(struct drm_device *dev, bool reset) > >> > { > >> > struct drm_i915_private *dev_priv = dev->dev_private; > >> > > >> > @@ -269,11 +269,14 @@ void i915_gem_context_fini(struct drm_device *dev) > >> > /* The only known way to stop the gpu from accessing the hw context is > >> > * to reset it. Do this as the very last operation to avoid confusing > >> > * other code, leading to spurious errors. */ > >> > >> Should we switch to default context in here to be sure that > >> the running context will get unreferenced correctly?... > > > > I'm confused if you're talking about the object refcount, or the context > > refcount. The latter isn't yet introduced in this part of the series. > > But maybe I've missed your point, so let's discuss... > > I should have been more specific. Object refcount it is. > > > There are 3 ways we can end up at fini(): > > 1. failed during some part of driver initialization > > 2. unload > > Conceptually, 1 and 2 are the same, the GPU is idle, (explicitly so in > > the latter case) and so therefore we know the default context must be > > the last context context[1]. We know the retire worker has either been > > called, or we've not yet submitted work, and we should therefore be > > able to assert the object refcount is exactly 1. > > > > Given our discussion and the new reset parameter in fini, I can assert > > last_context_obj == default_context->obj in the !reset case, and that > > the refcount is 1. > > Yes. I forgot the gpu_idle which indeed switches to default context. So > no need to switch. > > > 3. reset > > In this case, it's a good question. Normally reset is called on a > > hang, and we can't expect to be able to switch. I think > > reset_ring_lists does everything I need, and then it follows into case > > 1 and 2. If I have missed something though, could you please explain > > it a bit further fo rme? > > My only concern was that the drivers bookkeepping of what context it has > currently switched to the hw, is out of sync after reset. And as a > result first call to i915_switch_context might skip the switch even > if it was needed. If you want to submit something that tries to force-fake a context switch (ie. do all the book keeping manually without actually using the ring), we can see how horrible it looks and decide if we want to keep it. > > >> > >> > - intel_gpu_reset(dev); + if (reset) + > >> > intel_gpu_reset(dev); > >> > > >> > i915_gem_object_unpin(dev_priv->ring[RCS].default_context->obj); > >> > > >> > >> ...and unreference the default context here so that it will get freed > >> on module unload. > > > > Can you explain what I'm missing. Doesn't do_destroy just below take > > care of it? > > do_destroy is enough for all the other contexts that were created > by the ioctl. However the default context is different as we are owning > it, adding to the refcount. So we need to unreference it also once before > calling do destroy. Yeah. This changes a bit in the series because refcount doesn't go to 1 until we successfully execute do_switch (i915_gem_context_enable). But I think with my comment below, it's fine. > > This is what i think is the cause as i see default context leaking > with the current codebase, on module unload. > > Adding > + drm_gem_object_unreference(&dev_priv->ring[RCS].default_context->obj->base); > after the unpin makes the leak go away. > > -Mika This looks like a bug to me as well: 10:19:04 bwidawsk | mkuoppal: you should submit that patch... maybe add a comment on the first unref that it may | be unsafe to access the object after that point 10:19:11 bwidawsk | the one you added after unpin > > >> > > > > > >> --Mika > > > > For the context reference counting case, I think the questions are still > > applicable, and the answers are similar. You deref in reset_ring_lists, > > init fail and unload are the same... If you want me to use the > > context_unref interface, I can, but I would want to make unref return > > the val of kref_put() and do while(!context_unref(...)) just to be safe. > > > > > > You can take a look here to see how that's progressing: > > http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=ppgtt-ctx > > > > It's quite volatile though, so be warned. > > > > > >> > >> > do_destroy(dev_priv->ring[RCS].default_context); > >> > + > >> > + dev_priv->ring[RCS].default_context = NULL; > >> > } > >> > > >> > static int context_idr_cleanup(int id, void *p, void *data) > >> > -- > >> > 1.8.2.1 > >> > > >> > _______________________________________________ > >> > Intel-gfx mailing list > >> > Intel-gfx@lists.freedesktop.org > >> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > > -- > > Ben Widawsky, Intel Open Source Technology Center -- Ben Widawsky, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 09/12] drm/i915: Split context enabling from init 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky ` (7 preceding siblings ...) 2013-04-24 6:15 ` [PATCH 08/12] drm/i915: Update context_fini Ben Widawsky @ 2013-04-24 6:15 ` Ben Widawsky 2013-04-24 10:04 ` Chris Wilson 2013-04-24 6:15 ` [PATCH 10/12] drm/i915: destroy i915_gem_init_global_gtt Ben Widawsky ` (4 subsequent siblings) 13 siblings, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky Context init should have only been called once, ever. The code was hackish because it predated some of the init cleanups. This patch refactors the code to leave the part which must be run more than once (reset, and thaw) as a distinct function. I need this for some of my future plans to more closely tie a context to a PPGTT. Since context_init comes before ring_init now, we also have to add the call to fini() if things go badly. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_gem.c | 18 +++++++++++++++--- drivers/gpu/drm/i915/i915_gem_context.c | 18 +++++++++--------- 3 files changed, 25 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 618845e..11cc159 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1699,6 +1699,7 @@ struct dma_buf *i915_gem_prime_export(struct drm_device *dev, /* i915_gem_context.c */ void i915_gem_context_init(struct drm_device *dev); void i915_gem_context_fini(struct drm_device *dev, bool reset); +int i915_gem_context_enable(struct drm_i915_private *dev_priv); void i915_gem_context_close(struct drm_device *dev, struct drm_file *file); int i915_switch_context(struct intel_ring_buffer *ring, struct drm_file *file, int to_id); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 9c5eaf0..fbd289d 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2151,6 +2151,8 @@ void i915_gem_reset(struct drm_device *dev) for_each_ring(ring, dev_priv, i) i915_gem_reset_ring_lists(dev_priv, ring); + i915_gem_context_fini(dev, false); + /* Move everything out of the GPU domains to ensure we do any * necessary invalidation upon reuse. */ @@ -4029,10 +4031,18 @@ i915_gem_init_hw(struct drm_device *dev) return ret; /* - * XXX: There was some w/a described somewhere suggesting loading - * contexts before PPGTT. + * XXX: Contexts should only be initialized once. Doing a switch to the + * default context switch however is something we'd like to do after + * reset or thaw (the latter may not actually be necessary for HW, but + * goes with our code better). Context switching requires rings (for + * the do_switch), but before enabling PPGTT. So don't move this. */ - i915_gem_context_init(dev); + if (!dev_priv->hw_contexts_disabled && + i915_gem_context_enable(dev_priv)) { + i915_gem_context_fini(dev, false); + dev_priv->hw_contexts_disabled = true; + } + if (dev_priv->mm.aliasing_ppgtt) { ret = dev_priv->mm.aliasing_ppgtt->enable(dev); if (ret) { @@ -4060,6 +4070,8 @@ int i915_gem_init(struct drm_device *dev) i915_gem_init_global_gtt(dev); + i915_gem_context_init(dev); + ret = i915_gem_init_hw(dev); mutex_unlock(&dev->struct_mutex); if (ret) { diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 6030d83..b2deddd 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -216,15 +216,10 @@ static int create_default_context(struct drm_i915_private *dev_priv) if (ret) goto err_destroy; - ret = do_switch(ctx); - if (ret) - goto err_unpin; + DRM_DEBUG_DRIVER("Default HW context created\n"); - DRM_DEBUG_DRIVER("Default HW context loaded\n"); return 0; -err_unpin: - i915_gem_object_unpin(ctx->obj); err_destroy: do_destroy(ctx); return ret; @@ -239,9 +234,8 @@ void i915_gem_context_init(struct drm_device *dev) return; } - /* If called from reset, or thaw... we've been here already */ - if (dev_priv->hw_contexts_disabled || - dev_priv->ring[RCS].default_context) + if (WARN_ON(dev_priv->hw_contexts_disabled || + dev_priv->ring[RCS].default_context)) return; dev_priv->hw_context_size = round_up(get_context_size(dev), 4096); @@ -279,6 +273,12 @@ void i915_gem_context_fini(struct drm_device *dev, bool reset) dev_priv->ring[RCS].default_context = NULL; } +int i915_gem_context_enable(struct drm_i915_private *dev_priv) +{ + BUG_ON(!dev_priv->ring[RCS].default_context); + return do_switch(dev_priv->ring[RCS].default_context); +} + static int context_idr_cleanup(int id, void *p, void *data) { struct i915_hw_context *ctx = p; -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 09/12] drm/i915: Split context enabling from init 2013-04-24 6:15 ` [PATCH 09/12] drm/i915: Split context enabling from init Ben Widawsky @ 2013-04-24 10:04 ` Chris Wilson 2013-04-24 16:39 ` Ben Widawsky 0 siblings, 1 reply; 44+ messages in thread From: Chris Wilson @ 2013-04-24 10:04 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Tue, Apr 23, 2013 at 11:15:37PM -0700, Ben Widawsky wrote: > - i915_gem_context_init(dev); > + if (!dev_priv->hw_contexts_disabled && > + i915_gem_context_enable(dev_priv)) { Move the hw_contexts_disabled test into i915_gem_context_enable(), just makes this bit of code cleaner and less incestrous. -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 09/12] drm/i915: Split context enabling from init 2013-04-24 10:04 ` Chris Wilson @ 2013-04-24 16:39 ` Ben Widawsky 0 siblings, 0 replies; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 16:39 UTC (permalink / raw) To: Chris Wilson, Intel GFX On Wed, Apr 24, 2013 at 11:04:40AM +0100, Chris Wilson wrote: > On Tue, Apr 23, 2013 at 11:15:37PM -0700, Ben Widawsky wrote: > > - i915_gem_context_init(dev); > > + if (!dev_priv->hw_contexts_disabled && > > + i915_gem_context_enable(dev_priv)) { > > Move the hw_contexts_disabled test into i915_gem_context_enable(), just > makes this bit of code cleaner and less incestrous. > -Chris Got it locally, thanks for the idea. I'll wait a bit to resend in case more review comes in. > > -- > Chris Wilson, Intel Open Source Technology Centre -- Ben Widawsky, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 10/12] drm/i915: destroy i915_gem_init_global_gtt 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky ` (8 preceding siblings ...) 2013-04-24 6:15 ` [PATCH 09/12] drm/i915: Split context enabling from init Ben Widawsky @ 2013-04-24 6:15 ` Ben Widawsky 2013-04-24 6:15 ` [PATCH 11/12] drm/i915: Embed PPGTT into the context Ben Widawsky ` (3 subsequent siblings) 13 siblings, 0 replies; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky The resulting code isn't a huge improvement, but that will change in the next patch (at least a bit). Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_drv.h | 4 ++-- drivers/gpu/drm/i915/i915_gem.c | 29 ++++++++++++++++++++++--- drivers/gpu/drm/i915/i915_gem_gtt.c | 42 +------------------------------------ 3 files changed, 29 insertions(+), 46 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 11cc159..9f984cf 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1709,20 +1709,20 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, struct drm_file *file); /* i915_gem_gtt.c */ +bool intel_enable_ppgtt(struct drm_device *dev); +int i915_gem_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt); void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev); void i915_ppgtt_bind_object(struct i915_hw_ppgtt *ppgtt, struct drm_i915_gem_object *obj, enum i915_cache_level cache_level); void i915_ppgtt_unbind_object(struct i915_hw_ppgtt *ppgtt, struct drm_i915_gem_object *obj); - void i915_gem_restore_gtt_mappings(struct drm_device *dev); int __must_check i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj); void i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj, enum i915_cache_level cache_level); void i915_gem_gtt_unbind_object(struct drm_i915_gem_object *obj); void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj); -void i915_gem_init_global_gtt(struct drm_device *dev); void i915_gem_setup_global_gtt(struct drm_device *dev, unsigned long start, unsigned long mappable_end, unsigned long end, bool guard_page); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index fbd289d..24f3aef 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4057,7 +4057,7 @@ i915_gem_init_hw(struct drm_device *dev) int i915_gem_init(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; - int ret; + int ret = -ENOMEM; mutex_lock(&dev->struct_mutex); @@ -4068,9 +4068,32 @@ int i915_gem_init(struct drm_device *dev) DRM_DEBUG_DRIVER("allow wake ack timed out\n"); } - i915_gem_init_global_gtt(dev); + if (intel_enable_ppgtt(dev) && HAS_ALIASING_PPGTT(dev)) { + struct i915_hw_ppgtt *ppgtt; + ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL); + if (!ppgtt) + goto ggtt_only; + + + i915_gem_setup_global_gtt(dev, 0, dev_priv->gtt.mappable_end, + dev_priv->gtt.total, false); + i915_gem_context_init(dev); + ret = i915_gem_ppgtt_init(dev, ppgtt); + if (ret) { + kfree(ppgtt); + drm_mm_takedown(&dev_priv->mm.gtt_space); + goto ggtt_only; + } + + dev_priv->mm.aliasing_ppgtt = ppgtt; + } - i915_gem_context_init(dev); +ggtt_only: + if (ret) { + DRM_ERROR("Aliased PPGTT setup failed %d\n", ret); + i915_gem_setup_global_gtt(dev, 0, dev_priv->gtt.mappable_end, + dev_priv->gtt.total, true); + } ret = i915_gem_init_hw(dev); mutex_unlock(&dev->struct_mutex); diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 39ac37d..0598c6d 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -589,7 +589,7 @@ static void i915_gtt_color_adjust(struct drm_mm_node *node, } } -static bool intel_enable_ppgtt(struct drm_device *dev) +bool intel_enable_ppgtt(struct drm_device *dev) { if (i915_enable_ppgtt >= 0) return i915_enable_ppgtt; @@ -664,46 +664,6 @@ void i915_gem_setup_global_gtt(struct drm_device *dev, dev_priv->gtt.gtt_clear_range(dev, end / PAGE_SIZE - 1, 1); } -void i915_gem_init_global_gtt(struct drm_device *dev) -{ - struct drm_i915_private *dev_priv = dev->dev_private; - unsigned long gtt_size, mappable_size; - int ret = 0; - - gtt_size = dev_priv->gtt.total; - mappable_size = dev_priv->gtt.mappable_end; - - if (intel_enable_ppgtt(dev) && HAS_ALIASING_PPGTT(dev)) { - struct i915_hw_ppgtt *ppgtt; - - i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size, false); - - ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL); - if (!ppgtt) { - ret = -ENOMEM; - goto ggtt_only; - } - - ret = i915_gem_ppgtt_init(dev, ppgtt); - if (ret) - goto ggtt_only; - dev_priv->mm.aliasing_ppgtt = ppgtt; - - return; - } - -/* XXX: We need to takedown the drm_mm and have this fall back because of the - * conditional use of the guard page. - * TODO: It's a bit hackish and could use cleanup. - */ -ggtt_only: - if (ret) { - DRM_ERROR("Aliased PPGTT setup failed %d\n", ret); - drm_mm_takedown(&dev_priv->mm.gtt_space); - } - i915_gem_setup_global_gtt(dev, 0, mappable_size, gtt_size, true); -} - static int setup_scratch_page(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 11/12] drm/i915: Embed PPGTT into the context 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky ` (9 preceding siblings ...) 2013-04-24 6:15 ` [PATCH 10/12] drm/i915: destroy i915_gem_init_global_gtt Ben Widawsky @ 2013-04-24 6:15 ` Ben Widawsky 2013-04-24 6:15 ` [PATCH 12/12] drm/i915: No contexts without ppgtt Ben Widawsky ` (2 subsequent siblings) 13 siblings, 0 replies; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky The aliasing ppgtt is just the ppgtt for the default context. The obvious downside is until we actually do ppgtt switches, this wastes a bit of memory. ie. by the end of the series, it's a don't care. The other downside is PPGTT can't work without contexts, which *should* have already been the case except for debugging scenarios. (Note the does break the potential to easily use contexts an GEN5) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_drv.h | 2 ++ drivers/gpu/drm/i915/i915_gem.c | 14 ++++++++------ drivers/gpu/drm/i915/i915_gem_gtt.c | 1 - 3 files changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 9f984cf..c123df2 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -461,6 +461,8 @@ struct i915_hw_context { struct drm_i915_file_private *file_priv; struct intel_ring_buffer *ring; struct drm_i915_gem_object *obj; + + struct i915_hw_ppgtt ppgtt; }; enum no_fbc_reason { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 24f3aef..cb9fcfb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4070,17 +4070,19 @@ int i915_gem_init(struct drm_device *dev) if (intel_enable_ppgtt(dev) && HAS_ALIASING_PPGTT(dev)) { struct i915_hw_ppgtt *ppgtt; - ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL); - if (!ppgtt) - goto ggtt_only; - i915_gem_setup_global_gtt(dev, 0, dev_priv->gtt.mappable_end, dev_priv->gtt.total, false); i915_gem_context_init(dev); + if (dev_priv->hw_contexts_disabled) { + drm_mm_takedown(&dev_priv->mm.gtt_space); + goto ggtt_only; + } + + ppgtt = &dev_priv->ring[RCS].default_context->ppgtt; + ret = i915_gem_ppgtt_init(dev, ppgtt); - if (ret) { - kfree(ppgtt); + if (ret) { drm_mm_takedown(&dev_priv->mm.gtt_space); goto ggtt_only; } diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 0598c6d..c260e92 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -232,7 +232,6 @@ static void gen6_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt) for (i = 0; i < ppgtt->num_pd_entries; i++) __free_page(ppgtt->pt_pages[i]); kfree(ppgtt->pt_pages); - kfree(ppgtt); } static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 12/12] drm/i915: No contexts without ppgtt 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky ` (10 preceding siblings ...) 2013-04-24 6:15 ` [PATCH 11/12] drm/i915: Embed PPGTT into the context Ben Widawsky @ 2013-04-24 6:15 ` Ben Widawsky 2013-04-24 10:06 ` Chris Wilson 2013-04-24 9:53 ` [PATCH 00/12] [RFC] PPGTT prep patches part 1 Chris Wilson 2013-04-24 19:58 ` Chris Wilson 13 siblings, 1 reply; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 6:15 UTC (permalink / raw) To: Intel GFX; +Cc: Ben Widawsky Going forward, every context will have its own address space. So rip off the band aid now. If contexts fail, don't do ppgtt, and vice versa. Similarly to the last patch, this is somewhat wasteful of PPGTT address space with contexts, since we're not actually utilizing the new PPGTT. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_gem.c | 16 ++++------------ drivers/gpu/drm/i915/i915_gem_context.c | 6 ++++++ 2 files changed, 10 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index cb9fcfb..53192e7 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4057,7 +4057,7 @@ i915_gem_init_hw(struct drm_device *dev) int i915_gem_init(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; - int ret = -ENOMEM; + int ret = -ENODEV; mutex_lock(&dev->struct_mutex); @@ -4069,8 +4069,6 @@ int i915_gem_init(struct drm_device *dev) } if (intel_enable_ppgtt(dev) && HAS_ALIASING_PPGTT(dev)) { - struct i915_hw_ppgtt *ppgtt; - i915_gem_setup_global_gtt(dev, 0, dev_priv->gtt.mappable_end, dev_priv->gtt.total, false); i915_gem_context_init(dev); @@ -4079,15 +4077,9 @@ int i915_gem_init(struct drm_device *dev) goto ggtt_only; } - ppgtt = &dev_priv->ring[RCS].default_context->ppgtt; - - ret = i915_gem_ppgtt_init(dev, ppgtt); - if (ret) { - drm_mm_takedown(&dev_priv->mm.gtt_space); - goto ggtt_only; - } - - dev_priv->mm.aliasing_ppgtt = ppgtt; + dev_priv->mm.aliasing_ppgtt = + &dev_priv->ring[RCS].default_context->ppgtt; + ret = 0; } ggtt_only: diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index b2deddd..64f8009 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -129,6 +129,8 @@ static void do_destroy(struct i915_hw_context *ctx) if (ctx->file_priv) idr_remove(&ctx->file_priv->context_idr, ctx->id); + ctx->ppgtt.cleanup(&ctx->ppgtt); + drm_gem_object_unreference(&ctx->obj->base); kfree(ctx); } @@ -159,6 +161,10 @@ create_hw_context(struct drm_device *dev, goto err_out; } + ret = i915_gem_ppgtt_init(dev, &ctx->ppgtt); + if (ret) + goto err_out; + /* The ring associated with the context object is handled by the normal * object tracking code. We give an initial ring value simple to pass an * assertion in the context switch code. -- 1.8.2.1 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 12/12] drm/i915: No contexts without ppgtt 2013-04-24 6:15 ` [PATCH 12/12] drm/i915: No contexts without ppgtt Ben Widawsky @ 2013-04-24 10:06 ` Chris Wilson 2013-04-24 16:39 ` Ben Widawsky 0 siblings, 1 reply; 44+ messages in thread From: Chris Wilson @ 2013-04-24 10:06 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Tue, Apr 23, 2013 at 11:15:40PM -0700, Ben Widawsky wrote: > Going forward, every context will have its own address space. So rip off > the band aid now. If contexts fail, don't do ppgtt, and vice versa. > > Similarly to the last patch, this is somewhat wasteful of PPGTT address > space with contexts, since we're not actually utilizing the new PPGTT. What strategy do we have for isolating bugs between ctx and ppggt? I'd like to be able disable each of them independently and together. -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 12/12] drm/i915: No contexts without ppgtt 2013-04-24 10:06 ` Chris Wilson @ 2013-04-24 16:39 ` Ben Widawsky 0 siblings, 0 replies; 44+ messages in thread From: Ben Widawsky @ 2013-04-24 16:39 UTC (permalink / raw) To: Chris Wilson, Intel GFX On Wed, Apr 24, 2013 at 11:06:12AM +0100, Chris Wilson wrote: > On Tue, Apr 23, 2013 at 11:15:40PM -0700, Ben Widawsky wrote: > > Going forward, every context will have its own address space. So rip off > > the band aid now. If contexts fail, don't do ppgtt, and vice versa. > > > > Similarly to the last patch, this is somewhat wasteful of PPGTT address > > space with contexts, since we're not actually utilizing the new PPGTT. > > What strategy do we have for isolating bugs between ctx and ppggt? I'd > like to be able disable each of them independently and together. > -Chris None. Up through the end of this series we have the ability to turn off: PPGTT or Contexts & PPGTT. Given our history, I think that probably concerns you more than PPGTT. I don't know how to make you feel better.AFAIK, Mesa plans to become dependent on contexts anyway, so having a way of backing out of that, even for debug, seems to be of limited use to me. > > -- > Chris Wilson, Intel Open Source Technology Centre -- Ben Widawsky, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 00/12] [RFC] PPGTT prep patches part 1 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky ` (11 preceding siblings ...) 2013-04-24 6:15 ` [PATCH 12/12] drm/i915: No contexts without ppgtt Ben Widawsky @ 2013-04-24 9:53 ` Chris Wilson 2013-04-24 19:58 ` Chris Wilson 13 siblings, 0 replies; 44+ messages in thread From: Chris Wilson @ 2013-04-24 9:53 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Tue, Apr 23, 2013 at 11:15:28PM -0700, Ben Widawsky wrote: > x11perf -aa10text > before: 1925000/s > after: 1941667/s (clearly not a good test) What are you running that is that slow? Implementing ppgtt on 830g? -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 00/12] [RFC] PPGTT prep patches part 1 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky ` (12 preceding siblings ...) 2013-04-24 9:53 ` [PATCH 00/12] [RFC] PPGTT prep patches part 1 Chris Wilson @ 2013-04-24 19:58 ` Chris Wilson 13 siblings, 0 replies; 44+ messages in thread From: Chris Wilson @ 2013-04-24 19:58 UTC (permalink / raw) To: Ben Widawsky; +Cc: Intel GFX On Tue, Apr 23, 2013 at 11:15:28PM -0700, Ben Widawsky wrote: > I haven't run performance numbers on this series specifically, but I did > run performance numbers on the slightly more invasive patches which come > after (through step 1 above) > > x11perf -aa10text > before: 1925000/s > after: 1941667/s (clearly not a good test) So looking at cairo on a SNB i5-2500 (desktop gt1) with only this prep series also suggests no discernible impact. The remaining test will be to exercise a compositor and multiple contexts. -Chris -- Chris Wilson, Intel Open Source Technology Centre ^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2013-05-08 17:55 UTC | newest] Thread overview: 44+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-04-24 6:15 [PATCH 00/12] [RFC] PPGTT prep patches part 1 Ben Widawsky 2013-04-24 6:15 ` [PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup Ben Widawsky 2013-05-02 20:27 ` Jesse Barnes 2013-05-06 9:40 ` Daniel Vetter 2013-05-06 9:44 ` Daniel Vetter 2013-05-06 17:59 ` Ben Widawsky 2013-05-06 18:35 ` Daniel Vetter 2013-04-24 6:15 ` [PATCH 02/12] drm/i915: BUG_ON bad PPGTT offset Ben Widawsky 2013-05-02 20:28 ` Jesse Barnes 2013-05-06 9:48 ` Daniel Vetter 2013-05-06 18:03 ` Ben Widawsky 2013-05-06 18:37 ` Daniel Vetter 2013-05-08 16:48 ` Ben Widawsky 2013-05-08 17:55 ` Daniel Vetter 2013-04-24 6:15 ` [PATCH 03/12] drm/i915: make PDE|PTE platform specific Ben Widawsky 2013-05-02 21:26 ` Jesse Barnes 2013-05-02 22:49 ` Ben Widawsky 2013-05-02 22:55 ` Jesse Barnes 2013-05-06 9:47 ` Daniel Vetter 2013-05-08 16:49 ` Ben Widawsky 2013-05-08 17:52 ` Daniel Vetter 2013-04-24 6:15 ` [PATCH 04/12] drm/i915: Extract PDE writes Ben Widawsky 2013-05-02 21:27 ` Jesse Barnes 2013-05-06 9:50 ` Daniel Vetter 2013-04-24 6:15 ` [PATCH 05/12] drm: Optionally create mm blocks from top-to-bottom Ben Widawsky 2013-04-24 6:15 ` [PATCH 06/12] drm/i915: Use drm_mm for PPGTT PDEs Ben Widawsky 2013-05-02 21:42 ` Jesse Barnes 2013-04-24 6:15 ` [PATCH 07/12] drm/i915: Use PDEs as the guard page Ben Widawsky 2013-04-24 6:15 ` [PATCH 08/12] drm/i915: Update context_fini Ben Widawsky 2013-04-24 15:11 ` Mika Kuoppala 2013-04-25 4:11 ` Ben Widawsky 2013-04-25 5:17 ` Ben Widawsky 2013-04-25 15:01 ` Mika Kuoppala 2013-04-25 17:22 ` Ben Widawsky 2013-04-24 6:15 ` [PATCH 09/12] drm/i915: Split context enabling from init Ben Widawsky 2013-04-24 10:04 ` Chris Wilson 2013-04-24 16:39 ` Ben Widawsky 2013-04-24 6:15 ` [PATCH 10/12] drm/i915: destroy i915_gem_init_global_gtt Ben Widawsky 2013-04-24 6:15 ` [PATCH 11/12] drm/i915: Embed PPGTT into the context Ben Widawsky 2013-04-24 6:15 ` [PATCH 12/12] drm/i915: No contexts without ppgtt Ben Widawsky 2013-04-24 10:06 ` Chris Wilson 2013-04-24 16:39 ` Ben Widawsky 2013-04-24 9:53 ` [PATCH 00/12] [RFC] PPGTT prep patches part 1 Chris Wilson 2013-04-24 19:58 ` Chris Wilson
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.