linux-mm.kvack.org archive mirror
* Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Enable lockless lookup of request tracking via RCU
       [not found] ` <1450877756-2902-1-git-send-email-chris@chris-wilson.co.uk>
@ 2016-01-05 14:59   ` Daniel Vetter
  2016-01-05 15:02     ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Vetter @ 2016-01-05 14:59 UTC (permalink / raw)
  To: Chris Wilson
  Cc: intel-gfx, Linux MM, Peter Zijlstra, Jens Axboe, Paul E. McKenney,
	Christoph Lameter, Hugh Dickins, Pekka Enberg

On Wed, Dec 23, 2015 at 01:35:54PM +0000, Chris Wilson wrote:
> If we enable RCU for the requests (providing a grace period where we can
> inspect a "dead" request before it is freed), we can allow callers to
> carefully perform lockless lookup of an active request.
> 
> However, by enabling deferred freeing of requests, we can potentially
> hog a lot of memory when dealing with tens of thousands of requests per
> second - with a quick insertion of a synchronize_rcu() inside our
> shrinker callback, that issue disappears.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c          |  3 ++-
>  drivers/gpu/drm/i915/i915_gem_request.c  |  2 +-
>  drivers/gpu/drm/i915/i915_gem_request.h  | 24 +++++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_gem_shrinker.c |  1 +
>  4 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c169574758d5..696ada3891ed 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4222,7 +4222,8 @@ i915_gem_load(struct drm_device *dev)
>  	dev_priv->requests =
>  		kmem_cache_create("i915_gem_request",
>  				  sizeof(struct drm_i915_gem_request), 0,
> -				  SLAB_HWCACHE_ALIGN,
> +				  SLAB_HWCACHE_ALIGN |
> +				  SLAB_DESTROY_BY_RCU,
>  				  NULL);
>  
>  	INIT_LIST_HEAD(&dev_priv->context_list);

[snip i915 private changes, leave just slab/shrinker changes]
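
A lockless lookup against a SLAB_DESTROY_BY_RCU cache has to revalidate the
object after taking a reference: the grace period only keeps the *memory*
valid for RCU readers, while the slab is free to hand it out again as a new
request at any time. A minimal sketch of that pattern (field and helper
names are placeholders here, not the snipped i915 code):

	struct drm_i915_gem_request *req;

	rcu_read_lock();
	req = rcu_dereference(engine->last_request);	/* hypothetical source */
	if (req && !kref_get_unless_zero(&req->ref))
		req = NULL;		/* the free had already started */
	if (req && rcu_dereference(engine->last_request) != req) {
		/* The slab recycled the memory for another request between
		 * the lookup and the kref; drop it and treat it as a miss. */
		request_put(req);	/* hypothetical unref helper */
		req = NULL;
	}
	rcu_read_unlock();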

> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index c561ed2b8287..03a8bbb3e31e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -142,6 +142,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
>  	}
>  
>  	i915_gem_retire_requests(dev_priv->dev);
> +	synchronize_rcu(); /* expedite the grace period to free the requests */

Shouldn't the slab subsystem do this for us if we request it delays the
actual kfree? Seems like a core bug to me ... Adding more folks.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Enable lockless lookup of request tracking via RCU
  2016-01-05 14:59   ` [Intel-gfx] [PATCH v2 1/3] drm/i915: Enable lockless lookup of request tracking via RCU Daniel Vetter
@ 2016-01-05 15:02     ` Peter Zijlstra
  2016-01-05 15:06       ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2016-01-05 15:02 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Linux MM, Jens Axboe, Paul E. McKenney,
	Christoph Lameter, Hugh Dickins, Pekka Enberg

On Tue, Jan 05, 2016 at 03:59:51PM +0100, Daniel Vetter wrote:
> On Wed, Dec 23, 2015 at 01:35:54PM +0000, Chris Wilson wrote:
> > If we enable RCU for the requests (providing a grace period where we can
> > inspect a "dead" request before it is freed), we can allow callers to
> > carefully perform lockless lookup of an active request.
> > 
> > However, by enabling deferred freeing of requests, we can potentially
> > hog a lot of memory when dealing with tens of thousands of requests per
> > second - with a quick insertion of a synchronize_rcu() inside our
> > shrinker callback, that issue disappears.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c          |  3 ++-
> >  drivers/gpu/drm/i915/i915_gem_request.c  |  2 +-
> >  drivers/gpu/drm/i915/i915_gem_request.h  | 24 +++++++++++++++++++++++-
> >  drivers/gpu/drm/i915/i915_gem_shrinker.c |  1 +
> >  4 files changed, 27 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index c169574758d5..696ada3891ed 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4222,7 +4222,8 @@ i915_gem_load(struct drm_device *dev)
> >  	dev_priv->requests =
> >  		kmem_cache_create("i915_gem_request",
> >  				  sizeof(struct drm_i915_gem_request), 0,
> > -				  SLAB_HWCACHE_ALIGN,
> > +				  SLAB_HWCACHE_ALIGN |
> > +				  SLAB_DESTROY_BY_RCU,
> >  				  NULL);
> >  
> >  	INIT_LIST_HEAD(&dev_priv->context_list);
> 
> [snip i915 private changes, leave just slab/shrinker changes]
> 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > index c561ed2b8287..03a8bbb3e31e 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > @@ -142,6 +142,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
> >  	}
> >  
> >  	i915_gem_retire_requests(dev_priv->dev);
> > +	synchronize_rcu(); /* expedite the grace period to free the requests */
> 
> Shouldn't the slab subsystem do this for us if we request it delays the
> actual kfree? Seems like a core bug to me ... Adding more folks.

note that sync_rcu() can take a terribly long time.. but yes, I seem to
remember Paul talking about adding this to reclaim paths for just this
reason. Not sure that ever happened though.


* Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Enable lockless lookup of request tracking via RCU
  2016-01-05 15:02     ` Peter Zijlstra
@ 2016-01-05 15:06       ` Peter Zijlstra
  2016-01-05 16:35         ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2016-01-05 15:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Linux MM, Jens Axboe, Paul E. McKenney,
	Christoph Lameter, Hugh Dickins, Pekka Enberg

On Tue, Jan 05, 2016 at 04:02:13PM +0100, Peter Zijlstra wrote:
> > Shouldn't the slab subsystem do this for us if we request it delays the
> > actual kfree? Seems like a core bug to me ... Adding more folks.
> 
> note that sync_rcu() can take a terribly long time.. but yes, I seem to
> remember Paul talking about adding this to reclaim paths for just this
> reason. Not sure that ever happened though.

Also, you might be wanting rcu_barrier() instead, that not only waits
for a GP to complete, but also for all pending callbacks to be
processed.

Without the latter there might still not be anything to free after it.
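
The difference in a sketch, assuming the deferred frees are queued with
call_rcu() somewhere underneath:

	synchronize_rcu();	/* a grace period has elapsed: pre-existing readers
				 * are done, but queued callbacks may not have run */
	rcu_barrier();		/* also waits until every call_rcu() callback queued
				 * before this point has been invoked, i.e. the
				 * memory has actually been handed back */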


* Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Enable lockless lookup of request tracking via RCU
  2016-01-05 15:06       ` Peter Zijlstra
@ 2016-01-05 16:35         ` Paul E. McKenney
  2016-01-06  8:06           ` Daniel Vetter
  0 siblings, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2016-01-05 16:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Chris Wilson, intel-gfx, Linux MM, Jens Axboe, Christoph Lameter,
	Hugh Dickins, Pekka Enberg

On Tue, Jan 05, 2016 at 04:06:48PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 05, 2016 at 04:02:13PM +0100, Peter Zijlstra wrote:
> > > Shouldn't the slab subsystem do this for us if we request it delays the
> > > actual kfree? Seems like a core bug to me ... Adding more folks.
> > 
> > note that sync_rcu() can take a terribly long time.. but yes, I seem to
> > remember Paul talking about adding this to reclaim paths for just this
> > reason. Not sure that ever happened though.

There is an RCU OOM notifier, but it just ensures that existing callbacks
get processed in a timely fashion.  It does not block, as that would
prevent other OOM notifiers from getting their memory freed quickly.

> Also, you might be wanting rcu_barrier() instead, that not only waits
> for a GP to complete, but also for all pending callbacks to be
> processed.

And in fact what the RCU OOM notifier does can be thought of as an
asynchronous open-coded rcu_barrier().  If you are interested, please
see rcu_register_oom_notifier() and friends.

> Without the latter there might still not be anything to free after it.

Another approach is synchronize_rcu() after some largish number of
requests.  The advantage of this approach is that it throttles the
production of callbacks at the source.  The corresponding disadvantage
is that it slows things up.
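
A sketch of that approach (the counter field and the threshold are made up
for illustration):

	/* Every N-th request, wait out a grace period so the RCU-deferred
	 * frees cannot pile up without bound. */
	if (++dev_priv->mm.requests_since_sync >= 10000) {
		dev_priv->mm.requests_since_sync = 0;
		synchronize_rcu();
	}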

Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it.  Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair.  The
idea is to do something like this:

	cond_synchronize_rcu(cookie);
	cookie = get_state_synchronize_rcu();

You would of course do an initial get_state_synchronize_rcu() to
get things going.  This would not block unless there was less than
one grace period's worth of time between invocations.  But this
assumes a busy system, where there is almost always a grace period
in flight.  But you can make that happen as follows:

	cond_synchronize_rcu(cookie);
	cookie = get_state_synchronize_rcu();
	call_rcu(&my_rcu_head, noop_function);

Note that you need additional code to make sure that the old callback
has completed before doing a new one.  Setting and clearing a flag
with appropriate memory ordering control suffices (e.g., smp_load_acquire()
and smp_store_release()).
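
Putting those last two pieces together, one possible shape (all names here
are invented, and callers are assumed to be serialized by the driver's own
locking):

	static unsigned long gp_cookie;	/* seed with get_state_synchronize_rcu() */
	static struct rcu_head gp_head;
	static int gp_head_busy;	/* guards reuse of gp_head */

	static void gp_noop_cb(struct rcu_head *head)
	{
		/* Pairs with the smp_load_acquire() below: gp_head is reusable. */
		smp_store_release(&gp_head_busy, 0);
	}

	static void throttle_one_grace_period(void)
	{
		cond_synchronize_rcu(gp_cookie);	/* blocks only if no GP elapsed */
		gp_cookie = get_state_synchronize_rcu();

		if (!smp_load_acquire(&gp_head_busy)) {
			gp_head_busy = 1;
			call_rcu(&gp_head, gp_noop_cb);	/* keep a GP in flight */
		}
	}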

And there are probably other approaches as well...

							Thanx, Paul


* Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Enable lockless lookup of request tracking via RCU
  2016-01-05 16:35         ` Paul E. McKenney
@ 2016-01-06  8:06           ` Daniel Vetter
  2016-01-06  8:38             ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Vetter @ 2016-01-06  8:06 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Chris Wilson, intel-gfx, Linux MM, Jens Axboe,
	Christoph Lameter, Hugh Dickins, Pekka Enberg

On Tue, Jan 05, 2016 at 08:35:37AM -0800, Paul E. McKenney wrote:
> On Tue, Jan 05, 2016 at 04:06:48PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 05, 2016 at 04:02:13PM +0100, Peter Zijlstra wrote:
> > > > Shouldn't the slab subsystem do this for us if we request it delays the
> > > > actual kfree? Seems like a core bug to me ... Adding more folks.
> > > 
> > > note that sync_rcu() can take a terribly long time.. but yes, I seem to
> > > remember Paul talking about adding this to reclaim paths for just this
> > > reason. Not sure that ever happened though.
> 
> There is an RCU OOM notifier, but it just ensures that existing callbacks
> get processed in a timely fashion.  It does not block, as that would
> prevent other OOM notifiers from getting their memory freed quickly.
> 
> > Also, you might be wanting rcu_barrier() instead, that not only waits
> > for a GP to complete, but also for all pending callbacks to be
> > processed.
> 
> And in fact what the RCU OOM notifier does can be thought of as an
> asynchronous open-coded rcu_barrier().  If you are interested, please
> see rcu_register_oom_notifier() and friends.
> 
> > Without the latter there might still not be anything to free after it.
> 
> Another approach is synchronize_rcu() after some largish number of
> requests.  The advantage of this approach is that it throttles the
> production of callbacks at the source.  The corresponding disadvantage
> is that it slows things up.
> 
> Another approach is to use call_rcu(), but if the previous call_rcu()
> is still in flight, block waiting for it.  Yet another approach is
> the get_state_synchronize_rcu() / cond_synchronize_rcu() pair.  The
> idea is to do something like this:
> 
> 	cond_synchronize_rcu(cookie);
> 	cookie = get_state_synchronize_rcu();
> 
> You would of course do an initial get_state_synchronize_rcu() to
> get things going.  This would not block unless there was less than
> one grace period's worth of time between invocations.  But this
> assumes a busy system, where there is almost always a grace period
> in flight.  But you can make that happen as follows:
> 
> 	cond_synchronize_rcu(cookie);
> 	cookie = get_state_synchronize_rcu();
> 	call_rcu(&my_rcu_head, noop_function);
> 
> Note that you need additional code to make sure that the old callback
> has completed before doing a new one.  Setting and clearing a flag
> with appropriate memory ordering control suffices (e.g., smp_load_acquire()
> and smp_store_release()).

This pretty much went over my head ;-) What I naively hoped for is that
kfree() on an rcu-freeing slab could be taught to magically stall a bit
(or at least expedite the delayed freeing) if we're piling up too many
freed objects. Doing that only in OOM is probably too late since OOM
handling is a bit unreliable/unpredictable. And I thought we're not the
first ones running into this problem.

Do all the other users of rcu-freed slabs just open-code their own custom
approach? If that's the recommendation we can certainly follow that, too.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Enable lockless lookup of request tracking via RCU
  2016-01-06  8:06           ` Daniel Vetter
@ 2016-01-06  8:38             ` Peter Zijlstra
  2016-01-06 15:56               ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2016-01-06  8:38 UTC (permalink / raw)
  To: Paul E. McKenney, Chris Wilson, intel-gfx, Linux MM, Jens Axboe,
	Christoph Lameter, Hugh Dickins, Pekka Enberg

On Wed, Jan 06, 2016 at 09:06:58AM +0100, Daniel Vetter wrote:
> This pretty much went over my head ;-) What I naively hoped for is that
> kfree() on an rcu-freeing slab could be taught to magically stall a bit
> (or at least expedite the delayed freeing) if we're piling up too many
> freed objects.

Well, RCU does try harder when the callback list is getting 'big' (10k
IIRC).

> Doing that only in OOM is probably too late since OOM
> handling is a bit unreliable/unpredictable. And I thought we're not the
> first ones running into this problem.

The whole memory pressure thing was unreliable/unpredictable last time I
looked at it, but sure, I suppose we could try and poke RCU sooner. But
then you get into the problem of when: doing it too soon will be
detrimental to performance, and doing it too late is, well, too late.

> Do all the other users of rcu-freed slabs just open-code their own custom
> approach? If that's the recommendation we can certainly follow that, too.

The ones I know of seem to simply ignore this problem..


* Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Enable lockless lookup of request tracking via RCU
  2016-01-06  8:38             ` Peter Zijlstra
@ 2016-01-06 15:56               ` Paul E. McKenney
  0 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2016-01-06 15:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Chris Wilson, intel-gfx, Linux MM, Jens Axboe, Christoph Lameter,
	Hugh Dickins, Pekka Enberg

On Wed, Jan 06, 2016 at 09:38:30AM +0100, Peter Zijlstra wrote:
> On Wed, Jan 06, 2016 at 09:06:58AM +0100, Daniel Vetter wrote:
> > This pretty much went over my head ;-) What I naively hoped for is that
> > kfree() on an rcu-freeing slab could be taught to magically stall a bit
> > (or at least expedite the delayed freeing) if we're piling up too many
> > freed objects.
> 
> Well, RCU does try harder when the callback list is getting 'big' (10k
> IIRC).

You got it, 10k by default, can be adjusted with the rcutree.qhimark
kernel-boot/sysfs parameter.  When a given CPU's callback list exceeds
this limit, it more aggressively starts a grace period, and if a grace
period is already in progress, it does more aggressive quiescent-state
forcing.  It does nothing to push back on processes generating callbacks,
other than by soaking up extra CPU cycles.

So, Daniel, if you haven't tried hammering the system hard, give it a
shot and see if qhimark is helping enough.  And perhaps adjust its value
if need be.  (Though please let me know if this is necessary -- if it is,
we should try to automate its setting.)
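
For example, booting with something like

	rcutree.qhimark=20000

(the value is arbitrary here) would raise the per-CPU threshold above the
10k default mentioned earlier.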

> > Doing that only in OOM is probably too late since OOM
> > handling is a bit unreliable/unpredictable. And I thought we're not the
> > first ones running into this problem.
> 
> The whole memory pressure thing was unreliable/unpredictable last time I
> looked at it, but sure, I suppose we could try and poke RCU sooner. But
> then you get into the problem of when: doing it too soon will be
> detrimental to performance, and doing it too late is, well, too late.
> 
> > Do all the other users of rcu-freed slabs just open-code their own custom
> > approach? If that's the recommendation we can certainly follow that, too.
> 
> The ones I know of seem to simply ignore this problem..

I believe that there are a few that do the occasional synchronize_rcu()
to throttle themselves, but have not checked recently.

							Thanx, Paul


end of thread

Thread overview: 7+ messages
     [not found] <1450869563-23892-1-git-send-email-chris@chris-wilson.co.uk>
     [not found] ` <1450877756-2902-1-git-send-email-chris@chris-wilson.co.uk>
2016-01-05 14:59   ` [Intel-gfx] [PATCH v2 1/3] drm/i915: Enable lockless lookup of request tracking via RCU Daniel Vetter
2016-01-05 15:02     ` Peter Zijlstra
2016-01-05 15:06       ` Peter Zijlstra
2016-01-05 16:35         ` Paul E. McKenney
2016-01-06  8:06           ` Daniel Vetter
2016-01-06  8:38             ` Peter Zijlstra
2016-01-06 15:56               ` Paul E. McKenney
