Re: [PATCH 2/2] drm/i915: Use atomic waits for short non-atomic ones

From: Imre Deak <imre.deak@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	Intel-gfx@lists.freedesktop.org
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Subject: Re: [PATCH 2/2] drm/i915: Use atomic waits for short non-atomic ones
Date: Tue, 28 Jun 2016 20:45:03 +0300
Message-ID: <1467135903.13066.13.camel@intel.com>
In-Reply-To: <57728BC9.1090701@linux.intel.com>

On Tue, 2016-06-28 at 15:38 +0100, Tvrtko Ursulin wrote:
> On 28/06/16 14:53, Imre Deak wrote:
> > On Tue, 2016-06-28 at 14:29 +0100, Tvrtko Ursulin wrote:
> > > On 28/06/16 13:19, Imre Deak wrote:
> > > > On Tue, 2016-06-28 at 12:51 +0100, Tvrtko Ursulin wrote:
> > > > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > > > 
> > > > > > usleep_range is not recommended for waits shorter than 10us.
> > > > > 
> > > > > Make the wait_for_us use the atomic variant for such waits.
> > > > > 
> > > > > To do so we need to disable the !in_atomic warning for such uses
> > > > > and also disable preemption since the macro is written in a way
> > > > > to only be safe to be used in atomic context (local_clock() and
> > > > > no second COND check after the timeout).
> > > > > 
> > > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Cc: Imre Deak <imre.deak@intel.com>
> > > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > > ---
> > > > >    drivers/gpu/drm/i915/intel_drv.h | 29 +++++++++++++++++++++--------
> > > > >    1 file changed, 21 insertions(+), 8 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > > > > index 3156d8df7921..e21bf6e6f119 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_drv.h
> > > > > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > > > > @@ -69,20 +69,21 @@
> > > > >    })
> > > > > 
> > > > >    #define wait_for(COND, MS)	  	_wait_for((COND), (MS) * 1000, 1000)
> > > > > -#define wait_for_us(COND, US)	  	_wait_for((COND), (US), 1)
> > > > > 
> > > > >    /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
> > > > >    #if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
> > > > > -# define _WAIT_FOR_ATOMIC_CHECK WARN_ON_ONCE(!in_atomic())
> > > > > +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
> > > > >    #else
> > > > > -# define _WAIT_FOR_ATOMIC_CHECK do { } while (0)
> > > > > +# define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
> > > > >    #endif
> > > > > 
> > > > > -#define _wait_for_atomic(COND, US) ({ \
> > > > > +#define _wait_for_atomic(COND, US, ATOMIC) ({ \
> > > > >    	unsigned long end__; \
> > > > >    	int ret__ = 0; \
> > > > > -	_WAIT_FOR_ATOMIC_CHECK; \
> > > > > -	BUILD_BUG_ON((US) > 50000); \
> > > > > +	_WAIT_FOR_ATOMIC_CHECK(ATOMIC); \
> > > > > +	BUILD_BUG_ON((ATOMIC) && (US) > 50000); \
> > > > > +	if (!(ATOMIC)) \
> > > > > +		preempt_disable(); \
> > > > 
> > > > Disabling preemption for this purpose (scheduling a timeout) could be
> > > > frowned upon, although for 10us it may not be an issue. Another
> > > 
> > > Possibly, but I don't see how to otherwise do it.
> > > 
> > > And about the number itself - I chose 10us just because usleep_range is
> > > not recommended for <10us due to setup overhead.
> > > 
> > > > possibility would be to use cpu_clock() instead which would have some
> > > > overhead in case of scheduling away from the initial CPU, but we'd only
> > > > incur it for the non-atomic <10us case, so would be negligible imo.
> > > > You'd also have to re-check the condition with that solution.
> > > 
> > > How would you implement it with cpu_clock? What would you do when
> > > re-scheduled?
> > 
> > By calculating the expiry in the beginning with cpu_clock()
> > using raw_smp_processor_id() and then calling cpu_clock() in
> > time_after() with the same CPU id. cpu_clock() would then internally
> > handle the scheduling away scenario.
> 
> Right, but that is also not ideal since if the two cpu_clocks differ the 
> running time domain is not identical to the timeout one.

Hm, this is what I meant:

int cpu = raw_smp_processor_id();
u64 end = cpu_clock(cpu) + timeout;	/* cpu_clock() returns ns */

/* cpu_clock() is u64, so use the 64-bit time_after variant */
while (!time_after64(cpu_clock(cpu), end))
	check condition...

So cpu_clock() would always be called with the same CPU id and hence it
would return timestamps from the same time domain. In case of
scheduling away at any point, cpu_clock() would internally detect this
and return the timestamp from the original CPU's time domain.

This also has the advantage that in case of a stable, synchronized TSC
- which should be the generic case - cpu_clock() simply translates to
reading the TSC, without disabling pre-emption.
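
For illustration only, here is the same pattern as a userspace sketch,
with the re-check of the condition after the timeout included. This is
not the i915 macro: now_ns(), wait_for_cond_ns() and flag_is_set() are
hypothetical names, and clock_gettime(CLOCK_MONOTONIC) merely stands in
for cpu_clock(cpu) as a single time domain.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for cpu_clock(cpu): one monotonic time domain, in ns. */
uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Busy-wait until cond(data) is true or timeout_ns has elapsed.
 * Returns 0 on success, -ETIMEDOUT on timeout.
 */
int wait_for_cond_ns(bool (*cond)(void *), void *data, uint64_t timeout_ns)
{
	/* Compute the deadline once, in the same time domain we poll later. */
	uint64_t end = now_ns() + timeout_ns;

	while (!cond(data)) {
		if (now_ns() > end) {
			/*
			 * We may have been scheduled away between the last
			 * check and the timestamp, so look once more before
			 * reporting a timeout.
			 */
			return cond(data) ? 0 : -ETIMEDOUT;
		}
	}

	return 0;
}

/* Example predicate: treats data as a pointer to a bool flag. */
bool flag_is_set(void *data)
{
	return *(bool *)data;
}
```

E.g. wait_for_cond_ns(flag_is_set, &flag, 10 * 1000) polls for up to
10us. The extra cond() call only runs on the timeout path, so it costs
nothing in the common case.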

I don't see any problem with Chris' approach either though.

> Probably would not matter but feels hacky.
> 
> > > > Also could you explain how can we ignore hard IRQs as hinted by the
> > > > comment in _wait_for_atomic()?
> > > 
> > > Hm, in retrospect it does not look safe. The upside is that after your
> > > fixes from today it will be, since all remaining callers run with
> > > interrupts disabled.
> > 
> > Well, except for the GuC path, but that's for a 10ms timeout, so
> > probably doesn't matter (or else we have a bigger problem).
> 
> I've just sent a patch for that.
> 
> > > And the downside is that the patch from this thread is then not safe
> > > and would need the condition put back in. Possibly only in the !ATOMIC
> > > case, but that might be too fragile for the future.
> > 
> > I'd say we'd need the extra check at least whenever hard IRQs are not
> > disabled. Even then there could be NMIs or some other background stuff
> > (ME) that could be a problem. OTOH we'd incur the overhead from the
> > extra check only in the exceptional timeout case, so I think doing it
> > in all cases wouldn't be a big problem.
> 
> Yeah I'll put it in.
> 
> Regards,
> 
> Tvrtko
> 
> 
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx