Re: [PATCH] drm/i915: Use rcu instead of stop_machine

From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
	Mika Kuoppala <mika.kuoppala@intel.com>,
	Daniel Vetter <daniel.vetter@intel.com>
Subject: Re: [PATCH] drm/i915: Use rcu instead of stop_machine
Date: Fri, 6 Oct 2017 09:30:57 +0100	[thread overview]
Message-ID: <bbd9b71d-897e-1dcf-cc81-e1a2847ebce8@linux.intel.com> (raw)
In-Reply-To: <20171005162445.hyrjntshdc26jfjj@phenom.ffwll.local>

On 05/10/2017 17:24, Daniel Vetter wrote:
> On Thu, Oct 05, 2017 at 03:55:19PM +0100, Tvrtko Ursulin wrote:
>>
>> On 05/10/2017 15:09, Daniel Vetter wrote:
>>> stop_machine is not really a locking primitive we should use, except
>>> when the hw folks tell us the hw is broken and that's the only way to
>>> work around it.
>>>
>>> This patch here is just a suggestion for how to fix it up, possible
>>> changes needed to make it actually work:
>>>
>>> - Set the nop_submit_request first for _all_ engines, before
>>>     proceeding.
>>>
>>> - Make sure engine->cancel_requests copes with the possibility that
>>>     not all tests have consistently used the new or old version. I dont
>>>     think this is a problem, since the same can happen really with the
>>>     stop_machine() locking - stop_machine also doesn't give you any kind
>>>     of global ordering against other cpu threads, it just makes them
>>>     stop.
>>>
>>> This patch tries to address the locking snafu from
>>>
>>> commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
>>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>>> Date:   Tue Nov 22 14:41:21 2016 +0000
>>>
>>>       drm/i915: Stop the machine as we install the wedged submit_request handler
>>>
>>> Chris said parts of the reasons for going with stop_machine() was that
>>> it's no overhead for the fast-path. But these callbacks use irqsave
>>> spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
>>>
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem.c                   | 18 +++++-------------
>>>    drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
>>>    drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
>>>    3 files changed, 9 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index ab8c6946fea4..0b260e576b4b 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -3022,13 +3022,13 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>>>    static void engine_set_wedged(struct intel_engine_cs *engine)
>>>    {
>>> +	engine->submit_request = nop_submit_request;
>>
>> Should this be rcu_assign_pointer?
> 
> Those provide additional barriers, needed when you change/allocate the
> stuff you're pointing to. We point to immutable functions, so shouldn't be
> necessary (and would be confusing imo).

Ah ok. Any barriers then? Or synchronize_rcu implies them all?

>>> +
>>>    	/* We need to be sure that no thread is running the old callback as
>>>    	 * we install the nop handler (otherwise we would submit a request
>>> -	 * to hardware that will never complete). In order to prevent this
>>> -	 * race, we wait until the machine is idle before making the swap
>>> -	 * (using stop_machine()).
>>> +	 * to hardware that will never complete).
>>>    	 */
>>> -	engine->submit_request = nop_submit_request;
>>> +	synchronize_rcu();
>>
>> Consumers of this are running in irq disabled or softirq. Does this mean we
>> would need synchronize_rcu_bh? Would either guarantee all tasklets and irq
>> handlers have exited?
> 
> Oh ... tbh I didn't even digg that deep (much less ran this stuff). This
> really is an RFC so people with real clue could say whether it has a
> chance of working or not.
> 
> Looking at rcu docs we don't want _bh variants, since rcu_read_lock should
> be safe in even hardirq context. _bh and _sched otoh require that all
> critical sections are either in bottom halfs or hardirq context, since
> they treat scheduling of those as a grace period.

rcu_read_unlock might schedule (via preempt_enable) so I don't think we 
can use them from the fence callbacks.

And _bh is indeed only for softirq while we need hard and soft. So I am 
not sure which one we could use.

It sounds to me any would be wrong and if we wanted to drop stop_machine 
we would simply have to use nothing. But then we couldn't be certain 
there are no more new requests queued after wedged has been set.

Maybe I am missing something, not sure.
  Regards,

Tvrtko

> Cheers, Daniel
> 
>>>    	/* Mark all executing requests as skipped */
>>>    	engine->cancel_requests(engine);
>>> @@ -3041,9 +3041,8 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>>>    				       intel_engine_last_submit(engine));
>>>    }
>>> -static int __i915_gem_set_wedged_BKL(void *data)
>>> +void i915_gem_set_wedged(struct drm_i915_private *i915)
>>>    {
>>> -	struct drm_i915_private *i915 = data;
>>>    	struct intel_engine_cs *engine;
>>>    	enum intel_engine_id id;
>>> @@ -3052,13 +3051,6 @@ static int __i915_gem_set_wedged_BKL(void *data)
>>>    	set_bit(I915_WEDGED, &i915->gpu_error.flags);
>>>    	wake_up_all(&i915->gpu_error.reset_queue);
>>> -
>>> -	return 0;
>>> -}
>>> -
>>> -void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
>>> -{
>>> -	stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
>>>    }
>>>    bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
>>> index b100b38f1dd2..ef78a85cb845 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
>>> @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>>>    	switch (state) {
>>>    	case FENCE_COMPLETE:
>>>    		trace_i915_gem_request_submit(request);
>>> +		rcu_read_lock();
>>>    		request->engine->submit_request(request);
>>> +		rcu_read_unlock();
>>
>> And _bh for these? Although this already runs with preemption off, but I
>> guess it is good for documentation.
>>
>> Regards,
>>
>> Tvrtko
>>
>>>    		break;
>>>    	case FENCE_FREE:
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
>>> index 78b9f811707f..a999161e8db1 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
>>> @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
>>>    	}
>>>    	i915_gem_request_get(vip);
>>>    	i915_add_request(vip);
>>> +	rcu_read_lock();
>>>    	request->engine->submit_request(request);
>>> +	rcu_read_unlock();
>>>    	mutex_unlock(&i915->drm.struct_mutex);
>>>
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx