Re: [RFC 3/4] drm/i915: Interrupt driven fences

From: Daniel Vetter <daniel@ffwll.ch>
To: John Harrison <John.C.Harrison@Intel.com>
Cc: Intel-GFX@Lists.FreeDesktop.Org
Subject: Re: [RFC 3/4] drm/i915: Interrupt driven fences
Date: Thu, 26 Mar 2015 14:22:57 +0100
Message-ID: <20150326132257.GD24805@phenom.ffwll.local>
In-Reply-To: <55100384.4080909@Intel.com>

On Mon, Mar 23, 2015 at 12:13:56PM +0000, John Harrison wrote:
> On 23/03/2015 09:22, Daniel Vetter wrote:
> >On Fri, Mar 20, 2015 at 09:11:35PM +0000, Chris Wilson wrote:
> >>On Fri, Mar 20, 2015 at 05:48:36PM +0000, John.C.Harrison@Intel.com wrote:
> >>>From: John Harrison <John.C.Harrison@Intel.com>
> >>>
> >>>The intended usage model for struct fence is that the signalled status should be
> >>>set on demand rather than polled. That is, there should not be a need for a
> >>>'signaled' function to be called every time the status is queried. Instead,
> >>>'something' should be done to enable a signal callback from the hardware which
> >>>will update the state directly. In the case of requests, this is the seqno
> >>>update interrupt. The idea is that this callback will only be enabled on demand
> >>>when something actually tries to wait on the fence.
> >>>
> >>>This change removes the polling test and replaces it with the callback scheme.
> >>>To avoid race conditions where signals can be sent before anyone is waiting for
> >>>them, it does not implement the callback on demand feature. When the GPU
> >>>scheduler arrives, it will need to know about the completion of every single
> >>>request anyway. So it is far simpler to not put in complex and messy anti-race
> >>>code in the first place given that it will not be needed in the future.
> >>>
> >>>Instead, each fence is added to a 'please poke me' list at the start of
> >>>i915_add_request(). This happens before the commands to generate the seqno
> >>>interrupt are added to the ring thus is guaranteed to be race free. The
> >>>interrupt handler then scans through the 'poke me' list when a new seqno pops
> >>>out and signals any matching fence/request. The fence is then removed from the
> >>>list so the entire request stack does not need to be scanned every time.
> >>No. Please let's not go back to the bad old days of generating an interrupt
> >>per batch, and doing a lot more work inside the interrupt handler.
> >Yeah, enable_signalling should be the place where we grab the interrupt
> >reference. Also that we shouldn't call this unconditionally, that pretty
> >much defeats the point of that fastpath optimization.
> >
> >Another complication is missed interrupts. If we detect those and someone
> >calls enable_signalling then we need to fire up a timer to wake up once
> >per jiffy and save stuck fences. To avoid duplication with the threaded
> >wait code we could remove the fallback wakeups from there and just rely on
> >that timer everywhere.
> >-Daniel
> 
> As has been discussed many times in many forums, the scheduler requires
> notification of each batch buffer's completion. It needs to know so that it
> can submit new work, keep dependencies of outstanding work up to date, etc.
> 
> Android is similar. With the native sync API, Android wants to be signaled
> about the completion of everything. Every single batch buffer submission
> comes with a request for a sync point that will be poked when that buffer
> completes. The kernel has no way of knowing which buffers are actually going
> to be waited on. There is no driver call anymore. User land simply waits on
> a file descriptor.
> 
> I don't see how we can get away without generating an interrupt per batch.

I've explained this a bit offline in a meeting, but here's finally the
mail version for the record. The reason we want to enable interrupts only
when needed is that interrupts don't scale. Looking around, high-throughput
peripherals all try to avoid interrupts like the plague: netdev has
netpoll, block devices just gained the same because of ridiculously fast
SSDs connected to PCIe. And there are lots of people talking about insanely
tightly coupled gpu compute workloads (maybe not yet on intel gpus, but
it'll come).

Now I fully agree that unfortunately the execlist hw design isn't awesome
and there's no way around receiving and processing an interrupt per batch.
But the hw folks are working on fixing these overheads again (or at least
attempting to, using the GuC; I haven't seen the new numbers yet) and old hw
without the scheduler works perfectly fine with interrupts mostly
disabled. So just because we currently have a suboptimal hw design is imo
not a good reason to throw all the on-demand interrupt enabling and
handling overboard. I fully expect that we'll need it again. And I think
it's easier to keep it working than to first kick it out and then rebuild
it again.
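
To make the on-demand scheme concrete, here is a minimal user-space sketch
in C. It is not the i915 code: every toy_* name is hypothetical, the
"hardware" is just a counter, and seqno wraparound, locking and missed
interrupts are all ignored. It only illustrates the idea that the
interrupt reference is taken in enable_signaling and dropped once the
fence has signalled, while everyone else uses the polling fast path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real i915 or struct fence types. */
struct toy_engine {
	unsigned int hw_seqno;     /* last seqno the "hardware" completed */
	unsigned int irq_refcount; /* fences currently holding the irq */
	unsigned int irqs_handled; /* interrupts actually processed */
};

struct toy_fence {
	struct toy_engine *engine;
	unsigned int seqno;
	bool signaled;
	bool irq_held;
};

/* Fast path: compare seqnos, no interrupt needed (wraparound ignored). */
static bool toy_fence_is_signaled(struct toy_fence *f)
{
	if (!f->signaled && f->engine->hw_seqno >= f->seqno)
		f->signaled = true;
	return f->signaled;
}

/* Called only when someone really waits: take the irq reference here. */
static bool toy_fence_enable_signaling(struct toy_fence *f)
{
	if (toy_fence_is_signaled(f))
		return false; /* already complete, nothing to enable */
	if (!f->irq_held) {
		f->irq_held = true;
		f->engine->irq_refcount++;
	}
	return true;
}

/* The "hardware" finishes a batch; an irq fires only if unmasked. */
static void toy_engine_complete(struct toy_engine *e, unsigned int seqno,
				struct toy_fence *fences, size_t n)
{
	size_t i;

	e->hw_seqno = seqno;
	if (e->irq_refcount == 0)
		return; /* nobody asked: the interrupt stays masked */

	e->irqs_handled++;
	for (i = 0; i < n; i++) {
		struct toy_fence *f = &fences[i];

		if (f->irq_held && toy_fence_is_signaled(f)) {
			f->irq_held = false;
			e->irq_refcount--; /* drop the reference */
		}
	}
}

/* Four batches, one waiter: returns the number of irqs processed. */
static unsigned int toy_demo(void)
{
	struct toy_engine e = { 0, 0, 0 };
	struct toy_fence f[4];
	size_t i;

	for (i = 0; i < 4; i++) {
		f[i].engine = &e;
		f[i].seqno = (unsigned int)i + 1;
		f[i].signaled = false;
		f[i].irq_held = false;
	}

	toy_engine_complete(&e, 1, f, 4);  /* no waiter, no irq */
	toy_engine_complete(&e, 2, f, 4);  /* no waiter, no irq */
	toy_fence_enable_signaling(&f[2]); /* someone waits on seqno 3 */
	toy_engine_complete(&e, 3, f, 4);  /* irq fires, fence signalled */
	toy_engine_complete(&e, 4, f, 4);  /* reference dropped, no irq */

	assert(toy_fence_is_signaled(&f[0])); /* polling still works */
	return e.irqs_handled;
}
```

In this toy run an interrupt per batch would mean four interrupts
processed; with the reference taken only in enable_signaling it is one,
and fences nobody waited on are still seen as complete by the seqno
check. A real driver would additionally have to close the race between
the check and the unmask, which this single-threaded sketch sidesteps.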

That's in a nutshell why I think we should keep all that machinery, even
though it won't be terribly useful for execlist (with or without the
scheduler).

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch