From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Vetter
Subject: Re: [PATCH] drm/i915: Wait for completion of pending flips when starved of fences
Date: Mon, 20 Jan 2014 11:37:42 +0100
Message-ID: <20140120103742.GD15089@phenom.ffwll.local>
References: <1390166413-9410-1-git-send-email-chris@chris-wilson.co.uk>
 <20140120094924.GB27650@nuc-i3427.alporthouse.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail-ee0-f50.google.com (mail-ee0-f50.google.com [74.125.83.50])
 by gabe.freedesktop.org (Postfix) with ESMTP id 7E8ADFAE0D for ;
 Mon, 20 Jan 2014 02:37:48 -0800 (PST)
Received: by mail-ee0-f50.google.com with SMTP id d17so3310489eek.37 for ;
 Mon, 20 Jan 2014 02:37:46 -0800 (PST)
Content-Disposition: inline
In-Reply-To: <20140120094924.GB27650@nuc-i3427.alporthouse.com>
Sender: intel-gfx-bounces@lists.freedesktop.org
Errors-To: intel-gfx-bounces@lists.freedesktop.org
To: Chris Wilson, Daniel Vetter, intel-gfx
List-Id: intel-gfx@lists.freedesktop.org

On Mon, Jan 20, 2014 at 09:49:24AM +0000, Chris Wilson wrote:
> On Sun, Jan 19, 2014 at 10:55:26PM +0100, Daniel Vetter wrote:
> > On Sun, Jan 19, 2014 at 10:20 PM, Chris Wilson wrote:
> > > On older generations (gen2, gen3) the GPU requires fences for many
> > > operations, such as blits. The display hardware also requires fences for
> > > scanouts and this leads to a situation where an arbitrary number of
> > > fences may be pinned by old scanouts following a pageflip but before we
> > > have executed the unpin workqueue. This is unpredictable by userspace
> > > and leads to random EDEADLK when submitting an otherwise benign
> > > execbuffer. However, we can detect when we have an outstanding flip and
> > > so cause userspace to wait upon their completion before finally
> > > declaring that the system is starved of fences.
> > > This is really no worse than forcing the GPU to stall waiting for older
> > > execbuffer to retire and release their fences before we can reallocate
> > > them for the next execbuffer.
> > >
> > > Reported-and-tested-by: dimon@gmx.net
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696
> > > Signed-off-by: Chris Wilson
> >
> > New subtest for kms_flip which submits such a blt buffer while a
> > pageflip is still pending?
>
> Correct.
>
> > Also there's a certain chance we'll starve
> > the unpin work, similar to the issues around flushing the unpin work
> > in our pageflip implementation.
>
> If you mean that we will never run the unpin workqueue, that's what the
> implementation will fix, eventually, after a busy-spin in userspace since
> set_need_resched() was removed. I can teach userspace to yield() after
> an EAGAIN which seems a reasonable compromise (userspace gets a bonus
> for being cooperative rather than penalized for using up its timeslice.)

yield won't help; we need to block on the work-queue draining like we do
in the pageflip code with flush_workqueue. At least we've had bug
reports in the past where someone found it intriguing to run his entire
userspace with rt prio, which ended up starving the sched_normal
workqueue and so livelocked the entire system.

Instead of busy-looping through userspace with -EAGAIN I think we should
keep all the unpin works on a spinlock-protected list and synchronously
unpin the buffers in the get_fence and evict_something paths (once the
flip has completed, the unpin entry has been removed from the list and
the spinlock has been dropped, ofc).

The only downside is that we have a notch more complexity since we need
to manually check for gpu hangs and bail out correctly if there is one.
Which means another kms_flip subtest, but that shouldn't be too much
fuss with the combinatorial testflags we already have.
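
To make the list idea a bit more concrete, here is a rough userspace
sketch, not i915 code. Every name in it (unpin_work, fence_pool,
queue_unpin, flip_complete, unpin_completed, get_fence) is made up for
illustration, a pthread mutex stands in for the spinlock, and pin counts
are assumed to be protected by an outer lock held by the caller (the
role struct_mutex would play), which is not modelled. Where the kernel
would sleep until the flip completes, the sketch just returns -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_FENCES 2

/* All names here are hypothetical; this is not the i915 code. */
struct unpin_work {
	struct unpin_work *next;
	int fence;       /* fence still pinned by the old scanout */
	bool flip_done;  /* set once the pageflip has completed */
};

struct fence_pool {
	pthread_mutex_t lock;        /* stands in for the spinlock */
	struct unpin_work *pending;  /* unpin works not yet run */
	int pin_count[NUM_FENCES];   /* caller's outer lock assumed */
	bool gpu_hung;
};

static void pool_init(struct fence_pool *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->pending = NULL;
	p->gpu_hung = false;
	for (int i = 0; i < NUM_FENCES; i++)
		p->pin_count[i] = 0;
}

/* Pageflip path: the old scanout keeps its fence pinned for now. */
static void queue_unpin(struct fence_pool *p, struct unpin_work *w, int fence)
{
	w->fence = fence;
	w->flip_done = false;
	p->pin_count[fence]++;
	pthread_mutex_lock(&p->lock);
	w->next = p->pending;
	p->pending = w;
	pthread_mutex_unlock(&p->lock);
}

/* Flip-done notification: only marks the entry as ready to unpin. */
static void flip_complete(struct fence_pool *p, struct unpin_work *w)
{
	pthread_mutex_lock(&p->lock);
	w->flip_done = true;
	pthread_mutex_unlock(&p->lock);
}

/* Detach completed entries under the lock, unpin after dropping it. */
static int unpin_completed(struct fence_pool *p)
{
	struct unpin_work *done = NULL, **link;
	int n = 0;

	pthread_mutex_lock(&p->lock);
	link = &p->pending;
	while (*link) {
		struct unpin_work *w = *link;

		if (w->flip_done) {
			*link = w->next;  /* remove from the pending list */
			w->next = done;
			done = w;
		} else {
			link = &w->next;
		}
	}
	pthread_mutex_unlock(&p->lock);

	for (; done; done = done->next, n++) {
		assert(p->pin_count[done->fence] > 0);
		p->pin_count[done->fence]--;
	}
	return n;
}

/* Returns a free fence index or a negative errno; it does not pin. */
static int get_fence(struct fence_pool *p)
{
	bool flips_outstanding;

	unpin_completed(p);
	for (int i = 0; i < NUM_FENCES; i++)
		if (p->pin_count[i] == 0)
			return i;

	if (p->gpu_hung)
		return -EIO;  /* bail out instead of waiting on a dead gpu */

	pthread_mutex_lock(&p->lock);
	flips_outstanding = p->pending != NULL;
	pthread_mutex_unlock(&p->lock);

	/* The kernel would sleep on the flip here; the sketch reports it. */
	return flips_outstanding ? -EBUSY : -EDEADLK;
}
```

The point of the sketch is only the ordering: entries come off the list
under the lock, the unpin itself runs after the lock is dropped, and the
hang check sits in front of anything that would wait. An evict_something
path would presumably call the same unpin_completed() helper.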

Since we don't have a test where rt threads starve our workers for the
normal pageflip code I think we can eschew that part here, too. I'll add
it to the i-g-t wishlist though for a rainy afternoon ;-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch