From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Widawsky
Subject: Re: [PATCH] drm/i915: Make vm eviction uninterruptible
Date: Mon, 7 Apr 2014 11:58:28 -0700
Message-ID: <20140407185828.GA18726@bwidawsk.net>
References: <1396728482-25402-1-git-send-email-benjamin.widawsky@intel.com>
 <20140405203412.GH8475@nuc-i3427.alporthouse.com>
 <20140406024527.GA31750@intel.com>
 <20140406183502.GA4414@bwidawsk.net>
 <20140407094256.GI8475@nuc-i3427.alporthouse.com>
 <20140407121500.GE9262@phenom.ffwll.local>
 <20140407123004.GJ8475@nuc-i3427.alporthouse.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail.bwidawsk.net (bwidawsk.net [166.78.191.112])
 by gabe.freedesktop.org (Postfix) with ESMTP id D5E786E6B5
 for ; Mon, 7 Apr 2014 11:58:35 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <20140407123004.GJ8475@nuc-i3427.alporthouse.com>
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"
To: Chris Wilson, Daniel Vetter, Ben Widawsky, Intel GFX
List-Id: intel-gfx@lists.freedesktop.org

On Mon, Apr 07, 2014 at 01:30:04PM +0100, Chris Wilson wrote:
> On Mon, Apr 07, 2014 at 02:15:00PM +0200, Daniel Vetter wrote:
> > On Mon, Apr 07, 2014 at 10:42:56AM +0100, Chris Wilson wrote:
> > > On Sun, Apr 06, 2014 at 11:35:03AM -0700, Ben Widawsky wrote:
> > > > On Sat, Apr 05, 2014 at 07:45:28PM -0700, Ben Widawsky wrote:
> > > > > The issue I was seeing appeared to seeing from sigkill. In such a case,
> > > > > the process may want to die before the context/work/address space is
> > > > > freeable. For example:
> > > > > 1. evict_vm called for whatever reason
> > > > > 2. wait_seqno because the VMA is still active
> > > >
> > > > hmm something isn't right here. Why did I get to wait_seqno if pin_count
> > > > was 0? Just FYI, this wasn't hypothetical.
> > > > I did trace it all the way to
> > > > exactly ERESTARTSYS from wait_seqno.
> > > >
> > > > By the way, another option in evict would be:
> > > > while(ret = (i915_vma_unbind(vma) == -ERESTARTSYS));
> > > > WARN_ON(ret);
> > > >
> > > > > 3. receive signal break out of wait_seqno
> > > > > 4. return to evict_vm and the above WARN
> > > > >
> > > > > Our error handling from there just spirals.
> > > > >
> > > > > One issue I have with our current code is I'd really like eviction to
> > > > > not be able to fail (obviously extreme cases are unavoidable).
> > >
> > > This is unrealistic since we must support X which uses sigtimer.
> > >
> > > > > Perhaps
> > > > > one other solution would be to make sure the context is idled before
> > > > > evicting its VM.
> > >
> > > Indeed.
> > >
> > > Anyway, I do concur that wrapping i915_driver_preclose() with
> > >
> > > dev_priv->mm.interruptible = false;
> > >
> > > would make us both happy.
> >
> > Isn't the backtrace just fallout from the lifetime rules being a bit
> > funny? We didn't uninterruptibly stall for any still active bo when the
> > drm fd gets closed, why do we suddenly need to do that with ppgtts? Iirc
> > requests hold a ref on the context, contexts hold a ref on the ppgtt and
> > so the entire thing should only dissipate once it's really idle.
> >
> > Imo just doing uninterruptible sleeps tastes way too much like duct-tape.
> > I can be convinced of duct-tape if the tradeoffs really strongly suggests
> > it's the right thing (e.g. the shrinker lock stealing, even though we've
> > paid a hefty price in accidental complexity with that one), but that needs
> > some good justification.
>
> Yes, it is duct-tape. But it should be duct-tape against future unknown
> bugs (and the currently known bugs) in that the i915_driver_preclose()
> cannot report failure and so should not allow its callees to fail (which
> is more or less the contract given by .interruptible=false).
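
To make the contract above concrete, here is a rough userspace model of
the wrapping being proposed. It is only a sketch and not i915 code:
struct fake_mm, fake_wait(), fake_evict_vm() and fake_preclose() are
made-up names for illustration, and ERESTARTSYS is defined locally
because it is a kernel-internal value that userspace headers don't
provide. The only point it tries to show is that a void hook saves the
flag, forces interruptible = false around its callees, and restores it
afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel-internal error code; defined here since userspace lacks it. */
#define ERESTARTSYS 512

/* Hypothetical stand-in for the dev_priv->mm state from the thread. */
struct fake_mm {
	bool interruptible;  /* may a wait bail out on a pending signal? */
	bool signal_pending; /* models a signal (e.g. SIGKILL) having arrived */
	int bound_vmas;      /* VMAs still bound in the address space */
};

/* Models a seqno wait: only an interruptible wait gives up on a signal. */
static int fake_wait(const struct fake_mm *mm)
{
	if (mm->interruptible && mm->signal_pending)
		return -ERESTARTSYS;
	return 0; /* the wait ran to completion */
}

/* Models eviction: wait for each VMA to go idle, then unbind it. */
static int fake_evict_vm(struct fake_mm *mm)
{
	while (mm->bound_vmas > 0) {
		int ret = fake_wait(mm);

		if (ret)
			return ret; /* interrupted: VMAs are left bound */
		mm->bound_vmas--;
	}
	return 0;
}

/*
 * Models a preclose-style hook. It returns void, so it has no way to
 * report failure; forcing interruptible = false for the duration means
 * the callees cannot return -ERESTARTSYS in the first place.
 */
static void fake_preclose(struct fake_mm *mm)
{
	bool was_interruptible = mm->interruptible;
	int ret;

	mm->interruptible = false;
	ret = fake_evict_vm(mm);
	assert(ret == 0); /* stands in for a WARN_ON(ret) */
	(void)ret;
	mm->interruptible = was_interruptible; /* restore caller's setting */
}
```

With interruptible left at true and a signal pending, fake_evict_vm()
returns -ERESTARTSYS and leaves every VMA bound, which is the failure
traced earlier in the thread. Going through fake_preclose() instead
drains all VMAs and puts the flag back as it was.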
>
> The alternative is to allow preclose() to support an error-code, which
> has the issue that very few programs check for errors during close() and
> that EINTR from close() is frowned upon by most.
> -Chris
>

Do we have consensus? I am good with Chris' idea. I can write and test the patch.

-- 
Ben Widawsky, Intel Open Source Technology Center