From mboxrd@z Thu Jan 1 00:00:00 1970 From: Daniel Vetter Subject: Re: [PATCH] drm/i915: remove user GTT mappings early during runtime suspend Date: Tue, 6 May 2014 21:27:37 +0200 Message-ID: <20140506192737.GD5730@phenom.ffwll.local> References: <1399375730-4355-1-git-send-email-imre.deak@intel.com> <20140506114011.GB7322@nuc-i3427.alporthouse.com> <1399376546.19761.1.camel@intelbox> <20140506115931.GC7322@nuc-i3427.alporthouse.com> <1399387561.19761.31.camel@intelbox> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail-ee0-f43.google.com (mail-ee0-f43.google.com [74.125.83.43]) by gabe.freedesktop.org (Postfix) with ESMTP id 7519E6E963 for ; Tue, 6 May 2014 12:27:43 -0700 (PDT) Received: by mail-ee0-f43.google.com with SMTP id d17so59013eek.16 for ; Tue, 06 May 2014 12:27:42 -0700 (PDT) Content-Disposition: inline In-Reply-To: <1399387561.19761.31.camel@intelbox> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Imre Deak Cc: intel-gfx@lists.freedesktop.org List-Id: intel-gfx@lists.freedesktop.org On Tue, May 06, 2014 at 05:46:01PM +0300, Imre Deak wrote: > On Tue, 2014-05-06 at 12:59 +0100, Chris Wilson wrote: > > On Tue, May 06, 2014 at 02:42:26PM +0300, Imre Deak wrote: > > > On Tue, 2014-05-06 at 12:40 +0100, Chris Wilson wrote: > > > > On Tue, May 06, 2014 at 02:28:50PM +0300, Imre Deak wrote: > > > > > Currently user space can access GEM buffers mapped to GTT through > > > > > existing mappings concurrently while the platform specific suspend > > > > > handlers are running. Since these handlers may change the HW state in a > > > > > way that would break such accesses, remove the mappings before calling > > > > > the handlers. > > > > > > > > Hmm, but you never locked the device, so what is preventing those > > > > concurrent accesses from refaulting in GTT entires anyway. Please explain > > > > the context under which the runtime suspend code executes, and leave > > > > that explanation within easy reach of intel_runtime_suspend() - > > > > preferrably with testing of those assumptions. > > > > > > During faulting we take an RPM reference, so that avoids concurrent > > > re-faults. I could add a comment about this to the code. > > > > You are still iterating over lists that are not safe, right? Or are > > there more magic rpm references that prevent ioctls whilst > > intel_runtime_suspend is being called? > > Tbh I haven't checked this, since moving the unmapping earlier doesn't > make a difference in this regard. > > But it's a good point, I tried to audit now those paths. Currently the > assumption is that we hold an RPM reference everywhere where those lists > are changed. On the exec buffer path this is true, but for example in > i915_drop_caches_set() it's not. > > We could fix this by taking struct_mutex around > i915_gem_release_all_mmaps() in intel_runtime_suspend(). Doing that > needs some more auditing to make sure we can't deadlock somehow. At > first glance it seems that the driver always schedules a deferred work > and calls intel_runtime_suspend() from that, so I think it's fine. > > I suggest applying this patch regardless since the two issues are > separate. If I understand the situation correctly the runtime suspend function only ever gets called from a worker thread after the hysteris timeout expired. Which means we should be able to wrap _just_ the gtt pte shotdown with dev->struct_mutex and nothing else. Which is good since profileration of dev->struct_mutex is awful. On the resume side we don't need any locking since the gtt fault handler will first grab the runtime reference and also dev->struct_mutex. One issue which is looming is that this might deadlock. We might need a trylock in the runtime suspend function and abort the runtime suspend if we can't get the lock. Please test that lockdep catches this before we commit to a design. Just a very quick analysis, I didn't check the details so this might be horribly wrong. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch