From mboxrd@z Thu Jan 1 00:00:00 1970 From: Lukas Wunner Subject: Re: [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths Date: Tue, 17 Jul 2018 20:20:41 +0200 Message-ID: <20180717182041.GA18363@wunner.de> References: <20180716235936.11268-1-lyude@redhat.com> <20180716235936.11268-2-lyude@redhat.com> <20180717071641.GA5411@wunner.de> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Return-path: Content-Disposition: inline In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: nouveau-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org Sender: "Nouveau" To: Lyude Paul Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, David Airlie , nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, Ben Skeggs List-Id: linux-pm@vger.kernel.org T24gVHVlLCBKdWwgMTcsIDIwMTggYXQgMTI6NTM6MTFQTSAtMDQwMCwgTHl1ZGUgUGF1bCB3cm90 ZToKPiBPbiBUdWUsIDIwMTgtMDctMTcgYXQgMDk6MTYgKzAyMDAsIEx1a2FzIFd1bm5lciB3cm90 ZToKPiA+IE9uIE1vbiwgSnVsIDE2LCAyMDE4IGF0IDA3OjU5OjI1UE0gLTA0MDAsIEx5dWRlIFBh dWwgd3JvdGU6Cj4gPiA+IEluIG9yZGVyIHRvIGZpeCBhbGwgb2YgdGhlIHNwb3RzIHRoYXQgbmVl ZCB0byBoYXZlIHJ1bnRpbWUgUE0gZ2V0L3B1dHMoKQo+ID4gPiBhZGRlZCwgd2UgbmVlZCB0byBl bnN1cmUgdGhhdCBpdCdzIHBvc3NpYmxlIGZvciB1cyB0byBjYWxsCj4gPiA+IHBtX3J1bnRpbWVf Z2V0L3B1dCgpIGluIGFueSBjb250ZXh0LCByZWdhcmRsZXNzIG9mIGhvdyBkZWVwLCBzaW5jZQo+ ID4gPiBhbG1vc3QgYWxsIG9mIHRoZSBzcG90cyB0aGF0IGFyZSBjdXJyZW50bHkgbWlzc2luZyBy ZWZzIGNhbiBwb3RlbnRpYWxseQo+ID4gPiBnZXQgY2FsbGVkIGluIHRoZSBydW50aW1lIHN1c3Bl bmQvcmVzdW1lIHBhdGguIE90aGVyd2lzZSwgd2UnbGwgdHJ5IHRvCj4gPiA+IHJlc3VtZSB0aGUg R1BVIGFzIHdlJ3JlIHRyeWluZyB0byByZXN1bWUgdGhlIEdQVSAoYW5kIHZpY2UtdmVyc2EpIGFu ZAo+ID4gPiBjYXVzZSB0aGUga2VybmVsIHRvIGRlYWRsb2NrLgo+ID4gPiAKPiA+ID4gV2l0aCB0 aGlzLCBpdCBzaG91bGQgYmUgc2FmZSB0byBjYWxsIHRoZSBwbSBydW50aW1lIGZ1bmN0aW9ucyBp biBhbnkKPiA+ID4gY29udGV4dCBpbiBub3V2ZWF1IHdpdGggb25lIGNvbmRpdGlvbjogYW55IHBv aW50IGluIHRoZSBkcml2ZXIgdGhhdAo+ID4gPiBjYWxscyBwbV9ydW50aW1lX2dldCooKSBjYW5u b3QgaG9sZCBhbnkgbG9ja3Mgb3duZWQgYnkgbm91dmVhdSB0aGF0Cj4gPiA+IHdvdWxkIGJlIGFj cXVpcmVkIGFueXdoZXJlIGluc2lkZSBub3V2ZWF1X3Btb3BzX3J1bnRpbWVfcmVzdW1lKCkuCj4g PiA+IFRoaXMgaW5jbHVkZXMgbW9kZXNldHRpbmcgbG9ja3MsIGkyYyBidXMgbG9ja3MsIGV0Yy4K PiA+IAo+ID4gW3NuaXBdCj4gPiA+IC0tLSBhL2RyaXZlcnMvZ3B1L2RybS9ub3V2ZWF1L25vdXZl YXVfZHJtLmMKPiA+ID4gKysrIGIvZHJpdmVycy9ncHUvZHJtL25vdXZlYXUvbm91dmVhdV9kcm0u Ywo+ID4gPiBAQCAtODM1LDYgKzgzNSw4IEBAIG5vdXZlYXVfcG1vcHNfcnVudGltZV9zdXNwZW5k KHN0cnVjdCBkZXZpY2UgKmRldikKPiA+ID4gIAkJcmV0dXJuIC1FQlVTWTsKPiA+ID4gIAl9Cj4g PiA+ICAKPiA+ID4gKwlkZXYtPnBvd2VyLmRpc2FibGVfZGVwdGgrKzsKPiA+ID4gKwo+ID4gCj4g PiBBbnl3YXksIGlmIEkgdW5kZXJzdGFuZCB0aGUgY29tbWl0IG1lc3NhZ2UgY29ycmVjdGx5LCB5 b3UncmUgaGl0dGluZyBhCj4gPiBwbV9ydW50aW1lX2dldF9zeW5jKCkgaW4gYSBjb2RlIHBhdGgg dGhhdCBpdHNlbGYgaXMgY2FsbGVkIGR1cmluZyBhCj4gPiBwbV9ydW50aW1lX2dldF9zeW5jKCku ICBDb3VsZCB5b3UgaW5jbHVkZSBzdGFjayB0cmFjZXMgaW4gdGhlIGNvbW1pdAo+ID4gbWVzc2Fn ZT8gIE15IGd1dCBmZWVsaW5nIGlzIHRoYXQgdGhpcyBwYXRjaCBtYXNrcyBhIGRlZXBlciBpc3N1 ZSwKPiA+IGUuZy4gaWYgdGhlIHJ1bnRpbWVfcmVzdW1lIGNvZGUgcGF0aCBkb2VzIGluIGZhY3Qg ZGlyZWN0bHkgcG9sbCBvdXRwdXRzLAo+ID4gdGhhdCB3b3VsZCBzZWVtIHdyb25nLiAgUnVudGlt ZSByZXN1bWUgc2hvdWxkIG1lcmVseSBtYWtlIHRoZSBjYXJkCj4gPiBhY2Nlc3NpYmxlLCBpLmUu IHJlaW5zdGF0ZSBwb3dlciBpZiBuZWNlc3NhcnksIHB1dCBpbnRvIFBDSV9EMCwKPiA+IHJlc3Rv cmUgcmVnaXN0ZXJzLCBldGMuICBPdXRwdXQgcG9sbGluZyBzaG91bGQgYmUgc2NoZWR1bGVkCj4g PiBhc3luY2hyb25vdXNseS4KPiAKPiBTbzogdGhlIHJlYXNvbiB0aGF0IHBhdGNoIHdhcyBhZGRl ZCB3YXMgbWFpbmx5IGZvciB0aGUgcGF0Y2hlcyBsYXRlciBpbiB0aGUKPiBzZXJpZXMgdGhhdCBh ZGQgZ3VhcmRzIGFyb3VuZCB0aGUgaTJjIGJ1cyBhbmQgYXV4IGJ1cywgc2luY2UgYm90aCBvZiB0 aG9zZQo+IHJlcXVpcmUgdGhhdCB0aGUgZGV2aWNlIGJlIGF3YWtlIGZvciBpdCB0byB3b3JrLiBD dXJyZW50bHksIHRoZSBzcG90IHdoZXJlIGl0Cj4gd291bGQgcmVjdXJzZSBpczoKCk9rYXksIHRo ZSBQQ0kgZGV2aWNlIGlzIHN1c3BlbmRpbmcgYW5kIHRoZSBudmttX2kyY19hdXhfYWNxdWlyZSgp CndhbnRzIGl0IGluIHJlc3VtZWQgc3RhdGUsIHNvIGlzIHdhaXRpbmcgZm9yZXZlciBmb3IgdGhl IGRldmljZSB0bwpydW50aW1lIHN1c3BlbmQgaW4gb3JkZXIgdG8gcmVzdW1lIGl0IGFnYWluIGlt bWVkaWF0ZWx5IGFmdGVyd2FyZHMuCgpUaGUgZGVhZGxvY2sgaW4gdGhlIHN0YWNrIHRyYWNlIHlv dSd2ZSBwb3N0ZWQgY291bGQgYmUgcmVzb2x2ZWQgdXNpbmcKdGhlIHRlY2huaXF1ZSBJIHVzZWQg aW4gZDYxYTVjMTA2MzUxIGJ5IGFkZGluZyB0aGUgZm9sbG93aW5nIHRvCmluY2x1ZGUvbGludXgv cG1fcnVudGltZS5oOgoKc3RhdGljIGlubGluZSBib29sIHBtX3J1bnRpbWVfc3RhdHVzX3N1c3Bl bmRpbmcoc3RydWN0IGRldmljZSAqZGV2KQp7CglyZXR1cm4gZGV2LT5wb3dlci5ydW50aW1lX3N0 YXR1cyA9PSBSUE1fU1VTUEVORElORzsKfQoKc3RhdGljIGlubGluZSBib29sIGlzX3BtX3dvcmso c3RydWN0IGRldmljZSAqZGV2KQp7CglzdHJ1Y3Qgd29ya19zdHJ1Y3QgKndvcmsgPSBjdXJyZW50 X3dvcmsoKTsKCglyZXR1cm4gd29yayAmJiB3b3JrLT5mdW5jID09IGRldi0+cG93ZXIud29yazsK fQoKVGhlbiBhZGRpbmcgdGhpcyB0byBudmttX2kyY19hdXhfYWNxdWlyZSgpOgoKCXN0cnVjdCBk ZXZpY2UgKmRldiA9IHBhZC0+aTJjLT5zdWJkZXYuZGV2aWNlLT5kZXY7CgoJaWYgKCEoaXNfcG1f d29yayhkZXYpICYmIHBtX3J1bnRpbWVfc3RhdHVzX3N1c3BlbmRpbmcoZGV2KSkpIHsKCQlyZXQg PSBwbV9ydW50aW1lX2dldF9zeW5jKGRldik7CgkJaWYgKHJldCA8IDAgJiYgcmV0ICE9IC1FQUND RVMpCgkJCXJldHVybiByZXQ7Cgl9CgpCdXQgaGVyZSdzIHRoZSBjYXRjaDogIFRoaXMgb25seSB3 b3JrcyBmb3IgYW4gKmFzeW5jKiBydW50aW1lIHN1c3BlbmQuCkl0IGRvZXNuJ3Qgd29yayBmb3Ig cG1fcnVudGltZV9wdXRfc3luYygpLCBwbV9ydW50aW1lX3N1c3BlbmQoKSBldGMsCmJlY2F1c2Ug dGhlbiB0aGUgcnVudGltZSBzdXNwZW5kIGlzIGV4ZWN1dGVkIGluIHRoZSBjb250ZXh0IG9mIHRo ZSBjYWxsZXIsCm5vdCBpbiB0aGUgY29udGV4dCBvZiBkZXYtPnBvd2VyLndvcmsuCgpTbyBpdCdz IG5vdCBhIGZ1bGwgc29sdXRpb24sIGJ1dCBob3BlZnVsbHkgc29tZXRoaW5nIHRoYXQgZ2V0cyB5 b3UKZ29pbmcuICBJJ20gbm90IHJlYWxseSBmYW1pbGlhciB3aXRoIHRoZSBjb2RlIHBhdGhzIGxl YWRpbmcgdG8KbnZrbV9pMmNfYXV4X2FjcXVpcmUoKSB0byBjb21lIHVwIHdpdGggYSBmdWxsIHNv bHV0aW9uIG9mZiB0aGUgdG9wCm9mIG15IGhlYWQgSSdtIGFmcmFpZC4KCk5vdGUsIGl0J3Mgbm90 IHN1ZmZpY2llbnQgdG8ganVzdCBjaGVjayBwbV9ydW50aW1lX3N0YXR1c19zdXNwZW5kaW5nKGRl dikKYmVjYXVzZSBpZiB0aGUgcnVudGltZV9zdXNwZW5kIGlzIGNhcnJpZWQgb3V0IGNvbmN1cnJl bnRseSBieSBzb21ldGhpbmcKZWxzZSwgdGhpcyB3aWxsIHJldHVybiB0cnVlIGJ1dCBpdCdzIG5v dCBndWFyYW50ZWVkIHRoYXQgdGhlIGRldmljZSBpcwphY3R1YWxseSBrZXB0IGF3YWtlIHVudGls IHRoZSBpMmMgY29tbXVuaWNhdGlvbiBoYXMgYmVlbiBmdWxseSBwZXJmb3JtZWQuCgpIVEgsCgpM dWthcwpfX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fXwpOb3V2 ZWF1IG1haWxpbmcgbGlzdApOb3V2ZWF1QGxpc3RzLmZyZWVkZXNrdG9wLm9yZwpodHRwczovL2xp c3RzLmZyZWVkZXNrdG9wLm9yZy9tYWlsbWFuL2xpc3RpbmZvL25vdXZlYXUK From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.3 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2671AECDFB3 for ; Tue, 17 Jul 2018 18:20:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DDA1C2075E for ; Tue, 17 Jul 2018 18:20:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DDA1C2075E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=wunner.de Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730598AbeGQSyg (ORCPT ); Tue, 17 Jul 2018 14:54:36 -0400 Received: from bmailout2.hostsharing.net ([83.223.90.240]:53037 "EHLO bmailout2.hostsharing.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729759AbeGQSyf (ORCPT ); Tue, 17 Jul 2018 14:54:35 -0400 Received: from h08.hostsharing.net (h08.hostsharing.net [83.223.95.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.hostsharing.net", Issuer "COMODO RSA Domain Validation Secure Server CA" (not verified)) by bmailout2.hostsharing.net (Postfix) with ESMTPS id 492CD2800B48A; Tue, 17 Jul 2018 20:20:42 +0200 (CEST) Received: by h08.hostsharing.net (Postfix, from userid 100393) id E984A227519; Tue, 17 Jul 2018 20:20:41 +0200 (CEST) Date: Tue, 17 Jul 2018 20:20:41 +0200 From: Lukas Wunner To: Lyude Paul Cc: nouveau@lists.freedesktop.org, David Airlie , linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Ben Skeggs , linux-pm@vger.kernel.org Subject: Re: [Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths Message-ID: <20180717182041.GA18363@wunner.de> References: <20180716235936.11268-1-lyude@redhat.com> <20180716235936.11268-2-lyude@redhat.com> <20180717071641.GA5411@wunner.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 17, 2018 at 12:53:11PM -0400, Lyude Paul wrote: > On Tue, 2018-07-17 at 09:16 +0200, Lukas Wunner wrote: > > On Mon, Jul 16, 2018 at 07:59:25PM -0400, Lyude Paul wrote: > > > In order to fix all of the spots that need to have runtime PM get/puts() > > > added, we need to ensure that it's possible for us to call > > > pm_runtime_get/put() in any context, regardless of how deep, since > > > almost all of the spots that are currently missing refs can potentially > > > get called in the runtime suspend/resume path. Otherwise, we'll try to > > > resume the GPU as we're trying to resume the GPU (and vice-versa) and > > > cause the kernel to deadlock. > > > > > > With this, it should be safe to call the pm runtime functions in any > > > context in nouveau with one condition: any point in the driver that > > > calls pm_runtime_get*() cannot hold any locks owned by nouveau that > > > would be acquired anywhere inside nouveau_pmops_runtime_resume(). > > > This includes modesetting locks, i2c bus locks, etc. > > > > [snip] > > > --- a/drivers/gpu/drm/nouveau/nouveau_drm.c > > > +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c > > > @@ -835,6 +835,8 @@ nouveau_pmops_runtime_suspend(struct device *dev) > > > return -EBUSY; > > > } > > > > > > + dev->power.disable_depth++; > > > + > > > > Anyway, if I understand the commit message correctly, you're hitting a > > pm_runtime_get_sync() in a code path that itself is called during a > > pm_runtime_get_sync(). Could you include stack traces in the commit > > message? My gut feeling is that this patch masks a deeper issue, > > e.g. if the runtime_resume code path does in fact directly poll outputs, > > that would seem wrong. Runtime resume should merely make the card > > accessible, i.e. reinstate power if necessary, put into PCI_D0, > > restore registers, etc. Output polling should be scheduled > > asynchronously. > > So: the reason that patch was added was mainly for the patches later in the > series that add guards around the i2c bus and aux bus, since both of those > require that the device be awake for it to work. Currently, the spot where it > would recurse is: Okay, the PCI device is suspending and the nvkm_i2c_aux_acquire() wants it in resumed state, so is waiting forever for the device to runtime suspend in order to resume it again immediately afterwards. The deadlock in the stack trace you've posted could be resolved using the technique I used in d61a5c106351 by adding the following to include/linux/pm_runtime.h: static inline bool pm_runtime_status_suspending(struct device *dev) { return dev->power.runtime_status == RPM_SUSPENDING; } static inline bool is_pm_work(struct device *dev) { struct work_struct *work = current_work(); return work && work->func == dev->power.work; } Then adding this to nvkm_i2c_aux_acquire(): struct device *dev = pad->i2c->subdev.device->dev; if (!(is_pm_work(dev) && pm_runtime_status_suspending(dev))) { ret = pm_runtime_get_sync(dev); if (ret < 0 && ret != -EACCES) return ret; } But here's the catch: This only works for an *async* runtime suspend. It doesn't work for pm_runtime_put_sync(), pm_runtime_suspend() etc, because then the runtime suspend is executed in the context of the caller, not in the context of dev->power.work. So it's not a full solution, but hopefully something that gets you going. I'm not really familiar with the code paths leading to nvkm_i2c_aux_acquire() to come up with a full solution off the top of my head I'm afraid. Note, it's not sufficient to just check pm_runtime_status_suspending(dev) because if the runtime_suspend is carried out concurrently by something else, this will return true but it's not guaranteed that the device is actually kept awake until the i2c communication has been fully performed. HTH, Lukas