From mboxrd@z Thu Jan 1 00:00:00 1970 From: Lukas Wunner Subject: Re: [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths Date: Wed, 18 Jul 2018 09:47:10 +0200 Message-ID: <20180718074710.GA16072@wunner.de> References: <20180716235936.11268-1-lyude@redhat.com> <20180716235936.11268-2-lyude@redhat.com> <20180717071641.GA5411@wunner.de> <20180717182041.GA18363@wunner.de> <20180717183211.GB18363@wunner.de> <2dbe75b1a83c025b9cddc229dbca9af6fb30111e.camel@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Return-path: Content-Disposition: inline In-Reply-To: <2dbe75b1a83c025b9cddc229dbca9af6fb30111e.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: nouveau-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org Sender: "Nouveau" To: Lyude Paul Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, David Airlie , nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, Ben Skeggs List-Id: linux-pm@vger.kernel.org T24gVHVlLCBKdWwgMTcsIDIwMTggYXQgMDI6MzQ6NDdQTSAtMDQwMCwgTHl1ZGUgUGF1bCB3cm90 ZToKPiBPbiBUdWUsIDIwMTgtMDctMTcgYXQgMjA6MzIgKzAyMDAsIEx1a2FzIFd1bm5lciB3cm90 ZToKPiA+IE9uIFR1ZSwgSnVsIDE3LCAyMDE4IGF0IDAyOjI0OjMxUE0gLTA0MDAsIEx5dWRlIFBh dWwgd3JvdGU6Cj4gPiA+IE9uIFR1ZSwgMjAxOC0wNy0xNyBhdCAyMDoyMCArMDIwMCwgTHVrYXMg V3VubmVyIHdyb3RlOgo+ID4gPiA+IE9rYXksIHRoZSBQQ0kgZGV2aWNlIGlzIHN1c3BlbmRpbmcg YW5kIHRoZSBudmttX2kyY19hdXhfYWNxdWlyZSgpCj4gPiA+ID4gd2FudHMgaXQgaW4gcmVzdW1l ZCBzdGF0ZSwgc28gaXMgd2FpdGluZyBmb3JldmVyIGZvciB0aGUgZGV2aWNlIHRvCj4gPiA+ID4g cnVudGltZSBzdXNwZW5kIGluIG9yZGVyIHRvIHJlc3VtZSBpdCBhZ2FpbiBpbW1lZGlhdGVseSBh ZnRlcndhcmRzLgo+ID4gPiA+IAo+ID4gPiA+IFRoZSBkZWFkbG9jayBpbiB0aGUgc3RhY2sgdHJh Y2UgeW91J3ZlIHBvc3RlZCBjb3VsZCBiZSByZXNvbHZlZCB1c2luZwo+ID4gPiA+IHRoZSB0ZWNo bmlxdWUgSSB1c2VkIGluIGQ2MWE1YzEwNjM1MSBieSBhZGRpbmcgdGhlIGZvbGxvd2luZyB0bwo+ ID4gPiA+IGluY2x1ZGUvbGludXgvcG1fcnVudGltZS5oOgo+ID4gPiA+IAo+ID4gPiA+IHN0YXRp YyBpbmxpbmUgYm9vbCBwbV9ydW50aW1lX3N0YXR1c19zdXNwZW5kaW5nKHN0cnVjdCBkZXZpY2Ug KmRldikKPiA+ID4gPiB7Cj4gPiA+ID4gCXJldHVybiBkZXYtPnBvd2VyLnJ1bnRpbWVfc3RhdHVz ID09IFJQTV9TVVNQRU5ESU5HOwo+ID4gPiA+IH0KPiA+ID4gPiAKPiA+ID4gPiBzdGF0aWMgaW5s aW5lIGJvb2wgaXNfcG1fd29yayhzdHJ1Y3QgZGV2aWNlICpkZXYpCj4gPiA+ID4gewo+ID4gPiA+ IAlzdHJ1Y3Qgd29ya19zdHJ1Y3QgKndvcmsgPSBjdXJyZW50X3dvcmsoKTsKPiA+ID4gPiAKPiA+ ID4gPiAJcmV0dXJuIHdvcmsgJiYgd29yay0+ZnVuYyA9PSBkZXYtPnBvd2VyLndvcms7Cj4gPiA+ ID4gfQo+ID4gPiA+IAo+ID4gPiA+IFRoZW4gYWRkaW5nIHRoaXMgdG8gbnZrbV9pMmNfYXV4X2Fj cXVpcmUoKToKPiA+ID4gPiAKPiA+ID4gPiAJc3RydWN0IGRldmljZSAqZGV2ID0gcGFkLT5pMmMt PnN1YmRldi5kZXZpY2UtPmRldjsKPiA+ID4gPiAKPiA+ID4gPiAJaWYgKCEoaXNfcG1fd29yayhk ZXYpICYmIHBtX3J1bnRpbWVfc3RhdHVzX3N1c3BlbmRpbmcoZGV2KSkpIHsKPiA+ID4gPiAJCXJl dCA9IHBtX3J1bnRpbWVfZ2V0X3N5bmMoZGV2KTsKPiA+ID4gPiAJCWlmIChyZXQgPCAwICYmIHJl dCAhPSAtRUFDQ0VTKQo+ID4gPiA+IAkJCXJldHVybiByZXQ7Cj4gPiA+ID4gCX0KPiA+ID4gPiAK PiA+ID4gPiBCdXQgaGVyZSdzIHRoZSBjYXRjaDogIFRoaXMgb25seSB3b3JrcyBmb3IgYW4gKmFz eW5jKiBydW50aW1lIHN1c3BlbmQuCj4gPiA+ID4gSXQgZG9lc24ndCB3b3JrIGZvciBwbV9ydW50 aW1lX3B1dF9zeW5jKCksIHBtX3J1bnRpbWVfc3VzcGVuZCgpIGV0YywKPiA+ID4gPiBiZWNhdXNl IHRoZW4gdGhlIHJ1bnRpbWUgc3VzcGVuZCBpcyBleGVjdXRlZCBpbiB0aGUgY29udGV4dCBvZiB0 aGUgY2FsbGVyLAo+ID4gPiA+IG5vdCBpbiB0aGUgY29udGV4dCBvZiBkZXYtPnBvd2VyLndvcmsu CltzbmlwXQo+IAo+IFNvbWV0aGluZyBJJ20gY3VyaW91cyBhYm91dC4gVGhpcyBpc24ndCB0aGUg Zmlyc3QgdGltZSBJJ3ZlIGhpdCBhCj4gc2l0dWF0aW9uIGxpa2UgdGhpcyAoc2VlOiB0aGUgaW1w cm9wZXIgZGlzYWJsZV9kZXB0aCBmaXggSSBhZGRlZCBpbnRvCj4gYW1kZ3B1IEkgbm93IG5lZWQg dG8gZ28gYW5kIGZpeCksIHdoaWNoIG1ha2VzIG1lIHdvbmRlcjogaXMgdGhlcmUKPiBhY3R1YWxs eSBhbnkgcmVhc29uIExpbnV4J3MgcnVudGltZSBQTSBjb3JlIGRvZXNuJ3QganVzdCB0dXJuIGdl dC9wdXRzKCkKPiBpbiB0aGUgY29udGV4dCBvZiBzL3IgY2FsbGJhY2tzIGludG8gbm8tb3BzIGJ5 IGRlZmF1bHQ/IAoKU28gdGhlIFBNIGNvcmUgY291bGQgc2F2ZSBhIHBvaW50ZXIgdG8gdGhlICJj dXJyZW50IiB0YXNrX3N0cnVjdAppbiBzdHJ1Y3QgZGV2aWNlIGJlZm9yZSBpbnZva2luZyB0aGUg LT5ydW50aW1lX3N1c3BlbmQgb3IKLT5ydW50aW1lX3Jlc3VtZSBjYWxsYmFjaywgYW5kIGFsbCBz dWJzZXF1ZW50IHJwbV9yZXN1bWUoKSBhbmQKcnBtX3N1c3BlbmQoKSBjYWxscyBjb3VsZCB0aGVu IGJlY29tZSBuby1vcHMgaWYgImN1cnJlbnQiIGlzCmVxdWl2YWxlbnQgdG8gdGhlIHNhdmVkIHBv aW50ZXIuICAoVGhpcyBpcyBhbHNvIGhvdyB5b3UgY291bGQKc29sdmUgdGhlIGRlYWRsb2NrIHlv dSdyZSBkZWFsaW5nIHdpdGggZm9yIHN5bmMgc3VzcGVuZC4pCgpGb3IgYSByZWN1cnNpdmUgcmVz dW1lIGR1cmluZyBhIHJlc3VtZSBvciBhIHJlY3Vyc2l2ZSBzdXNwZW5kCmR1cmluZyBhIHN1c3Bl bmQsIHRoaXMgbWlnaHQgYWN0dWFsbHkgYmUgZmluZS4KCkZvciBhIHJlY3Vyc2l2ZSBzdXNwZW5k IGR1cmluZyBhIHJlc3VtZSBvciBhIHJlY3Vyc2l2ZSByZXN1bWUKZHVyaW5nIGEgc3VzcGVuZCwg dGhpbmdzIGJlY29tZSBtdXJraWVyOiAgSG93IHNob3VsZCB0aGUgUE0gY29yZQprbm93IGlmIHRo ZSBwYXJ0aWN1bGFyIHBhcnQgb2YgdGhlIGRldmljZSBpcyBzdGlsbCBhY2Nlc3NpYmxlCndoZW4g aGl0dGluZyBhIHJlY3Vyc2l2ZSByZXN1bWUgZHVyaW5nIGEgc3VzcGVuZD8gIExldCdzIHNheSBh CmNsb2NrIGlzIG5lZWRlZCBmb3IgaTJjLiAgVGhlbiB0aGUgcmVjdXJzaXZlIHJlc3VtZSBkdXJp bmcgYQpzdXNwZW5kIG1heSBvbmx5IGJlY29tZSBhIG5vLW9wIGJlZm9yZSB0aGF0IGNsb2NrIGhh cyBiZWVuCnR1cm5lZCBvZmYuICBUaGF0J3Mgc29tZXRoaW5nIG9ubHkgdGhlIGRldmljZSBkcml2 ZXIgaXRzZWxmCmhhcyBrbm93bGVkZ2UgYWJvdXQsIGJlY2F1c2UgaXQgaW1wbGVtZW50cyB0aGUg b3JkZXIgaW4gd2hpY2gKc3ViZGV2aWNlcyBvZiB0aGUgR1BVIGFyZSB0dXJuZWQgb2ZmLgoKTHVr YXMKX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KTm91dmVh dSBtYWlsaW5nIGxpc3QKTm91dmVhdUBsaXN0cy5mcmVlZGVza3RvcC5vcmcKaHR0cHM6Ly9saXN0 cy5mcmVlZGVza3RvcC5vcmcvbWFpbG1hbi9saXN0aW5mby9ub3V2ZWF1Cg== From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.3 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F3828ECDFB8 for ; Wed, 18 Jul 2018 07:47:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B5CBD2075E for ; Wed, 18 Jul 2018 07:47:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B5CBD2075E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=wunner.de Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728711AbeGRIXt (ORCPT ); Wed, 18 Jul 2018 04:23:49 -0400 Received: from bmailout2.hostsharing.net ([83.223.90.240]:60805 "EHLO bmailout2.hostsharing.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726110AbeGRIXt (ORCPT ); Wed, 18 Jul 2018 04:23:49 -0400 Received: from h08.hostsharing.net (h08.hostsharing.net [83.223.95.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.hostsharing.net", Issuer "COMODO RSA Domain Validation Secure Server CA" (not verified)) by bmailout2.hostsharing.net (Postfix) with ESMTPS id 5E2A82800A282; Wed, 18 Jul 2018 09:47:11 +0200 (CEST) Received: by h08.hostsharing.net (Postfix, from userid 100393) id EC41B6F3EF; Wed, 18 Jul 2018 09:47:10 +0200 (CEST) Date: Wed, 18 Jul 2018 09:47:10 +0200 From: Lukas Wunner To: Lyude Paul Cc: nouveau@lists.freedesktop.org, David Airlie , linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Ben Skeggs , linux-pm@vger.kernel.org Subject: Re: [Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion in suspend/resume paths Message-ID: <20180718074710.GA16072@wunner.de> References: <20180716235936.11268-1-lyude@redhat.com> <20180716235936.11268-2-lyude@redhat.com> <20180717071641.GA5411@wunner.de> <20180717182041.GA18363@wunner.de> <20180717183211.GB18363@wunner.de> <2dbe75b1a83c025b9cddc229dbca9af6fb30111e.camel@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2dbe75b1a83c025b9cddc229dbca9af6fb30111e.camel@redhat.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 17, 2018 at 02:34:47PM -0400, Lyude Paul wrote: > On Tue, 2018-07-17 at 20:32 +0200, Lukas Wunner wrote: > > On Tue, Jul 17, 2018 at 02:24:31PM -0400, Lyude Paul wrote: > > > On Tue, 2018-07-17 at 20:20 +0200, Lukas Wunner wrote: > > > > Okay, the PCI device is suspending and the nvkm_i2c_aux_acquire() > > > > wants it in resumed state, so is waiting forever for the device to > > > > runtime suspend in order to resume it again immediately afterwards. > > > > > > > > The deadlock in the stack trace you've posted could be resolved using > > > > the technique I used in d61a5c106351 by adding the following to > > > > include/linux/pm_runtime.h: > > > > > > > > static inline bool pm_runtime_status_suspending(struct device *dev) > > > > { > > > > return dev->power.runtime_status == RPM_SUSPENDING; > > > > } > > > > > > > > static inline bool is_pm_work(struct device *dev) > > > > { > > > > struct work_struct *work = current_work(); > > > > > > > > return work && work->func == dev->power.work; > > > > } > > > > > > > > Then adding this to nvkm_i2c_aux_acquire(): > > > > > > > > struct device *dev = pad->i2c->subdev.device->dev; > > > > > > > > if (!(is_pm_work(dev) && pm_runtime_status_suspending(dev))) { > > > > ret = pm_runtime_get_sync(dev); > > > > if (ret < 0 && ret != -EACCES) > > > > return ret; > > > > } > > > > > > > > But here's the catch: This only works for an *async* runtime suspend. > > > > It doesn't work for pm_runtime_put_sync(), pm_runtime_suspend() etc, > > > > because then the runtime suspend is executed in the context of the caller, > > > > not in the context of dev->power.work. [snip] > > Something I'm curious about. This isn't the first time I've hit a > situation like this (see: the improper disable_depth fix I added into > amdgpu I now need to go and fix), which makes me wonder: is there > actually any reason Linux's runtime PM core doesn't just turn get/puts() > in the context of s/r callbacks into no-ops by default? So the PM core could save a pointer to the "current" task_struct in struct device before invoking the ->runtime_suspend or ->runtime_resume callback, and all subsequent rpm_resume() and rpm_suspend() calls could then become no-ops if "current" is equivalent to the saved pointer. (This is also how you could solve the deadlock you're dealing with for sync suspend.) For a recursive resume during a resume or a recursive suspend during a suspend, this might actually be fine. For a recursive suspend during a resume or a recursive resume during a suspend, things become murkier: How should the PM core know if the particular part of the device is still accessible when hitting a recursive resume during a suspend? Let's say a clock is needed for i2c. Then the recursive resume during a suspend may only become a no-op before that clock has been turned off. That's something only the device driver itself has knowledge about, because it implements the order in which subdevices of the GPU are turned off. Lukas