Date: Thu, 15 Sep 2022 12:48:04 +0200
From: Greg KH
To: Olof Johansson
Cc: Saravana Kannan, Linus Torvalds, Andrew Morton, linux-kernel@vger.kernel.org, Stephen Rothwell, Linux ARM Mailing List, Shawn Guo, Li Yang
Subject: Re: [GIT PULL] Driver core changes for 6.0-rc1

On Wed, Sep 14, 2022 at 08:56:04PM -0700, Olof Johansson wrote:
> On Wed, Sep 14, 2022 at 10:36 AM Saravana Kannan wrote:
> >
> > On Wed, Sep 14, 2022 at 9:24 AM Olof Johansson wrote:
> > >
> > > Hi,
> > >
> > > On Wed, Sep 14, 2022 at 7:00 AM Greg KH wrote:
> > > >
> > > > On Tue, Sep 13, 2022 at 09:28:27AM -0700, Olof Johansson wrote:
> > > > > On Tue, Sep 13, 2022 at 8:15 AM Greg KH wrote:
> > > > > >
> > > > > > On Mon, Sep 12, 2022 at 10:24:43AM -0700, Olof Johansson wrote:
> > > > > > > On Mon, Sep 12, 2022 at 10:23 AM Olof Johansson wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On Wed, Aug 3, 2022 at 7:16
> > > > > > > > AM Greg KH wrote:
> > > > > > > >
> > > > > > > > > Saravana Kannan (11):
> > > > > > > > >       PM: domains: Delete usage of driver_deferred_probe_check_state()
> > > > > > > > >       pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
> > > > > > > > >       net: mdio: Delete usage of driver_deferred_probe_check_state()
> > > > > > > > >       driver core: Add wait_for_init_devices_probe helper function
> > > > > > > > >       net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
> > > > > > > > >       Revert "driver core: Set default deferred_probe_timeout back to 0."
> > > > > > > > >       driver core: Set fw_devlink.strict=1 by default
> > > > > > > > >       iommu/of: Delete usage of driver_deferred_probe_check_state()
> > > > > > > > >       driver core: Delete driver_deferred_probe_check_state()
> > > > > > > > >       driver core: fw_devlink: Allow firmware to mark devices as best effort
> > > > > > > > >       of: base: Avoid console probe delay when fw_devlink.strict=1
> > > > > > > >
> > > > > > > > The last patch in this list regresses my HoneyComb LX2K (ironically
> > > > > > > > the machine I do maintainer work on). It stops PCIe from probing, but
> > > > > > > > without a single message indicating why.
> > > > > > > >
> > > > > > > > The reason seems to be that the iommu-maps property doesn't get
> > > > > > > > patched up by my (older) u-boot, and thus isn't a valid reference.
> > > > > > > > System works fine without IOMMU, which is how I've run it for a couple
> > > > > > > > of years.
> > > > > > > >
> > > > > > > > It's also extremely hard to diagnose out of the box because there are
> > > > > > > > *no error messages*. And there were no warnings leading up to this
> > > > > > > > strict enforcement.
> > > > > > > >
> > > > > > > > This "feature" seems to have been done backwards. The checks should
> > > > > > > > have been running (and not skipped due to the "optional" flag), but
> > > > > > > > also not causing errors, just warnings.
> > > > > > > > That would have given users a
> > > > > > > > chance to know that this is something that needs to be fixed.
> > > > > > > >
> > > > > > > > And when you flip the switch, at least report what failed so that
> > > > > > > > people don't need to spend a whole night bisecting kernels, please.
> > > > > > > >
> > > > > > > > Greg, mind reverting just the last one? If I hit this, I presume
> > > > > > > > others would too.
> > > > > > > >
> > > > > > > Apologies, wrong patch pointed out. The culprit is "driver core: Set
> > > > > > > fw_devlink.strict=1 by default", 71066545b48e42.
> > > > > > >
> > > > > > Is this still an issue in -rc5? A number of patches in the above series
> > > > > > was just reverted and hopefully should have resolved the issue you are
> > > > > > seeing.
> > > > > >
> > > > > Unfortunately, I discovered this regression with -rc5 in the first
> > > > > place, so it's still there.
> > > > >
> > > > Ick, ok, Saravana, any thoughts? I know you're at the conference this
> > > > week with me, maybe you can give Olof a hint as to what to look for
> > > > here?
> > > >
> > > I'm not sure what you want me to look for. The patch turns on
> > > enforcement of DT contents that never used to be enforced, so now my
> > > computer no longer boots. And it does it in a way that makes it
> > > impossible for someone not rebuilding kernels to debug to figure out
> > > what happened.
> > >
> > Hi Olof,
> >
> > Sorry for the trouble. It doesn't print any error messages because
> > there are cases where it blocks the probe where it wouldn't be an
> > error. If I printed it every time fw_devlink blocked a probe, it'd be
> > a ton of messages.
> >
> > Btw, when I enabled fw_devlink.strict=1, it was AFTER making changes
> > that'll stop indefinitely blocking probes. So what you are seeing
> > shouldn't be happening. After about 10 seconds (configurable), it
> > should stop blocking the probes.
> >
> "Shouldn't be happening" is a pretty bold statement.
> It's not actually
> stuck on timeout in my case, and doesn't recover.
>
> Instead, what seems to be happening is that the PCIe driver, which
> registers as a platform_driver here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/mobiveil/pcie-layerscape-gen4.c#n255
>
> ends up registering, and the driver core now refuses to try to probe
> the device matches, since they no longer have their suppliers
> fulfilled (the smmu suppliers would not be tracked since they are
> optional here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/of/property.c#n1449
>
> So what happens is that the driver registration succeeds, but there
> have been no devices matched to it. So when it returns to the platform
> core, it thinks there are no devices bound to this driver, so it
> should be unregistered:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c#n951
>
> That explains why the pcie core doesn't retry and just disappears, and
> stops retrying.
>
> This is what it looks like with CONFIG_DEBUG_DRIVER and CONFIG_DEBUG_DEVRES:
> [    5.178538] bus: 'platform': add driver layerscape-pcie-gen4
> [    5.184301] bus: 'platform': __driver_probe_device: matched device
> 3600000.pcie with driver layerscape-pcie-gen4
> [    5.194498] platform 3600000.pcie: error -EPROBE_DEFER: supplier
> 5000000.iommu not ready
> [    5.202607] platform 3600000.pcie: Added to deferred list
> [    5.208024] bus: 'platform': __driver_probe_device: matched device
> 3800000.pcie with driver layerscape-pcie-gen4
> [    5.218227] platform 3800000.pcie: error -EPROBE_DEFER: supplier
> 5000000.iommu not ready
> [    5.226333] platform 3800000.pcie: Added to deferred list
> [    5.231814] bus: 'platform': remove driver layerscape-pcie-gen4
> [    5.237761] driver: 'layerscape-pcie-gen4': driver_release
>
> Note that the platform driver registration sets flags to disable async
> probing, supposedly so it can assume that any matching devices would
> be found by the time registration returns:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c#n917
> :
>
>         /*
>          * We have to run our probes synchronously because we check if
>          * we find any devices to bind to and exit with error if there
>          * are any.
>          */
>         drv->driver.probe_type = PROBE_FORCE_SYNCHRONOUS;
>
>         /*
>          * Prevent driver from requesting probe deferral to avoid further
>          * futile probe attempts.
>          */
>         drv->prevent_deferred_probe = true;
>
>
> Bottom line: How was this code tested? This seems far from mature,
> this doesn't seem like that of an obscure condition to occur and it
> could create minefields for others down the road if it's fragile.

I've reverted it for now, let's get this worked out for later releases.

thanks,

greg k-h
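
For readers following the quoted analysis, the failure mode it describes can be modeled in a few lines of user-space C. This is only a sketch under stated assumptions, not kernel code: every toy_* name is hypothetical, TOY_EPROBE_DEFER merely mirrors the kernel's EPROBE_DEFER value, and the -ENODEV return is an assumption about how "nothing bound" gets reported. It shows a register-and-probe-once helper that drops the driver when every matching device defers because its supplier is not ready.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in; 517 mirrors the kernel's EPROBE_DEFER value. */
#define TOY_EPROBE_DEFER 517

/* Toy model of a device whose probe depends on one supplier. */
struct toy_device {
	const char *name;
	bool supplier_ready;	/* e.g. the IOMMU the device is linked to */
	bool bound;
};

/* Toy model of a driver; "registered" is all the state we track. */
struct toy_driver {
	const char *name;
	bool registered;
};

/* Probe a single device: defer while its supplier is not ready. */
int toy_probe(struct toy_device *dev)
{
	assert(dev != NULL);
	if (!dev->supplier_ready)
		return -TOY_EPROBE_DEFER;
	dev->bound = true;
	return 0;
}

/*
 * Register the driver, probe every device synchronously exactly once,
 * and unregister again if nothing bound. Deferred devices are never
 * retried here, because the driver is already gone by the time their
 * supplier could show up.
 */
int toy_register_probe_once(struct toy_driver *drv,
			    struct toy_device *devs, size_t ndevs)
{
	size_t nbound = 0;

	assert(drv != NULL);
	drv->registered = true;

	for (size_t i = 0; i < ndevs; i++) {
		if (toy_probe(&devs[i]) == 0)
			nbound++;
	}

	if (nbound == 0) {
		/* No device bound: drop the driver (assumed -ENODEV). */
		drv->registered = false;
		return -ENODEV;
	}
	return 0;
}
```

Calling toy_register_probe_once() with two devices whose supplier_ready is false leaves the driver unregistered, which corresponds to the "add driver" line followed shortly by "remove driver" in the log above. Marking one supplier ready before the call lets that device bind and keeps the driver registered.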