From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EF4ED1AAE06; Thu, 29 Aug 2024 12:38:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724935091; cv=none; b=rW7YXXcj3K+/1ohVWFnzkW6As5FxSjxQypvTYujVm9An505H73w2UszsukMj3kiEZUv+bs9pjDzw9X0TgCkv4go/VFRqGzOF3hvZPqnm2Y7kIKTc/Nc8aTbajWwOZaxTWzHofmg2vXNtpiOaKc9rW+vPsEICEcB8zN2Guj1QHlI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724935091; c=relaxed/simple; bh=EP0ieTOayoDdZG/qTLxEmbaztKRTyEXZPutP4iINDFY=; h=Date:From:To:Cc:Subject:Message-ID:MIME-Version:Content-Type: Content-Disposition:In-Reply-To; b=p+CgF3xrFhxtlPH4tSF03RZlqBA3ulk6bLaJcedVscYQ9mFR4YEyGPrG2vpN63oETVg3uOnuMMVGuZm/del0hwO4mMoYACPcmOWvjO5UGJoXzCz1parfO9oe4janrF67gQWLpGq7X8NUdwauX9PvUJS5QbTGrAxsu9jB011qMRQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=KScVkLgI; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="KScVkLgI" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2827FC4CEC1; Thu, 29 Aug 2024 12:38:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1724935090; bh=EP0ieTOayoDdZG/qTLxEmbaztKRTyEXZPutP4iINDFY=; h=Date:From:To:Cc:Subject:In-Reply-To:From; b=KScVkLgIC5WkAeYuwO8xRsAXxv/njc1++dyAfFyW/z9yPHGGvSvhMopoj5CrBSop/ Y81u47nlLJj/Pf9i9F4Bu36svaIuox7+xAZy6ly0zKaJW196u3eM5OMWNLlgVntLdI JIzSdIYfAwpDb1dx+h/AnjIJtWYj3i3o48kpk+2Qw+xI6/DfqjcerqXf5fO7aTStmq UJomr7xISDd3WIFKqwcnyxGcQsWbpVfm/ayc1k2o4PhKIjEIDvsZhkxotRwly11Ho3 kUzAg5R0+q9soyvSKnnx3YT5X07ctT9981+DzS5uTO1q41k0X/BYs+0pPKsin99C8Y ajBoUzq6V6j4Q== Date: Thu, 29 Aug 2024 07:38:08 -0500 From: Bjorn Helgaas To: Manivannan Sadhasivam Cc: lpieralisi@kernel.org, kw@linux.com, bhelgaas@google.com, robh@kernel.org, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org, Dmitry Baryshkov Subject: Re: [PATCH v2] PCI: qcom-ep: Enable controller resources like PHY only after refclk is available Message-ID: <20240829123808.GA56247@bhelgaas> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240829053720.gmblrai2hkd73el3@thinkpad> On Thu, Aug 29, 2024 at 11:07:20AM +0530, Manivannan Sadhasivam wrote: > On Wed, Aug 28, 2024 at 03:59:45PM -0500, Bjorn Helgaas wrote: > > On Wed, Aug 28, 2024 at 07:31:08PM +0530, Manivannan Sadhasivam wrote: > > > qcom_pcie_enable_resources() is called by qcom_pcie_ep_probe() and it > > > enables the controller resources like clocks, regulator, PHY. On one of the > > > new unreleased Qcom SoC, PHY enablement depends on the active refclk. And > > > on all of the supported Qcom endpoint SoCs, refclk comes from the host > > > (RC). So calling qcom_pcie_enable_resources() without refclk causes the > > > whole SoC crash on the new SoC. > > > > > > qcom_pcie_enable_resources() is already called by > > > qcom_pcie_perst_deassert() when PERST# is deasserted, and refclk is > > > available at that time. > > > > > > Hence, remove the unnecessary call to qcom_pcie_enable_resources() from > > > qcom_pcie_ep_probe() to prevent the crash. > > > > > > Fixes: 869bc5253406 ("PCI: dwc: ep: Fix DBI access failure for drivers requiring refclk from host") > > > Tested-by: Dmitry Baryshkov > > > Signed-off-by: Manivannan Sadhasivam > > > --- > > > > > > Changes in v2: > > > > > > - Changed the patch description to mention the crash clearly as suggested by > > > Bjorn > > > > Clearly mentioning the crash as rationale for the change is *part* of > > what I was looking for. > > > > The rest, just as important, is information about what sort of crash > > this is, because I hope and suspect the crash is recoverable, and we > > *should* recover from it because PERST# may occur at arbitrary times, > > so trying to avoid it is never going to be reliable. > > I did mention 'whole SoC crash' which typically means unrecoverable > state as the SoC would crash (not just the driver). On Qcom SoCs, > this will also lead the SoC to boot into EDL (Emergency Download) > mode so that the users can collect dumps on the crash. IIUC we're talking about an access to a PHY register, and the access requires Refclk from the host. I assume the SoC accesses the register by doing an MMIO load. If nothing responds, I assume the SoC would take a machine check or similar because there's no data to complete the load instruction. So I assume again that the Linux on the SoC doesn't know how to recover from such a machine check? If that's the scenario, is the machine check unrecoverable in principle, or is it potentially recoverable but nobody has done the work to do it? My guess would be the latter, because the former would mean that it's impossible to build a robust endpoint around this SoC. But obviously this is all complete speculation on my part. Bjorn