From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.6 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7DA5C10F03 for ; Tue, 23 Apr 2019 20:24:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6C172218B0 for ; Tue, 23 Apr 2019 20:24:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1556051047; bh=Q/HUG19XUhmRW8zDHq1HEpSxjHz08PXUPZgh1Ew3nx4=; h=Date:From:To:Cc:Subject:References:In-Reply-To:List-ID:From; b=dXWKdYt0WCCeSmLeHr/CiZXGXKpoRv/8e51RfhY+Qs/OOddaACl5c/ac+4lE2I+ch f0D+jsUFlLY0gkz5ENCQy0POaXrcpvgj16IwVwRzJqfdTO2BS/nrrhyeABQTBULD6A /U+vANvmHEjL2BMM81HduVilgZTR6P+8EDkuarYA= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727119AbfDWUYG (ORCPT ); Tue, 23 Apr 2019 16:24:06 -0400 Received: from mail.kernel.org ([198.145.29.99]:43476 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726029AbfDWUYG (ORCPT ); Tue, 23 Apr 2019 16:24:06 -0400 Received: from localhost (unknown [69.71.4.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id A4984208E4; Tue, 23 Apr 2019 20:24:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1556051044; bh=Q/HUG19XUhmRW8zDHq1HEpSxjHz08PXUPZgh1Ew3nx4=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=ffkJw0SdYj4PZhOrlzW+8r5zWMfvUmzoZyRrWs//WkvmTA9mTLdFZBJKwQcvBEVv/ YTKfqNUcwkPAF6Lm+7432vc9ZOyf4UpPLKaoIjAcfjzcVYCp3uYPYweuOV5fMqq/Jx Hk44XYCKpSvzGMRCsiqToLUIw6Mi1wfHbXChcSvI= Date: Tue, 23 Apr 2019 15:24:03 -0500 From: Bjorn Helgaas To: Manikanta Maddireddy Cc: thierry.reding@gmail.com, robh+dt@kernel.org, mark.rutland@arm.com, jonathanh@nvidia.com, lorenzo.pieralisi@arm.com, vidyas@nvidia.com, linux-tegra@vger.kernel.org, linux-pci@vger.kernel.org, devicetree@vger.kernel.org, Jingoo Han , Gustavo Pimentel , Ley Foon Tan , Michal Simek Subject: Re: [PATCH 22/30] PCI: tegra: Access endpoint config only if PCIe link is up Message-ID: <20190423202403.GA61751@google.com> References: <20190411170355.6882-1-mmaddireddy@nvidia.com> <20190411170355.6882-23-mmaddireddy@nvidia.com> <20190411201535.GS256045@google.com> <20190412145003.GE141472@google.com> <1039fbf2-24ad-c31c-93d9-663aab74a26a@nvidia.com> <20190415140430.GK126710@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Sorry, somehow I forgot to respond to this. On Mon, Apr 15, 2019 at 09:13:29PM +0530, Manikanta Maddireddy wrote: > On 15-Apr-19 7:34 PM, Bjorn Helgaas wrote: > > On Mon, Apr 15, 2019 at 05:06:10PM +0530, Manikanta Maddireddy wrote: > >> On 12-Apr-19 8:20 PM, Bjorn Helgaas wrote: > >>> On Fri, Apr 12, 2019 at 12:30:22PM +0530, Manikanta Maddireddy wrote: > >>>> On 12-Apr-19 1:45 AM, Bjorn Helgaas wrote: > >>>>> On Thu, Apr 11, 2019 at 10:33:47PM +0530, Manikanta Maddireddy wrote: > >>>>>> Add PCIe link up check in config read and write callback functions > >>>>>> before accessing endpoint config registers. > >>>>>> static int tegra_pcie_config_read(struct pci_bus *bus, unsigned int devfn, > >>>>>> int where, int size, u32 *value) > >>>>>> { > >>>>>> + struct tegra_pcie *pcie = bus->sysdata; > >>>>>> + struct pci_dev *bridge; > >>>>>> + struct tegra_pcie_port *port; > >>>>>> + > >>>>>> if (bus->number == 0) > >>>>>> return pci_generic_config_read32(bus, devfn, where, size, > >>>>>> value); > >>>>>> > >>>>>> + bridge = pcie_find_root_port(bus->self); > >>>>>> + > >>>>>> + list_for_each_entry(port, &pcie->ports, list) > >>>>>> + if (port->index + 1 == PCI_SLOT(bridge->devfn)) > >>>>>> + break; > >>>>>> + > >>>>>> + /* If there is no link, then there is no device */ > >>>>>> + if (!tegra_pcie_link_status(port)) { > >>>>> This is racy and you should avoid it if possible. The link could go down > >>>>> between calling tegra_pcie_link_status() and issuing the config read/write. > >>>>> > >>>>> If your driver is to be reliable, it must be able to handle any bad > >>>>> consequence of issuing that config read/write anyway, so I think it's > >>>>> better if it doesn't even bother checking whether the link is up. > >>>> This change is made based on similar check present in dwc driver > >>>> dw_pcie_valid_device(), reasons for making this change in Tegra might > >>>> differ dwc. > >>> Yes, you won't be surprised to learn that I don't like the similar > >>> checks in dwc, altera, xilinx, and xilinx-nwl either :) I raise this > >>> issue every time I see it, but I can't remember if I've mentioned dwc > >>> specifically. > >>> > >>> We need to either eradicate this pattern of checking for link up, or > >>> include a comment about why it is absolutely necessary. > >> This patch is created to address below scenario in our downstream kernel, > >> 1) Our platform has WiFi on one slot and GPU in another. > >> 2) During WiFi OFF, link is put in L2 and it goes through hot reset > >> when turning ON WiFi (since Tegra doesn't support hot-plug). > >> 3) Whenever x11 server is started it scans the PCIe bus for video devices. > >> Here PCIe configuration registers of all devices are read to find out > >> all available video devices. > >> 4) If "x11 server" started with WiFi OFF, then we are seeing "response > >> decoding error"(Tegra AFI module specific error). > > Probably happens with lspci too. I guess this is when you try to read > > the WiFi device config space? > Yes, happens with lspci when trying to read WiFi device config space. > >> Best solution we came up with is to have link up check in config access > >> callback functions. > > Can you check for "response decoding error" in the config accessor and > > return 0xffffffff if you see it? > "Response decoding error" is informed by an > interrupt(tegra_pcie_isr()), we have to add polling logic in config > accessor to check if config access caused "response decoding error" > or not. This will increase config access time. Config access is never in a performance path, so the access time doesn't matter. > Also sometimes BAR access can also cause "Response decoding error", so > matching "Response decoding error" to a particular config access is > going to be difficult. I'm surprised your hardware can't distinguish a failed config access from an unclaimed MMIO access. > >>>> Intention here is to reduce the number of AER errors when device is > >>>> falling off the bus or going through hot reset. So racy condition here is > >>>> OK > >>> I'm not convinced about this. The issues you mention need to be > >>> solved in a generic way, not a tegra-specific way. > >>> > >>> We don't want to end up with code that silently avoids the config > >>> access 99.99% of the time, but once in a blue moon, we lose the race > >>> (the device stops responding after we've determined the link is up) > >>> and the access causes a mysterious AER error that we have no way to > >>> debug. > >>> > >>>>>> + *value = 0xffffffff; > >>>>>> + return PCIBIOS_DEVICE_NOT_FOUND; > >>>>>> + } > >>>>>> + > >>>>>> return pci_generic_config_read(bus, devfn, where, size, value); > >>>>>> }