Message-ID: <50046436.5000501@redhat.com>
Date: Mon, 16 Jul 2012 14:57:58 -0400
From: Don Dutile
To: Bjorn Helgaas
CC: Jiang Liu, Jiang Liu, Yinghai Lu, Taku Izumi, "Rafael J . Wysocki", Kenji Kaneshige, Yijing Wang, Keping Chen, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [RFC PATCH 05/14] PCI: add access functions for PCIe capabilities to hide PCIe spec differences

On 07/16/2012 01:29 PM, Bjorn Helgaas wrote:
> On Sun, Jul 15, 2012 at 10:47 AM, Jiang Liu wrote:
>> On 07/13/2012 04:49 AM, Bjorn Helgaas wrote:
>>>> Hi Bjorn,
>>>> It's a little risky to change these PCIe capability access
>>>> functions to return void. On some platforms with hardware error
>>>> detection/correction capabilities, such as EEH on Power, it would be
>>>> better to return an error code if a hardware error happens while
>>>> accessing configuration registers.
>>>> As far as I know, upcoming Intel Xeon processors may provide PCIe
>>>> hardware error detection capabilities similar to EEH on Power.
>>>
>>> I guess I'm playing devil's advocate here. As a general rule, people
>>> don't check the return value of pci_read_config_*() or
>>> pci_write_config_*(). Unless you change them all, most callers of
>>> pci_pcie_capability_read_*() and _write_*() won't check the returns
>>> either. So I'm not sure return values are an effective way to detect
>>> those hardware errors.
>>>
>>> How do these EEH errors get detected or reported today? Do the
>>> drivers check every config access for success? Adding those checks
>>> and figuring out how to handle errors at every possible point doesn't
>>> seem like a recipe for success.
>>
>> Hi Bjorn,
>> Sorry for the late reply; I've been traveling these days.
>> Yes, it's true that most drivers don't check the return values of
>> configuration access functions, but some drivers do check the return
>> value of pci_read_config_xxx(). For example, the pciehp driver checks
>> the return values of config access functions.
>>
>> It's not realistic to enhance all drivers, but we may focus on a small
>> set of drivers for hardware on specific high-end servers. For RAS
>> features, we can never provide perfect solutions, so we prefer
>> incremental improvements. After all, a small improvement is still an
>> improvement :)
>>
>> I'm only familiar with PCI on IA64 and x86. For PowerPC, I only know
>> that the OS may query firmware for hardware faults if
>> pci_cfg_read_xxx() returns all 1s. For PCI on IA64, SAL may handle PCI
>> hardware errors and return an error code to pci_cfg_read_xxx(). For
>> x86, I think there will be some mechanism to report hardware faults,
>> like SAL on IA64.
>>
>> So how about keeping consistent with pci_cfg_read_xxx() and
>> pci_user_cfg_read_xxx()?
>
> My goal is "the caller should never have to know whether this is a v1
> or v2 capability." Returning any error other than one passed along
> from pci_read/write_config_xxx() means we miss that goal. Perhaps the
> goal is unattainable, but I haven't been convinced yet.
>
> I think hardware error detection is irrelevant to this discussion.
> After reading Documentation/PCI/pci-error-recovery.txt, I'm even less
> convinced that checking return values from pci_read/write_config_xxx()
> or pci_pcie_capability_read/write_xxx() is a useful way to detect
> hardware errors.
>
> Having drivers detect hardware failures by checking for config access
> errors is neither necessary nor sufficient. It's not necessary
> because a platform can implement a config accessor that checks *every*
> access and reports failures to the driver via the pci_error_handler
> framework. It's not sufficient because config accesses are rare
> (usually only at init-time), and hardware failures may happen at
> arbitrary other times.
>
> In my opinion, the only relevant question is whether a caller of
> pci_pcie_capability_read/write_xxx() needs to know whether a register
> is implemented (i.e., we have a v2 capability) or not. For reads, I
> don't think there's a case where fabricating a value of zero when
> reading an unimplemented register is a problem.
>
> Writes are obviously more interesting, but I'm still not sure there's
> a case where silently dropping a write to an unimplemented register is
> a problem. The "capability" registers are read-only, so there's no
> problem if we drop writes to them. The "status" registers are
> generally RO or RW1C, where it's only meaningful to write a non-zero
> value if you've previously *read* a non-zero value. The "control"
> registers are often RW, of course, but generally it's only meaningful
> to write a non-zero value when a non-zero bit in the "capability"
> register has previously told you that something is supported.
>
> Bjorn

+1
Returning 0 on capability reads, whether due to unimplemented
features/registers or due to failures, should translate into the (core)
code doing no writes. That is why I suggested returning 0 on failure
in the original posting.