Linux PCI subsystem development

Linux PCI subsystem development
 help / color / mirror / Atom feed

* Re: [PATCH 5/9] unicore32/PCI: Remove CONFIG_HOTPLUG ifdefs
From: guanxuetao @ 2012-11-29  2:02 UTC (permalink / raw)
  To: Bill Pemberton; +Cc: gregkh, linux-pci, Guan Xuetao, Bjorn Helgaas
In-Reply-To: <1353530100-728-6-git-send-email-wfp5p@virginia.edu>

> Remove conditional code based on CONFIG_HOTPLUG being false.  It's
> always on now in preparation of it going away as an option.
>
> Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> Cc: Bjorn Helgaas <bhelgaas@google.com>

Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>

> ---
>  arch/unicore32/kernel/pci.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/arch/unicore32/kernel/pci.c b/arch/unicore32/kernel/pci.c
> index b0056f6..7c43592 100644
> --- a/arch/unicore32/kernel/pci.c
> +++ b/arch/unicore32/kernel/pci.c
> @@ -250,9 +250,7 @@ void __devinit pcibios_fixup_bus(struct pci_bus *bus)
>  	printk(KERN_INFO "PCI: bus%d: Fast back to back transfers %sabled\n",
>  		bus->number, (features & PCI_COMMAND_FAST_BACK) ? "en" : "dis");
>  }
> -#ifdef CONFIG_HOTPLUG
>  EXPORT_SYMBOL(pcibios_fixup_bus);
> -#endif
>
>  static int __init pci_common_init(void)
>  {
> --
> 1.8.0
>


^ permalink raw reply

* Re: pci_enable_msix() failure
From: Bjorn Helgaas @ 2012-11-29  0:18 UTC (permalink / raw)
  To: Alex Lyakas; +Cc: linux-pci
In-Reply-To: <0B81514A792D4D9FAB292ED315D48BE2@alyakaslap>

On Tue, Nov 20, 2012 at 11:04 AM, Alex Lyakas <alex@zadarastorage.com> wrote:
> Hello list, I was advised to post this question by Jesse Barnes; I also
> posted to KVM list.

When you post a question to more than one list, you should always send
a *single* message with all the lists being copied.  That way
everybody can tell what progress is being made, and we can avoid
duplicating effort.

I see via a Google search that you've gotten responses on the KVM
list, e.g., http://www.spinics.net/lists/kvm/msg82997.html, so I
assume you don't need any more help from linux-pci.

If that's not the case, please add linux-pci to the CC: list of the
thread where you're working on this.

> I am running Ubuntu-Precise 3.2.0-29-generic #46, with stock KVM ("QEMU
> emulator version 1.0 (qemu-kvm-1.0)") on a Dell R510 server. I have one
> dual-port Intel's NIC 82599, of which I spawn 32 VFs from each port. I spawn
> virtual machines with KVM, each VM has 4 VFs attached (two from each PF).
>
> Once in a while, in particular when I spawn multiple VMs in parallel, I hit
> an issue that one of the VFs does not have an IRQ assigned to it. I am
> checking this in /proc/interrupts, looking for entries like
> "kvm:0000:03:14.6". In some cases, an entry is missing for a particular VF.
> As a result, the VF within the VM is non-functional.
>
> I debugged this issue further, by adding prints to kvm.ko code. I see that
> the failure happens in kvm_vm_ioctl_assigned_device/KVM_ASSIGN_DEV_IRQ path,
> which calls assigned_device_enable_host_msix() function, which calls
> pci_enable_msix(), which fails with EINVAL or with ENOMEM. This path is
> called twice for each VF.
>
> I see that first pci_enable_msix() returns -12/-22, and when
> kvm_vm_ioctl_set_msix_nr() is called again, it sees that adev->entries_nr !=
> 0 and fails the call with EINVAL. I can repro it only when spawning like 8
> or 10 VMs in parallel, but it doesn't happen every time. So it seems like
> this is not a resource shortage problem, but some race somewhere.
>
> I tested this with several version of ixgbe drivers, including the in-tree
> version that comes with Precise. It reproduces with all the versions.
>
> Can anybody pls advise on how to debug this issue further?
>
> Thanks,
> Alex.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] PCI,sriov: add documentation on sysfs-based VF control
From: Don Dutile @ 2012-11-29  0:09 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci
In-Reply-To: <CAErSpo4mso26k5pwpBs9ujrX7dZtvd-fxYnU+m712xycwngCvg@mail.gmail.com>

On 11/28/2012 03:18 PM, Bjorn Helgaas wrote:
> On Tue, Nov 27, 2012 at 8:31 PM, Donald Dutile<ddutile@redhat.com>  wrote:
>>   Signed-off: Donald Dutile<ddutile@redhat.com>
>>
>
> Thanks, Don.  I added this to my -next branch, for merging in the v3.8
> merge window.
>
> I removed the trailing underscore from sriov_numvfs_.
>
>
Thanks for pulling it into your 3.8-next branch and the correction!
My apologies for the delay in getting it out to the list
after posting the related code.
- Don

>>   Documentation/ABI/testing/sysfs-bus-pci | 34 +++++++++++++++++++++++
>>   Documentation/PCI/pci-iov-howto.txt     | 48 ++++++++++++++++++++++++++++++---
>>   2 files changed, 78 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
>> index dff1f48..1cb389d 100644
>> --- a/Documentation/ABI/testing/sysfs-bus-pci
>> +++ b/Documentation/ABI/testing/sysfs-bus-pci
>> @@ -222,3 +222,37 @@ Description:
>>                  satisfied too.  Reading this attribute will show the current
>>                  value of d3cold_allowed bit.  Writing this attribute will set
>>                  the value of d3cold_allowed bit.
>> +
>> +What:          /sys/bus/pci/devices/.../sriov_totalvfs
>> +Date:          November 2012
>> +Contact:       Donald Dutile<ddutile@redhat.com>
>> +Description:
>> +               This file appears when a physical PCIe device supports SR-IOV.
>> +               Userspace applications can read this file to determine the
>> +               maximum number of Virtual Functions (VFs) a PCIe physical
>> +               function (PF) can support. Typically, this is the value reported
>> +               in the PF's SR-IOV extended capability structure's TotalVFs
>> +               element.  Drivers have the ability at probe time to reduce the
>> +               value read from this file via the pci_sriov_set_totalvfs()
>> +               function.
>> +
>> +What:          /sys/bus/pci/devices/.../sriov_numvfs_
>> +Date:          November 2012
>> +Contact:       Donald Dutile<ddutile@redhat.com>
>> +Description:
>> +               This file appears when a physical PCIe device supports SR-IOV.
>> +               Userspace applications can read and write to this file to
>> +               determine and control the enablement or disablement of Virtual
>> +               Functions (VFs) on the physical function (PF). A read of this
>> +               file will return the number of VFs that are enabled on this PF.
>> +               A number written to this file will enable the specified
>> +               number of VFs. A userspace application would typically read the
>> +               file and check that the value is zero, and then write the number
>> +               of VFs that should be enabled on the PF; the value written
>> +               should be less than or equal to the value in the sriov_totalvfs
>> +               file. A userspace application wanting to disable the VFs would
>> +               write a zero to this file. The core ensures that valid values
>> +               are written to this file, and returns errors when values are not
>> +               valid.  For example, writing a 2 to this file when sriov_numvfs
>> +               is not 0 and not 2 already will return an error. Writing a 10
>> +               when the value of sriov_totalvfs is 8 will return an error.
>> diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
>> index fc73ef5..c41cf95 100644
>> --- a/Documentation/PCI/pci-iov-howto.txt
>> +++ b/Documentation/PCI/pci-iov-howto.txt
>> @@ -2,6 +2,9 @@
>>                  Copyright (C) 2009 Intel Corporation
>>                      Yu Zhao<yu.zhao@intel.com>
>>
>> +               Update: November 2012
>> +                       -- sysfs-based SRIOV enable-/disable-ment
>> +               Donald Dutile<ddutile@redhat.com>
>>
>>   1. Overview
>>
>> @@ -24,10 +27,21 @@ real existing PCI device.
>>
>>   2.1 How can I enable SR-IOV capability
>>
>> -The device driver (PF driver) will control the enabling and disabling
>> -of the capability via API provided by SR-IOV core. If the hardware
>> -has SR-IOV capability, loading its PF driver would enable it and all
>> -VFs associated with the PF.
>> +Multiple methods are available for SR-IOV enablement.
>> +In the first method, the device driver (PF driver) will control the
>> +enabling and disabling of the capability via API provided by SR-IOV core.
>> +If the hardware has SR-IOV capability, loading its PF driver would
>> +enable it and all VFs associated with the PF.  Some PF drivers require
>> +a module parameter to be set to determine the number of VFs to enable.
>> +In the second method, a write to the sysfs file sriov_numvfs will
>> +enable and disable the VFs associated with a PCIe PF.  This method
>> +enables per-PF, VF enable/disable values versus the first method,
>> +which applies to all PFs of the same device.  Additionally, the
>> +PCI SRIOV core support ensures that enable/disable operations are
>> +valid to reduce duplication in multiple drivers for the same
>> +checks, e.g., check numvfs == 0 if enabling VFs, ensure
>> +numvfs<= totalvfs.
>> +The second method is the recommended method for new/future VF devices.
>>
>>   2.2 How can I use the Virtual Functions
>>
>> @@ -40,13 +54,22 @@ requires device driver that is same as a normal PCI device's.
>>   3.1 SR-IOV API
>>
>>   To enable SR-IOV capability:
>> +(a) For the first method, in the driver:
>>          int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
>>          'nr_virtfn' is number of VFs to be enabled.
>> +(b) For the second method, from sysfs:
>> +       echo 'nr_virtfn'>  \
>> +        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
>>
>>   To disable SR-IOV capability:
>> +(a) For the first method, in the driver:
>>          void pci_disable_sriov(struct pci_dev *dev);
>> +(b) For the second method, from sysfs:
>> +       echo  0>  \
>> +        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
>>
>>   To notify SR-IOV core of Virtual Function Migration:
>> +(a) In the driver:
>>          irqreturn_t pci_sriov_migration(struct pci_dev *dev);
>>
>>   3.2 Usage example
>> @@ -88,6 +111,22 @@ static void dev_shutdown(struct pci_dev *dev)
>>          ...
>>   }
>>
>> +static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
>> +{
>> +       if (numvfs>  0) {
>> +               ...
>> +               pci_enable_sriov(dev, numvfs);
>> +               ...
>> +               return numvfs;
>> +       }
>> +       if (numvfs == 0) {
>> +               ....
>> +               pci_disable_sriov(dev);
>> +               ...
>> +               return 0;
>> +       }
>> +}
>> +
>>   static struct pci_driver dev_driver = {
>>          .name =         "SR-IOV Physical Function driver",
>>          .id_table =     dev_id_table,
>> @@ -96,4 +135,5 @@ static struct pci_driver dev_driver = {
>>          .suspend =      dev_suspend,
>>          .resume =       dev_resume,
>>          .shutdown =     dev_shutdown,
>> +       .sriov_configure = dev_sriov_configure,
>>   };
>> --
>> 1.7.10.2.552.gaa3bb87
>>


^ permalink raw reply

* Re: [PATCH v2 RESEND] Add NumaChip remote PCI support
From: Bjorn Helgaas @ 2012-11-28 23:08 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: linux-pci, linux-kernel, Steffen Persvold
In-Reply-To: <1353487196-10204-1-git-send-email-daniel@numascale-asia.com>

On Wed, Nov 21, 2012 at 1:39 AM, Daniel J Blueman
<daniel@numascale-asia.com> wrote:
> Add NumaChip-specific PCI access mechanism via MMCONFIG cycles, but
> preventing access to AMD Northbridges which shouldn't respond.
>
> v2: Use PCI_DEVFN in precomputed constant limit; drop unneeded includes
>
> Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
> ---
>  arch/x86/include/asm/numachip/numachip.h |   20 +++++
>  arch/x86/kernel/apic/apic_numachip.c     |    2 +
>  arch/x86/pci/Makefile                    |    1 +
>  arch/x86/pci/numachip.c                  |  134 ++++++++++++++++++++++++++++++
>  4 files changed, 157 insertions(+)
>  create mode 100644 arch/x86/include/asm/numachip/numachip.h
>  create mode 100644 arch/x86/pci/numachip.c
>
> diff --git a/arch/x86/include/asm/numachip/numachip.h b/arch/x86/include/asm/numachip/numachip.h
> new file mode 100644
> index 0000000..d35e71a
> --- /dev/null
> +++ b/arch/x86/include/asm/numachip/numachip.h
> @@ -0,0 +1,20 @@
> +/*
> + * This file is subject to the terms and conditions of the GNU General Public
> + * License.  See the file "COPYING" in the main directory of this archive
> + * for more details.
> + *
> + * Numascale NumaConnect-specific header file
> + *
> + * Copyright (C) 2012 Numascale AS. All rights reserved.
> + *
> + * Send feedback to <support@numascale.com>
> + *
> + */
> +
> +#ifndef _ASM_X86_NUMACHIP_NUMACHIP_H
> +#define _ASM_X86_NUMACHIP_NUMACHIP_H
> +
> +extern int __init pci_numachip_init(void);
> +
> +#endif /* _ASM_X86_NUMACHIP_NUMACHIP_H */
> +
> diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c
> index a65829a..9c2aa89 100644
> --- a/arch/x86/kernel/apic/apic_numachip.c
> +++ b/arch/x86/kernel/apic/apic_numachip.c
> @@ -22,6 +22,7 @@
>  #include <linux/hardirq.h>
>  #include <linux/delay.h>
>
> +#include <asm/numachip/numachip.h>
>  #include <asm/numachip/numachip_csr.h>
>  #include <asm/smp.h>
>  #include <asm/apic.h>
> @@ -179,6 +180,7 @@ static int __init numachip_system_init(void)
>                 return 0;
>
>         x86_cpuinit.fixup_cpu_id = fixup_cpu_id;
> +       x86_init.pci.arch_init = pci_numachip_init;
>
>         map_csrs();
>
> diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile
> index 3af5a1e..ee0af58 100644
> --- a/arch/x86/pci/Makefile
> +++ b/arch/x86/pci/Makefile
> @@ -16,6 +16,7 @@ obj-$(CONFIG_STA2X11)           += sta2x11-fixup.o
>  obj-$(CONFIG_X86_VISWS)                += visws.o
>
>  obj-$(CONFIG_X86_NUMAQ)                += numaq_32.o
> +obj-$(CONFIG_X86_NUMACHIP)     += numachip.o

It looks like this depends on CONFIG_PCI_MMCONFIG for
pci_mmconfig_lookup().  Are there config constraints that force
CONFIG_PCI_MMCONFIG=y when CONFIG_X86_NUMACHIP=y?

>  obj-$(CONFIG_X86_INTEL_MID)    += mrst.o
>
> diff --git a/arch/x86/pci/numachip.c b/arch/x86/pci/numachip.c
> new file mode 100644
> index 0000000..3773e05
> --- /dev/null
> +++ b/arch/x86/pci/numachip.c
> @@ -0,0 +1,129 @@
> +/*
> + * This file is subject to the terms and conditions of the GNU General Public
> + * License.  See the file "COPYING" in the main directory of this archive
> + * for more details.
> + *
> + * Numascale NumaConnect-specific PCI code
> + *
> + * Copyright (C) 2012 Numascale AS. All rights reserved.
> + *
> + * Send feedback to <support@numascale.com>
> + *
> + * PCI accessor functions derived from mmconfig_64.c
> + *
> + */
> +
> +#include <linux/pci.h>
> +#include <asm/pci_x86.h>
> +
> +static u8 limit __read_mostly;
> +
> +static inline char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, unsigned int devfn)
> +{
> +       struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus);
> +
> +       if (cfg && cfg->virt)
> +               return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12));
> +       return NULL;
> +}

Most of this file is copied directly from mmconfig_64.c (as you
mentioned above).  I wonder if we could avoid the code duplication by
making the pci_dev_base() implementation in mmconfig_64.c a weak
definition.  Then you could just supply a non-weak pci_dev_base() here
that would override that default version.  Your version would look
something like:

  char __iomem *pci_dev_base(unsigned int seg, unsigned int bus,
unsigned int devfn)
  {
      struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus);

      if (cfg && cfg->virt && devfn < limit)
          return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12));
      return NULL;
  }

That would be different from what you have in this patch because reads
& writes to devices above "limit" would return -EINVAL rather than 0
as you do here.  Would that be a problem?

> +static int pci_mmcfg_read_numachip(unsigned int seg, unsigned int bus,
> +                         unsigned int devfn, int reg, int len, u32 *value)
> +{
> +       char __iomem *addr;
> +
> +       /* Why do we have this when nobody checks it. How about a BUG()!? -AK */
> +       if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) {
> +err:           *value = -1;
> +               return -EINVAL;
> +       }
> +
> +       /* Ensure AMD Northbridges don't decode reads to other devices */
> +       if (unlikely(bus == 0 && devfn >= limit)) {
> +               *value = -1;
> +               return 0;
> +       }
> +
> +       rcu_read_lock();
> +       addr = pci_dev_base(seg, bus, devfn);
> +       if (!addr) {
> +               rcu_read_unlock();
> +               goto err;
> +       }
> +
> +       switch (len) {
> +       case 1:
> +               *value = mmio_config_readb(addr + reg);
> +               break;
> +       case 2:
> +               *value = mmio_config_readw(addr + reg);
> +               break;
> +       case 4:
> +               *value = mmio_config_readl(addr + reg);
> +               break;
> +       }
> +       rcu_read_unlock();
> +
> +       return 0;
> +}
> +
> +static int pci_mmcfg_write_numachip(unsigned int seg, unsigned int bus,
> +                          unsigned int devfn, int reg, int len, u32 value)
> +{
> +       char __iomem *addr;
> +
> +       /* Why do we have this when nobody checks it. How about a BUG()!? -AK */
> +       if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095)))
> +               return -EINVAL;
> +
> +       /* Ensure AMD Northbridges don't decode writes to other devices */
> +       if (unlikely(bus == 0 && devfn >= limit))
> +               return 0;
> +
> +       rcu_read_lock();
> +       addr = pci_dev_base(seg, bus, devfn);
> +       if (!addr) {
> +               rcu_read_unlock();
> +               return -EINVAL;
> +       }
> +
> +       switch (len) {
> +       case 1:
> +               mmio_config_writeb(addr + reg, value);
> +               break;
> +       case 2:
> +               mmio_config_writew(addr + reg, value);
> +               break;
> +       case 4:
> +               mmio_config_writel(addr + reg, value);
> +               break;
> +       }
> +       rcu_read_unlock();
> +
> +       return 0;
> +}
> +
> +const struct pci_raw_ops pci_mmcfg_numachip = {
> +       .read = pci_mmcfg_read_numachip,
> +       .write = pci_mmcfg_write_numachip,
> +};
> +
> +int __init pci_numachip_init(void)
> +{
> +       int ret = 0;
> +       u32 val;
> +
> +       /* For remote I/O, restrict bus 0 access to the actual number of AMD
> +          Northbridges, which starts at device number 0x18 */
> +       ret = raw_pci_read(0, 0, PCI_DEVFN(0x18, 0), 0x60, sizeof(val), &val);
> +       if (ret)
> +               goto out;
> +
> +       /* HyperTransport fabric size in bits 6:4 */
> +       limit = PCI_DEVFN(0x18 + ((val >> 4) & 7) + 1, 0);
> +
> +       /* Use NumaChip PCI accessors for non-extended and extended access */
> +       raw_pci_ops = raw_pci_ext_ops = &pci_mmcfg_numachip;
> +out:
> +       return ret;
> +}
> --
> 1.7.9.5
>

^ permalink raw reply

* Re: Unreliable USB3 with NEC uPD720200 and Delock Cardreader
From: Sarah Sharp @ 2012-11-28 22:54 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: Ulrich Eckhardt, Andrew Lutomirski, linux-usb@vger.kernel.org,
	Alan Stern, Ming Lei, Rafael J. Wysocki, Bjorn Helgaas, linux-pci,
	Huang Ying
In-Reply-To: <87boekave4.fsf@nemi.mork.no>

On Mon, Nov 26, 2012 at 10:48:03PM +0100, Bjørn Mork wrote:
> Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:
> 
> > It looks like both Ulrich and Andrew have the same issue.  I also have a
> > Lenovo x220, and I confirmed that when I turn on PCI runtime suspend,
> > the NEC host controller does not report port status changes when a new
> > USB device is plugged in.
> >
> > I'm running 3.6.7, and I'm pretty sure that runtime suspend worked for
> > the NEC host on some older kernel.  I don't think the NEC host went into
> > D3cold on that kernel, though.  Is there a way to disable D3cold and
> > just use D3hot instead?
> 
> Yes, you have /sys/bus/pci/devices/.../d3cold_allowed
> See Documentation/ABI/testing/sysfs-bus-pci
> 
> If this really is a problem with the D3cold support that went into 3.6
> then I guess you should include Huang Ying in the discussions as well
> (CCed).

Turning off D3 cold didn't help.  Once the PCI device is suspended,
connect events do not generate an interrupt.

I'll go see if I can figure out which kernel this worked on and bisect.

Sarah Sharp

^ permalink raw reply

* Re: [PATCH] PCI/PM: Keep runtime PM enabled for unbound PCI devices
From: Rafael J. Wysocki @ 2012-11-28 22:43 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huang Ying, linux-kernel, linux-pci, linux-pm, Rafael J. Wysocki,
	Alan Stern
In-Reply-To: <CAErSpo6CPSMXjq1eSTX41vtwOcd5rQOdC0-9ye5NZhds9U1B8A@mail.gmail.com>

On Wednesday, November 28, 2012 03:25:59 PM Bjorn Helgaas wrote:
> On Tue, Nov 20, 2012 at 1:08 AM, Huang Ying <ying.huang@intel.com> wrote:
> > For unbound PCI devices, what we need is:
> >
> >  - Always in D0 state, because some devices does not work again after
> >    being put into D3 by the PCI bus.
> >
> >  - In SUSPENDED state if allowed, so that the parent devices can still
> >    be put into low power state.
> >
> > To satisfy these requirement, the runtime PM for the unbound PCI
> > devices are disabled and set to SUSPENDED state.  One issue of this
> > solution is that the PCI devices will be put into SUSPENDED state even
> > if the SUSPENDED state is forbidden via the sysfs interface
> > (.../power/control) of the device.  This is not an issue for most
> > devices, because most PCI devices are not used at all if unbounded.
> > But there are exceptions.  For example, unbound VGA card can be used
> > for display, but suspend its parents make it stop working.
> >
> > To fix the issue, we keep the runtime PM enabled when the PCI devices
> > are unbound.  But the runtime PM callbacks will do nothing if the PCI
> > devices are unbound.  This way, we can put the PCI devices into
> > SUSPENDED state without put the PCI devices into D3 state.
> 
> Does this fix a reported problem?  Is there a bug report URL?  What
> problem would a user notice?

There is a BZ: https://bugzilla.kernel.org/show_bug.cgi?id=48201

Unfortunately, the reporter hasn't confirmed that the bug is fixed,
although we're quite confident that it will be.

> This doesn't look critical enough to try to put in v3.7 (correct me if
> I'm wrong).  It's getting pretty late for the v3.8 merge window, but
> it looks like it qualifies as a bug fix that we would merge even after
> the merge window, so I'll put it in my -next branch headed for v3.8.
> Then it can be backported to v3.6 and v3.7.

I think that's OK.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH] PCI/PM: Keep runtime PM enabled for unbound PCI devices
From: Bjorn Helgaas @ 2012-11-28 22:25 UTC (permalink / raw)
  To: Huang Ying
  Cc: linux-kernel, linux-pci, linux-pm, Rafael J. Wysocki,
	Rafael J. Wysocki, Alan Stern
In-Reply-To: <1353398902-21253-1-git-send-email-ying.huang@intel.com>

On Tue, Nov 20, 2012 at 1:08 AM, Huang Ying <ying.huang@intel.com> wrote:
> For unbound PCI devices, what we need is:
>
>  - Always in D0 state, because some devices does not work again after
>    being put into D3 by the PCI bus.
>
>  - In SUSPENDED state if allowed, so that the parent devices can still
>    be put into low power state.
>
> To satisfy these requirement, the runtime PM for the unbound PCI
> devices are disabled and set to SUSPENDED state.  One issue of this
> solution is that the PCI devices will be put into SUSPENDED state even
> if the SUSPENDED state is forbidden via the sysfs interface
> (.../power/control) of the device.  This is not an issue for most
> devices, because most PCI devices are not used at all if unbounded.
> But there are exceptions.  For example, unbound VGA card can be used
> for display, but suspend its parents make it stop working.
>
> To fix the issue, we keep the runtime PM enabled when the PCI devices
> are unbound.  But the runtime PM callbacks will do nothing if the PCI
> devices are unbound.  This way, we can put the PCI devices into
> SUSPENDED state without put the PCI devices into D3 state.

Does this fix a reported problem?  Is there a bug report URL?  What
problem would a user notice?

This doesn't look critical enough to try to put in v3.7 (correct me if
I'm wrong).  It's getting pretty late for the v3.8 merge window, but
it looks like it qualifies as a bug fix that we would merge even after
the merge window, so I'll put it in my -next branch headed for v3.8.
Then it can be backported to v3.6 and v3.7.

> Signed-off-by: Huang Ying <ying.huang@intel.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> CC: stable@vger.kernel.org          # v3.6+
> ---
>  drivers/pci/pci-driver.c |   69 +++++++++++++++++++++++++++--------------------
>  drivers/pci/pci.c        |    2 +
>  2 files changed, 43 insertions(+), 28 deletions(-)
>
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1900,6 +1900,8 @@ void pci_pm_init(struct pci_dev *dev)
>         u16 pmc;
>
>         pm_runtime_forbid(&dev->dev);
> +       pm_runtime_set_active(&dev->dev);
> +       pm_runtime_enable(&dev->dev);
>         device_enable_async_suspend(&dev->dev);
>         dev->wakeup_prepared = false;
>
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -256,31 +256,26 @@ struct drv_dev_and_id {
>  static long local_pci_probe(void *_ddi)
>  {
>         struct drv_dev_and_id *ddi = _ddi;
> -       struct device *dev = &ddi->dev->dev;
> -       struct device *parent = dev->parent;
> +       struct pci_dev *pci_dev = ddi->dev;
> +       struct pci_driver *pci_drv = ddi->drv;
> +       struct device *dev = &pci_dev->dev;
>         int rc;
>
> -       /* The parent bridge must be in active state when probing */
> -       if (parent)
> -               pm_runtime_get_sync(parent);
> -       /* Unbound PCI devices are always set to disabled and suspended.
> -        * During probe, the device is set to enabled and active and the
> -        * usage count is incremented.  If the driver supports runtime PM,
> -        * it should call pm_runtime_put_noidle() in its probe routine and
> -        * pm_runtime_get_noresume() in its remove routine.
> -        */
> -       pm_runtime_get_noresume(dev);
> -       pm_runtime_set_active(dev);
> -       pm_runtime_enable(dev);
> -
> -       rc = ddi->drv->probe(ddi->dev, ddi->id);
> +       /*
> +        * Unbound PCI devices are always put in D0, regardless of
> +        * runtime PM status.  During probe, the device is set to
> +        * active and the usage count is incremented.  If the driver
> +        * supports runtime PM, it should call pm_runtime_put_noidle()
> +        * in its probe routine and pm_runtime_get_noresume() in its
> +        * remove routine.
> +        */
> +       pm_runtime_get_sync(dev);
> +       pci_dev->driver = pci_drv;
> +       rc = pci_drv->probe(pci_dev, ddi->id);
>         if (rc) {
> -               pm_runtime_disable(dev);
> -               pm_runtime_set_suspended(dev);
> -               pm_runtime_put_noidle(dev);
> +               pci_dev->driver = NULL;
> +               pm_runtime_put_sync(dev);
>         }
> -       if (parent)
> -               pm_runtime_put(parent);
>         return rc;
>  }
>
> @@ -330,10 +325,8 @@ __pci_device_probe(struct pci_driver *dr
>                 id = pci_match_device(drv, pci_dev);
>                 if (id)
>                         error = pci_call_probe(drv, pci_dev, id);
> -               if (error >= 0) {
> -                       pci_dev->driver = drv;
> +               if (error >= 0)
>                         error = 0;
> -               }
>         }
>         return error;
>  }
> @@ -369,9 +362,7 @@ static int pci_device_remove(struct devi
>         }
>
>         /* Undo the runtime PM settings in local_pci_probe() */
> -       pm_runtime_disable(dev);
> -       pm_runtime_set_suspended(dev);
> -       pm_runtime_put_noidle(dev);
> +       pm_runtime_put_sync(dev);
>
>         /*
>          * If the device is still on, set the power state as "unknown",
> @@ -994,6 +985,13 @@ static int pci_pm_runtime_suspend(struct
>         pci_power_t prev = pci_dev->current_state;
>         int error;
>
> +       /*
> +        * If pci_dev->driver is not set (unbound), the device should
> +        * always remain in D0 regardless of the runtime PM status
> +        */
> +       if (!pci_dev->driver)
> +               return 0;
> +
>         if (!pm || !pm->runtime_suspend)
>                 return -ENOSYS;
>
> @@ -1029,6 +1027,13 @@ static int pci_pm_runtime_resume(struct
>         struct pci_dev *pci_dev = to_pci_dev(dev);
>         const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>
> +       /*
> +        * If pci_dev->driver is not set (unbound), the device should
> +        * always remain in D0 regardless of the runtime PM status
> +        */
> +       if (!pci_dev->driver)
> +               return 0;
> +
>         if (!pm || !pm->runtime_resume)
>                 return -ENOSYS;
>
> @@ -1046,8 +1051,16 @@ static int pci_pm_runtime_resume(struct
>
>  static int pci_pm_runtime_idle(struct device *dev)
>  {
> +       struct pci_dev *pci_dev = to_pci_dev(dev);
>         const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>
> +       /*
> +        * If pci_dev->driver is not set (unbound), the device should
> +        * always remain in D0 regardless of the runtime PM status
> +        */
> +       if (!pci_dev->driver)
> +               goto out;
> +
>         if (!pm)
>                 return -ENOSYS;
>
> @@ -1057,8 +1070,8 @@ static int pci_pm_runtime_idle(struct de
>                         return ret;
>         }
>
> +out:
>         pm_runtime_suspend(dev);
> -
>         return 0;
>  }
>

^ permalink raw reply

* RE: [PATCH v3] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers
From: Pandarathil, Vijaymohan R @ 2012-11-28 20:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linasvepstas@gmail.com, Myron Stowe, Ortiz, Lance E, Huang Ying,
	Hidetoshi Seto, Patterson, Andrew D (LeftHand Networks),
	Zhang Yanmin
In-Reply-To: <CAErSpo7VvWreoTj7MUE3cMGaM1dVMfZymO=1P-m8E-joRSgE7g@mail.gmail.com>

Looks correct.

Vijay

> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> Sent: Wednesday, November 28, 2012 12:15 PM
> To: Pandarathil, Vijaymohan R
> Cc: linux-kernel@vger.kernel.org; linux-pci@vger.kernel.org;
> linasvepstas@gmail.com; Myron Stowe; Ortiz, Lance E; Huang Ying; Hidetoshi
> Seto; Patterson, Andrew D (LeftHand Networks); Zhang Yanmin
> Subject: Re: [PATCH v3] PCI-AER: Do not report successful error recovery
> for devices with AER-unaware drivers
> 
> On Sat, Nov 17, 2012 at 4:47 AM, Pandarathil, Vijaymohan R
> <vijaymohan.pandarathil@hp.com> wrote:
> > When an error is detected on a PCIe device which does not have an
> > AER-aware driver, prevent AER infrastructure from reporting
> > successful error recovery.
> >
> > This is because the report_error_detected() function that gets
> > called in the first phase of recovery process allows forward
> > progress even when the driver for the device does not have AER
> > capabilities. It seems that all callbacks (in pci_error_handlers
> > structure) registered by drivers that gets called during error
> > recovery are not mandatory. So the intention of the infrastructure
> > design seems to be to allow forward progress even when a specific
> > callback has not been registered by a driver. However, if error
> > handler structure itself has not been registered, it doesn't make
> > sense to allow forward progress.
> >
> > As a result of the current design, in the case of a single device
> > having an AER-unaware driver or in the case of any function in a
> > multi-function card having an AER-unaware driver, a successful
> > recovery is reported.
> >
> > Typical scenario this happens is when a PCI device is detached
> > from a KVM host and the pci-stub driver on the host claims the
> > device. The pci-stub driver does not have error handling capabilities
> > but the AER infrastructure still reports that the device recovered
> > successfully.
> >
> > The changes proposed here leaves the device(s)in an unrecovered state
> > if the driver for the device or for any device in the subtree
> > does not have error handler structure registered. This reflects
> > the true state of the device and prevents any partial recovery (or no
> > recovery at all) reported as successful.
> >
> >
> > v3:
> >   - Fixed a checkpatch/build issue
> >
> > v2:
> >   - Made changes so that all devices in the subtree have the error
> >     state set correctly.
> >
> >
> > Reviewed-by: Linas Vepstas <linasvepstas <at> gmail.com>
> > Reviewed-by: Myron Stowe <mstowe <at> redhat.com>
> > Reviewed-by: Bjorn Helgaas <bhelgaas <at> google.com>
> > Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>
> 
> I added this to my -next branch, which will be merged for v3.8.
> 
> Please double-check that I resolved the merge conflict correctly.  It
> should show up here in the next few minutes:
> 
> http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs
> /heads/next
> 
> >  drivers/pci/pcie/aer/aerdrv.h      |  5 ++++-
> >  drivers/pci/pcie/aer/aerdrv_core.c | 21 ++++++++++++++++++---
> >  include/linux/pci.h                |  3 +++
> >  3 files changed, 25 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aer/aerdrv.h
> b/drivers/pci/pcie/aer/aerdrv.h
> > index 94a7598..22f840f 100644
> > --- a/drivers/pci/pcie/aer/aerdrv.h
> > +++ b/drivers/pci/pcie/aer/aerdrv.h
> > @@ -87,6 +87,9 @@ struct aer_broadcast_data {
> >  static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
> >                 enum pci_ers_result new)
> >  {
> > +       if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
> > +               return PCI_ERS_RESULT_NO_AER_DRIVER;
> > +
> >         if (new == PCI_ERS_RESULT_NONE)
> >                 return orig;
> >
> > @@ -97,7 +100,7 @@ static inline pci_ers_result_t merge_result(enum
> pci_ers_result orig,
> >                 break;
> >         case PCI_ERS_RESULT_DISCONNECT:
> >                 if (new == PCI_ERS_RESULT_NEED_RESET)
> > -                       orig = new;
> > +                       orig = PCI_ERS_RESULT_NEED_RESET;
> >                 break;
> >         default:
> >                 break;
> > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> b/drivers/pci/pcie/aer/aerdrv_core.c
> > index 9f80cb0..b67f91a 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > @@ -232,11 +232,26 @@ static int report_error_detected(struct pci_dev
> *dev, void *data)
> >                                    dev->driver ?
> >                                    "no AER-aware driver" : "no driver");
> >                 }
> > -               return 0;
> > +
> > +               /*
> > +                * If there's any device in the subtree that does not
> > +                * have an error_detected callback, returning
> > +                * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
> > +                * the subsequent mmio_enabled/slot_reset/resume
> > +                * callbacks of "any" device in the subtree. All the
> > +                * devices in the subtree are left in the error state
> > +                * without recovery.
> > +                */
> > +
> > +               if (!(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE))
> > +                       vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> > +               else
> > +                       vote = PCI_ERS_RESULT_NONE;
> > +       } else {
> > +               err_handler = dev->driver->err_handler;
> > +               vote = err_handler->error_detected(dev, result_data-
> >state);
> >         }
> >
> > -       err_handler = dev->driver->err_handler;
> > -       vote = err_handler->error_detected(dev, result_data->state);
> >         result_data->result = merge_result(result_data->result, vote);
> >         return 0;
> >  }
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index ab17a08..c458602 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -540,6 +540,9 @@ enum pci_ers_result {
> >
> >         /* Device driver is fully recovered and operational */
> >         PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5,
> > +
> > +       /* No AER capabilities registered for the driver */
> > +       PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
> >  };
> >
> >  /* PCI bus error event callbacks */
> > --
> > 1.7.11.3
> >

^ permalink raw reply

* Re: [PATCH] PCI,sriov: add documentation on sysfs-based VF control
From: Bjorn Helgaas @ 2012-11-28 20:18 UTC (permalink / raw)
  To: Donald Dutile; +Cc: linux-pci
In-Reply-To: <1354073497-6702-1-git-send-email-ddutile@redhat.com>

On Tue, Nov 27, 2012 at 8:31 PM, Donald Dutile <ddutile@redhat.com> wrote:
>  Signed-off: Donald Dutile <ddutile@redhat.com>
>

Thanks, Don.  I added this to my -next branch, for merging in the v3.8
merge window.

I removed the trailing underscore from sriov_numvfs_.


>  Documentation/ABI/testing/sysfs-bus-pci | 34 +++++++++++++++++++++++
>  Documentation/PCI/pci-iov-howto.txt     | 48 ++++++++++++++++++++++++++++++---
>  2 files changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index dff1f48..1cb389d 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -222,3 +222,37 @@ Description:
>                 satisfied too.  Reading this attribute will show the current
>                 value of d3cold_allowed bit.  Writing this attribute will set
>                 the value of d3cold_allowed bit.
> +
> +What:          /sys/bus/pci/devices/.../sriov_totalvfs
> +Date:          November 2012
> +Contact:       Donald Dutile <ddutile@redhat.com>
> +Description:
> +               This file appears when a physical PCIe device supports SR-IOV.
> +               Userspace applications can read this file to determine the
> +               maximum number of Virtual Functions (VFs) a PCIe physical
> +               function (PF) can support. Typically, this is the value reported
> +               in the PF's SR-IOV extended capability structure's TotalVFs
> +               element.  Drivers have the ability at probe time to reduce the
> +               value read from this file via the pci_sriov_set_totalvfs()
> +               function.
> +
> +What:          /sys/bus/pci/devices/.../sriov_numvfs_
> +Date:          November 2012
> +Contact:       Donald Dutile <ddutile@redhat.com>
> +Description:
> +               This file appears when a physical PCIe device supports SR-IOV.
> +               Userspace applications can read and write to this file to
> +               determine and control the enablement or disablement of Virtual
> +               Functions (VFs) on the physical function (PF). A read of this
> +               file will return the number of VFs that are enabled on this PF.
> +               A number written to this file will enable the specified
> +               number of VFs. A userspace application would typically read the
> +               file and check that the value is zero, and then write the number
> +               of VFs that should be enabled on the PF; the value written
> +               should be less than or equal to the value in the sriov_totalvfs
> +               file. A userspace application wanting to disable the VFs would
> +               write a zero to this file. The core ensures that valid values
> +               are written to this file, and returns errors when values are not
> +               valid.  For example, writing a 2 to this file when sriov_numvfs
> +               is not 0 and not 2 already will return an error. Writing a 10
> +               when the value of sriov_totalvfs is 8 will return an error.
> diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
> index fc73ef5..c41cf95 100644
> --- a/Documentation/PCI/pci-iov-howto.txt
> +++ b/Documentation/PCI/pci-iov-howto.txt
> @@ -2,6 +2,9 @@
>                 Copyright (C) 2009 Intel Corporation
>                     Yu Zhao <yu.zhao@intel.com>
>
> +               Update: November 2012
> +                       -- sysfs-based SRIOV enable-/disable-ment
> +               Donald Dutile <ddutile@redhat.com>
>
>  1. Overview
>
> @@ -24,10 +27,21 @@ real existing PCI device.
>
>  2.1 How can I enable SR-IOV capability
>
> -The device driver (PF driver) will control the enabling and disabling
> -of the capability via API provided by SR-IOV core. If the hardware
> -has SR-IOV capability, loading its PF driver would enable it and all
> -VFs associated with the PF.
> +Multiple methods are available for SR-IOV enablement.
> +In the first method, the device driver (PF driver) will control the
> +enabling and disabling of the capability via API provided by SR-IOV core.
> +If the hardware has SR-IOV capability, loading its PF driver would
> +enable it and all VFs associated with the PF.  Some PF drivers require
> +a module parameter to be set to determine the number of VFs to enable.
> +In the second method, a write to the sysfs file sriov_numvfs will
> +enable and disable the VFs associated with a PCIe PF.  This method
> +enables per-PF, VF enable/disable values versus the first method,
> +which applies to all PFs of the same device.  Additionally, the
> +PCI SRIOV core support ensures that enable/disable operations are
> +valid to reduce duplication in multiple drivers for the same
> +checks, e.g., check numvfs == 0 if enabling VFs, ensure
> +numvfs <= totalvfs.
> +The second method is the recommended method for new/future VF devices.
>
>  2.2 How can I use the Virtual Functions
>
> @@ -40,13 +54,22 @@ requires device driver that is same as a normal PCI device's.
>  3.1 SR-IOV API
>
>  To enable SR-IOV capability:
> +(a) For the first method, in the driver:
>         int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
>         'nr_virtfn' is number of VFs to be enabled.
> +(b) For the second method, from sysfs:
> +       echo 'nr_virtfn' > \
> +        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
>
>  To disable SR-IOV capability:
> +(a) For the first method, in the driver:
>         void pci_disable_sriov(struct pci_dev *dev);
> +(b) For the second method, from sysfs:
> +       echo  0 > \
> +        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
>
>  To notify SR-IOV core of Virtual Function Migration:
> +(a) In the driver:
>         irqreturn_t pci_sriov_migration(struct pci_dev *dev);
>
>  3.2 Usage example
> @@ -88,6 +111,22 @@ static void dev_shutdown(struct pci_dev *dev)
>         ...
>  }
>
> +static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
> +{
> +       if (numvfs > 0) {
> +               ...
> +               pci_enable_sriov(dev, numvfs);
> +               ...
> +               return numvfs;
> +       }
> +       if (numvfs == 0) {
> +               ....
> +               pci_disable_sriov(dev);
> +               ...
> +               return 0;
> +       }
> +}
> +
>  static struct pci_driver dev_driver = {
>         .name =         "SR-IOV Physical Function driver",
>         .id_table =     dev_id_table,
> @@ -96,4 +135,5 @@ static struct pci_driver dev_driver = {
>         .suspend =      dev_suspend,
>         .resume =       dev_resume,
>         .shutdown =     dev_shutdown,
> +       .sriov_configure = dev_sriov_configure,
>  };
> --
> 1.7.10.2.552.gaa3bb87
>

^ permalink raw reply

* Re: [PATCH v3] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers
From: Bjorn Helgaas @ 2012-11-28 20:15 UTC (permalink / raw)
  To: Pandarathil, Vijaymohan R
  Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linasvepstas@gmail.com, Myron Stowe, Ortiz, Lance E, Huang Ying,
	Hidetoshi Seto, Patterson, Andrew D (LeftHand Networks),
	Zhang Yanmin
In-Reply-To: <F9E001219150CB45BEDC82A650F360C90146A2F8@G9W0717.americas.hpqcorp.net>

On Sat, Nov 17, 2012 at 4:47 AM, Pandarathil, Vijaymohan R
<vijaymohan.pandarathil@hp.com> wrote:
> When an error is detected on a PCIe device which does not have an
> AER-aware driver, prevent AER infrastructure from reporting
> successful error recovery.
>
> This is because the report_error_detected() function that gets
> called in the first phase of recovery process allows forward
> progress even when the driver for the device does not have AER
> capabilities. It seems that all callbacks (in pci_error_handlers
> structure) registered by drivers that gets called during error
> recovery are not mandatory. So the intention of the infrastructure
> design seems to be to allow forward progress even when a specific
> callback has not been registered by a driver. However, if error
> handler structure itself has not been registered, it doesn't make
> sense to allow forward progress.
>
> As a result of the current design, in the case of a single device
> having an AER-unaware driver or in the case of any function in a
> multi-function card having an AER-unaware driver, a successful
> recovery is reported.
>
> Typical scenario this happens is when a PCI device is detached
> from a KVM host and the pci-stub driver on the host claims the
> device. The pci-stub driver does not have error handling capabilities
> but the AER infrastructure still reports that the device recovered
> successfully.
>
> The changes proposed here leaves the device(s)in an unrecovered state
> if the driver for the device or for any device in the subtree
> does not have error handler structure registered. This reflects
> the true state of the device and prevents any partial recovery (or no
> recovery at all) reported as successful.
>
>
> v3:
>   - Fixed a checkpatch/build issue
>
> v2:
>   - Made changes so that all devices in the subtree have the error
>     state set correctly.
>
>
> Reviewed-by: Linas Vepstas <linasvepstas <at> gmail.com>
> Reviewed-by: Myron Stowe <mstowe <at> redhat.com>
> Reviewed-by: Bjorn Helgaas <bhelgaas <at> google.com>
> Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>

I added this to my -next branch, which will be merged for v3.8.

Please double-check that I resolved the merge conflict correctly.  It
should show up here in the next few minutes:

http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs/heads/next

>  drivers/pci/pcie/aer/aerdrv.h      |  5 ++++-
>  drivers/pci/pcie/aer/aerdrv_core.c | 21 ++++++++++++++++++---
>  include/linux/pci.h                |  3 +++
>  3 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
> index 94a7598..22f840f 100644
> --- a/drivers/pci/pcie/aer/aerdrv.h
> +++ b/drivers/pci/pcie/aer/aerdrv.h
> @@ -87,6 +87,9 @@ struct aer_broadcast_data {
>  static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
>                 enum pci_ers_result new)
>  {
> +       if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
> +               return PCI_ERS_RESULT_NO_AER_DRIVER;
> +
>         if (new == PCI_ERS_RESULT_NONE)
>                 return orig;
>
> @@ -97,7 +100,7 @@ static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
>                 break;
>         case PCI_ERS_RESULT_DISCONNECT:
>                 if (new == PCI_ERS_RESULT_NEED_RESET)
> -                       orig = new;
> +                       orig = PCI_ERS_RESULT_NEED_RESET;
>                 break;
>         default:
>                 break;
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 9f80cb0..b67f91a 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -232,11 +232,26 @@ static int report_error_detected(struct pci_dev *dev, void *data)
>                                    dev->driver ?
>                                    "no AER-aware driver" : "no driver");
>                 }
> -               return 0;
> +
> +               /*
> +                * If there's any device in the subtree that does not
> +                * have an error_detected callback, returning
> +                * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
> +                * the subsequent mmio_enabled/slot_reset/resume
> +                * callbacks of "any" device in the subtree. All the
> +                * devices in the subtree are left in the error state
> +                * without recovery.
> +                */
> +
> +               if (!(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE))
> +                       vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> +               else
> +                       vote = PCI_ERS_RESULT_NONE;
> +       } else {
> +               err_handler = dev->driver->err_handler;
> +               vote = err_handler->error_detected(dev, result_data->state);
>         }
>
> -       err_handler = dev->driver->err_handler;
> -       vote = err_handler->error_detected(dev, result_data->state);
>         result_data->result = merge_result(result_data->result, vote);
>         return 0;
>  }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index ab17a08..c458602 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -540,6 +540,9 @@ enum pci_ers_result {
>
>         /* Device driver is fully recovered and operational */
>         PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5,
> +
> +       /* No AER capabilities registered for the driver */
> +       PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
>  };
>
>  /* PCI bus error event callbacks */
> --
> 1.7.11.3
>

^ permalink raw reply

* Re: [PATCH 0/9] PCI: Remove use of CONFIG_HOTPLUG
From: Bjorn Helgaas @ 2012-11-28 20:04 UTC (permalink / raw)
  To: Greg KH; +Cc: Bill Pemberton, linux-pci
In-Reply-To: <20121128190106.GA30376@kroah.com>

On Wed, Nov 28, 2012 at 12:01 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Wed, Nov 21, 2012 at 03:34:51PM -0500, Bill Pemberton wrote:
>> This is a respin of the patches to remove CONFIG_HOTPLUG for the PCI
>> subsystem.  The end result is the same as the patches in the big
>> patchset I sent out earlier this week but the changelog entries have
>> been changed to be consistent with the capitalization of PCI.  The
>> removal of the __dev* entries has also been squashed into a single
>> patch.
>
> Bjorn, any objection for me taking these patches through my driver-core
> tree?  If not, can I get your ack?

No objection.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

^ permalink raw reply

* Re: [PATCH 0/9] PCI: Remove use of CONFIG_HOTPLUG
From: Greg KH @ 2012-11-28 19:01 UTC (permalink / raw)
  To: Bill Pemberton; +Cc: linux-pci, Bjorn Helgaas
In-Reply-To: <1353530100-728-1-git-send-email-wfp5p@virginia.edu>

On Wed, Nov 21, 2012 at 03:34:51PM -0500, Bill Pemberton wrote:
> This is a respin of the patches to remove CONFIG_HOTPLUG for the PCI
> subsystem.  The end result is the same as the patches in the big
> patchset I sent out earlier this week but the changelog entries have
> been changed to be consistent with the capitalization of PCI.  The
> removal of the __dev* entries has also been squashed into a single
> patch.

Bjorn, any objection for me taking these patches through my driver-core
tree?  If not, can I get your ack?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] PCI,sriov: add documentation on sysfs-based VF control
From: Greg Kroah-Hartman @ 2012-11-28 17:33 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Donald Dutile, linux-pci
In-Reply-To: <CAErSpo6-Rz8+7rEcOhWP+YC9dK9U3y1R03Hj4msQD+7Fv9DWMw@mail.gmail.com>

On Wed, Nov 28, 2012 at 10:22:35AM -0700, Bjorn Helgaas wrote:
> [+cc Greg]
> 
> I plan to apply this to my -next branch today and anticipate merging
> during the v3.8 merge window, probably next week.  So speak up now
> with any concerns :)  The related code is already in my -next branch.

No objection from me.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] PCI,sriov: add documentation on sysfs-based VF control
From: Bjorn Helgaas @ 2012-11-28 17:22 UTC (permalink / raw)
  To: Donald Dutile; +Cc: linux-pci, Greg Kroah-Hartman
In-Reply-To: <1354073497-6702-1-git-send-email-ddutile@redhat.com>

[+cc Greg]

I plan to apply this to my -next branch today and anticipate merging
during the v3.8 merge window, probably next week.  So speak up now
with any concerns :)  The related code is already in my -next branch.

On Tue, Nov 27, 2012 at 8:31 PM, Donald Dutile <ddutile@redhat.com> wrote:
>  Signed-off: Donald Dutile <ddutile@redhat.com>
>
> ---
>  Documentation/ABI/testing/sysfs-bus-pci | 34 +++++++++++++++++++++++
>  Documentation/PCI/pci-iov-howto.txt     | 48 ++++++++++++++++++++++++++++++---
>  2 files changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index dff1f48..1cb389d 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -222,3 +222,37 @@ Description:
>                 satisfied too.  Reading this attribute will show the current
>                 value of d3cold_allowed bit.  Writing this attribute will set
>                 the value of d3cold_allowed bit.
> +
> +What:          /sys/bus/pci/devices/.../sriov_totalvfs
> +Date:          November 2012
> +Contact:       Donald Dutile <ddutile@redhat.com>
> +Description:
> +               This file appears when a physical PCIe device supports SR-IOV.
> +               Userspace applications can read this file to determine the
> +               maximum number of Virtual Functions (VFs) a PCIe physical
> +               function (PF) can support. Typically, this is the value reported
> +               in the PF's SR-IOV extended capability structure's TotalVFs
> +               element.  Drivers have the ability at probe time to reduce the
> +               value read from this file via the pci_sriov_set_totalvfs()
> +               function.
> +
> +What:          /sys/bus/pci/devices/.../sriov_numvfs_

This should be just "sriov_numvfs", not "sriov_numvfs_", right?

> +Date:          November 2012
> +Contact:       Donald Dutile <ddutile@redhat.com>
> +Description:
> +               This file appears when a physical PCIe device supports SR-IOV.
> +               Userspace applications can read and write to this file to
> +               determine and control the enablement or disablement of Virtual
> +               Functions (VFs) on the physical function (PF). A read of this
> +               file will return the number of VFs that are enabled on this PF.
> +               A number written to this file will enable the specified
> +               number of VFs. A userspace application would typically read the
> +               file and check that the value is zero, and then write the number
> +               of VFs that should be enabled on the PF; the value written
> +               should be less than or equal to the value in the sriov_totalvfs
> +               file. A userspace application wanting to disable the VFs would
> +               write a zero to this file. The core ensures that valid values
> +               are written to this file, and returns errors when values are not
> +               valid.  For example, writing a 2 to this file when sriov_numvfs
> +               is not 0 and not 2 already will return an error. Writing a 10
> +               when the value of sriov_totalvfs is 8 will return an error.
> diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
> index fc73ef5..c41cf95 100644
> --- a/Documentation/PCI/pci-iov-howto.txt
> +++ b/Documentation/PCI/pci-iov-howto.txt
> @@ -2,6 +2,9 @@
>                 Copyright (C) 2009 Intel Corporation
>                     Yu Zhao <yu.zhao@intel.com>
>
> +               Update: November 2012
> +                       -- sysfs-based SRIOV enable-/disable-ment
> +               Donald Dutile <ddutile@redhat.com>
>
>  1. Overview
>
> @@ -24,10 +27,21 @@ real existing PCI device.
>
>  2.1 How can I enable SR-IOV capability
>
> -The device driver (PF driver) will control the enabling and disabling
> -of the capability via API provided by SR-IOV core. If the hardware
> -has SR-IOV capability, loading its PF driver would enable it and all
> -VFs associated with the PF.
> +Multiple methods are available for SR-IOV enablement.
> +In the first method, the device driver (PF driver) will control the
> +enabling and disabling of the capability via API provided by SR-IOV core.
> +If the hardware has SR-IOV capability, loading its PF driver would
> +enable it and all VFs associated with the PF.  Some PF drivers require
> +a module parameter to be set to determine the number of VFs to enable.
> +In the second method, a write to the sysfs file sriov_numvfs will
> +enable and disable the VFs associated with a PCIe PF.  This method
> +enables per-PF, VF enable/disable values versus the first method,
> +which applies to all PFs of the same device.  Additionally, the
> +PCI SRIOV core support ensures that enable/disable operations are
> +valid to reduce duplication in multiple drivers for the same
> +checks, e.g., check numvfs == 0 if enabling VFs, ensure
> +numvfs <= totalvfs.
> +The second method is the recommended method for new/future VF devices.
>
>  2.2 How can I use the Virtual Functions
>
> @@ -40,13 +54,22 @@ requires device driver that is same as a normal PCI device's.
>  3.1 SR-IOV API
>
>  To enable SR-IOV capability:
> +(a) For the first method, in the driver:
>         int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
>         'nr_virtfn' is number of VFs to be enabled.
> +(b) For the second method, from sysfs:
> +       echo 'nr_virtfn' > \
> +        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
>
>  To disable SR-IOV capability:
> +(a) For the first method, in the driver:
>         void pci_disable_sriov(struct pci_dev *dev);
> +(b) For the second method, from sysfs:
> +       echo  0 > \
> +        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
>
>  To notify SR-IOV core of Virtual Function Migration:
> +(a) In the driver:
>         irqreturn_t pci_sriov_migration(struct pci_dev *dev);
>
>  3.2 Usage example
> @@ -88,6 +111,22 @@ static void dev_shutdown(struct pci_dev *dev)
>         ...
>  }
>
> +static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
> +{
> +       if (numvfs > 0) {
> +               ...
> +               pci_enable_sriov(dev, numvfs);
> +               ...
> +               return numvfs;
> +       }
> +       if (numvfs == 0) {
> +               ....
> +               pci_disable_sriov(dev);
> +               ...
> +               return 0;
> +       }
> +}
> +
>  static struct pci_driver dev_driver = {
>         .name =         "SR-IOV Physical Function driver",
>         .id_table =     dev_id_table,
> @@ -96,4 +135,5 @@ static struct pci_driver dev_driver = {
>         .suspend =      dev_suspend,
>         .resume =       dev_resume,
>         .shutdown =     dev_shutdown,
> +       .sriov_configure = dev_sriov_configure,
>  };
> --
> 1.7.10.2.552.gaa3bb87
>

^ permalink raw reply

* RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang
From: Fujinaka, Todd @ 2012-11-28 15:53 UTC (permalink / raw)
  To: Joe Jin, Ben Hutchings
  Cc: Mary Mcgrath, netdev@vger.kernel.org, e1000-devel@lists.sf.net,
	linux-kernel@vger.kernel.org, linux-pci
In-Reply-To: <50B5CBD2.5020804@oracle.com>

VGhlIG9ubHkgRUVQUk9NIEkga25vdyBhYm91dCBvciBjYW4gc3BlYWsgdG8gaXMgdGhlIG9uZSBh
dHRhY2hlZCB0byB0aGUgODI1NzEgYW5kIGl0IGRvZXNuJ3Qgc2V0IHRoZSBNYXhQYXlsb2FkU2l6
ZS4gVGhhdCdzIGRvbmUgYnkgdGhlIEJJT1MuDQoNClRvZGQgRnVqaW5ha2ENClRlY2huaWNhbCBN
YXJrZXRpbmcgRW5naW5lZXINCkxBTiBBY2Nlc3MgRGl2aXNpb24gKExBRCkNCkludGVsIENvcnBv
cmF0aW9uDQp0b2RkLmZ1amluYWthQGludGVsLmNvbQ0KKDUwMykgNzEyLTQ1NjUNCg0KDQotLS0t
LU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KRnJvbTogSm9lIEppbiBbbWFpbHRvOmpvZS5qaW5Ab3Jh
Y2xlLmNvbV0gDQpTZW50OiBXZWRuZXNkYXksIE5vdmVtYmVyIDI4LCAyMDEyIDEyOjMxIEFNDQpU
bzogQmVuIEh1dGNoaW5ncw0KQ2M6IEZ1amluYWthLCBUb2RkOyBNYXJ5IE1jZ3JhdGg7IG5ldGRl
dkB2Z2VyLmtlcm5lbC5vcmc7IGUxMDAwLWRldmVsQGxpc3RzLnNmLm5ldDsgbGludXgta2VybmVs
QHZnZXIua2VybmVsLm9yZzsgbGludXgtcGNpDQpTdWJqZWN0OiBSZTogW0UxMDAwLWRldmVsXSA4
MjU3MUVCOiBEZXRlY3RlZCBIYXJkd2FyZSBVbml0IEhhbmcNCg0KT24gMTEvMjgvMTIgMDI6MTAs
IEJlbiBIdXRjaGluZ3Mgd3JvdGU6DQo+IE9uIFR1ZSwgMjAxMi0xMS0yNyBhdCAxNzozMiArMDAw
MCwgRnVqaW5ha2EsIFRvZGQgd3JvdGU6DQo+PiBGb3JnaXZlIG1lIGlmIEknbSBiZWluZyB0b28g
cmVwZXRpdGlvdXMgYXMgSSB0aGluayBzb21lIG9mIHRoaXMgaGFzIA0KPj4gYmVlbiBtZW50aW9u
ZWQgaW4gdGhlIHBhc3QuDQo+Pg0KPj4gV2UgKGFuZCBieSB3ZSBJIG1lYW4gdGhlIEV0aGVybmV0
IHBhcnQgYW5kIGRyaXZlcikgY2FuIG9ubHkgY2hhbmdlIA0KPj4gdGhlIGFkdmVydGlzZWQgYXZh
aWxhYmlsaXR5IG9mIGEgbGFyZ2VyIE1heFBheWxvYWRTaXplLiBUaGUgc2l6ZSBpcyANCj4+IG5l
Z290aWF0ZWQgYnkgYm90aCBzaWRlcyBvZiB0aGUgbGluayB3aGVuIHRoZSBsaW5rIGlzIGVzdGFi
bGlzaGVkLiANCj4+IFRoZSBkcml2ZXIgc2hvdWxkIG5vdCBjaGFuZ2UgdGhlIHNpemUgb2YgdGhl
IGxpbmsgYXMgaXQgd291bGQgYmUgDQo+PiBwb2tpbmcgYXQgcmVnaXN0ZXJzIG91dHNpZGUgb2Yg
aXRzIHNjb3BlIGFuZCBpcyBjb250cm9sbGVkIGJ5IHRoZSANCj4+IHVwc3RyZWFtIGJyaWRnZSAo
bm90IHVzKS4NCj4gWy4uLl0NCj4gDQo+IE1heFBheWxvYWRTaXplIChNUFMpIGlzIG5vdCBuZWdv
dGlhdGVkIGJldHdlZW4gZGV2aWNlcyBidXQgaXMgDQo+IHByb2dyYW1tZWQgYnkgdGhlIHN5c3Rl
bSBmaXJtd2FyZSAoYXQgbGVhc3QgZm9yIGRldmljZXMgcHJlc2VudCBhdCANCj4gYm9vdCAtIHRo
ZSBrZXJuZWwgbWF5IGJlIHJlc3BvbnNpYmxlIGluIGNhc2Ugb2YgaG90cGx1ZykuICBZb3UgY2Fu
IHVzZSANCj4gdGhlIGtlcm5lbCBwYXJhbWV0ZXIgJ3BjaT1wY2llX2J1c19wZXJmJyAob3Igb25l
IG9mIHNldmVyYWwgb3RoZXJzKSB0byANCj4gc2V0IGEgcG9saWN5IHRoYXQgb3ZlcnJpZGVzIHRo
aXMsIGJ1dCBubyBwb2xpY3kgd2lsbCBhbGxvdyBzZXR0aW5nIE1QUyANCj4gYWJvdmUgdGhlIGRl
dmljZSdzIE1heFBheWxvYWRTaXplU3VwcG9ydGVkIChNUFNTKS4NCj4gDQoNCkJlbiwNCg0KVW5m
b3J0dW5hdGVseSBJJ20gdXNpbmcgMy4wLngga2VybmVsIGFuZCB0aGlzIGlzIG5vdCBpbmNsdWRl
ZCBpbiB0aGUga2VybmVsLg0KU28gSSdtIHRyeWluZyB0byB1c2UgZXRodG9vbCBtb2RpZnkgaXQg
ZnJvbSBlZXByb20gdG8gc2VlIGlmIGhlbHAgb3Igbm8uDQoNCg0KVG9kZCwgSSdsbCByZXZpZXcg
YWxsIE1heFBheWxvYWQgZm9yIGFsbCBkZXZpY2VzLCBidXQgbmVlZCB0byBzYXkgaWYgaXQgbWlz
bWF0Y2gsIGN1c3RvbWVyIGNvdWxkIG5vdCBtb2RpZnkgaXQgZnJvbSBCSU9TIGZvciB0aGVyZSB3
YXMgbm90IGVudHJ5IGF0IHRoZXJlLCB0byB0ZXN0IGl0LCB3ZSBoYXZlIHRvIGZpbmQgaG93IHRv
IHZlcmlmeSBpZiB0aGlzIGlzIHRoZSByb290IGNhdXNlLCBzbyBzdGlsbCBuZWVkIHRvIGZpbmQg
dGhlIG9mZnNldCBpbiBlZXByb20uDQoNClRoYW5rcyBpbiBhZHZhbmNlLA0KSm9lDQoNCg==

^ permalink raw reply

* Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
From: Joe Jin @ 2012-11-28  8:31 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Fujinaka, Todd, Mary Mcgrath, netdev@vger.kernel.org,
	e1000-devel@lists.sf.net, linux-kernel@vger.kernel.org, linux-pci
In-Reply-To: <1354039840.2701.14.camel@bwh-desktop.uk.solarflarecom.com>

On 11/28/12 02:10, Ben Hutchings wrote:
> On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
>> Forgive me if I'm being too repetitious as I think some of this has
>> been mentioned in the past.
>>
>> We (and by we I mean the Ethernet part and driver) can only change the
>> advertised availability of a larger MaxPayloadSize. The size is
>> negotiated by both sides of the link when the link is established. The
>> driver should not change the size of the link as it would be poking at
>> registers outside of its scope and is controlled by the upstream
>> bridge (not us).
> [...]
> 
> MaxPayloadSize (MPS) is not negotiated between devices but is programmed
> by the system firmware (at least for devices present at boot - the
> kernel may be responsible in case of hotplug).  You can use the kernel
> parameter 'pci=pcie_bus_perf' (or one of several others) to set a policy
> that overrides this, but no policy will allow setting MPS above the
> device's MaxPayloadSizeSupported (MPSS).
> 

Ben,

Unfortunately I'm using 3.0.x kernel and this is not included in the kernel.
So I'm trying to use ethtool modify it from eeprom to see if help or no.


Todd, I'll review all MaxPayload for all devices, but need to say if it mismatch,
customer could not modify it from BIOS for there was not entry at there, to
test it, we have to find how to verify if this is the root cause, so still 
need to find the offset in eeprom.

Thanks in advance,
Joe


^ permalink raw reply

* [PATCH]  PCI,sriov: add documentation on sysfs-based VF control
From: Donald Dutile @ 2012-11-28  3:31 UTC (permalink / raw)
  To: linux-pci; +Cc: bhelgaas, ddutile

 Signed-off: Donald Dutile <ddutile@redhat.com>

---
 Documentation/ABI/testing/sysfs-bus-pci | 34 +++++++++++++++++++++++
 Documentation/PCI/pci-iov-howto.txt     | 48 ++++++++++++++++++++++++++++++---
 2 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index dff1f48..1cb389d 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -222,3 +222,37 @@ Description:
 		satisfied too.  Reading this attribute will show the current
 		value of d3cold_allowed bit.  Writing this attribute will set
 		the value of d3cold_allowed bit.
+
+What:		/sys/bus/pci/devices/.../sriov_totalvfs
+Date:		November 2012
+Contact:	Donald Dutile <ddutile@redhat.com>
+Description:
+		This file appears when a physical PCIe device supports SR-IOV.
+		Userspace applications can read this file to determine the
+		maximum number of Virtual Functions (VFs) a PCIe physical
+		function (PF) can support. Typically, this is the value reported
+		in the PF's SR-IOV extended capability structure's TotalVFs
+		element.  Drivers have the ability at probe time to reduce the
+		value read from this file via the pci_sriov_set_totalvfs()
+		function.
+
+What:		/sys/bus/pci/devices/.../sriov_numvfs_
+Date:		November 2012
+Contact:	Donald Dutile <ddutile@redhat.com>
+Description:
+		This file appears when a physical PCIe device supports SR-IOV.
+		Userspace applications can read and write to this file to
+		determine and control the enablement or disablement of Virtual
+		Functions (VFs) on the physical function (PF). A read of this
+		file will return the number of VFs that are enabled on this PF.
+		A number written to this file will enable the specified
+		number of VFs. A userspace application would typically read the
+		file and check that the value is zero, and then write the number
+		of VFs that should be enabled on the PF; the value written
+		should be less than or equal to the value in the sriov_totalvfs
+		file. A userspace application wanting to disable the VFs would
+		write a zero to this file. The core ensures that valid values
+		are written to this file, and returns errors when values are not
+		valid.  For example, writing a 2 to this file when sriov_numvfs
+		is not 0 and not 2 already will return an error. Writing a 10
+		when the value of sriov_totalvfs is 8 will return an error.
diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
index fc73ef5..c41cf95 100644
--- a/Documentation/PCI/pci-iov-howto.txt
+++ b/Documentation/PCI/pci-iov-howto.txt
@@ -2,6 +2,9 @@
 		Copyright (C) 2009 Intel Corporation
 		    Yu Zhao <yu.zhao@intel.com>
 
+		Update: November 2012
+			-- sysfs-based SRIOV enable-/disable-ment
+		Donald Dutile <ddutile@redhat.com>
 
 1. Overview
 
@@ -24,10 +27,21 @@ real existing PCI device.
 
 2.1 How can I enable SR-IOV capability
 
-The device driver (PF driver) will control the enabling and disabling
-of the capability via API provided by SR-IOV core. If the hardware
-has SR-IOV capability, loading its PF driver would enable it and all
-VFs associated with the PF.
+Multiple methods are available for SR-IOV enablement.
+In the first method, the device driver (PF driver) will control the 
+enabling and disabling of the capability via API provided by SR-IOV core.
+If the hardware has SR-IOV capability, loading its PF driver would 
+enable it and all VFs associated with the PF.  Some PF drivers require
+a module parameter to be set to determine the number of VFs to enable.
+In the second method, a write to the sysfs file sriov_numvfs will
+enable and disable the VFs associated with a PCIe PF.  This method
+enables per-PF, VF enable/disable values versus the first method,
+which applies to all PFs of the same device.  Additionally, the
+PCI SRIOV core support ensures that enable/disable operations are
+valid to reduce duplication in multiple drivers for the same
+checks, e.g., check numvfs == 0 if enabling VFs, ensure 
+numvfs <= totalvfs.
+The second method is the recommended method for new/future VF devices.
 
 2.2 How can I use the Virtual Functions
 
@@ -40,13 +54,22 @@ requires device driver that is same as a normal PCI device's.
 3.1 SR-IOV API
 
 To enable SR-IOV capability:
+(a) For the first method, in the driver:
 	int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 	'nr_virtfn' is number of VFs to be enabled.
+(b) For the second method, from sysfs:
+	echo 'nr_virtfn' > \
+        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
 
 To disable SR-IOV capability:
+(a) For the first method, in the driver:
 	void pci_disable_sriov(struct pci_dev *dev);
+(b) For the second method, from sysfs:
+	echo  0 > \
+        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
 
 To notify SR-IOV core of Virtual Function Migration:
+(a) In the driver:
 	irqreturn_t pci_sriov_migration(struct pci_dev *dev);
 
 3.2 Usage example
@@ -88,6 +111,22 @@ static void dev_shutdown(struct pci_dev *dev)
 	...
 }
 
+static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
+{
+	if (numvfs > 0) {
+		...
+		pci_enable_sriov(dev, numvfs);
+		...
+		return numvfs;
+	}
+	if (numvfs == 0) {
+		....
+		pci_disable_sriov(dev);
+		...
+		return 0;
+	}
+}
+
 static struct pci_driver dev_driver = {
 	.name =		"SR-IOV Physical Function driver",
 	.id_table =	dev_id_table,
@@ -96,4 +135,5 @@ static struct pci_driver dev_driver = {
 	.suspend =	dev_suspend,
 	.resume =	dev_resume,
 	.shutdown =	dev_shutdown,
+	.sriov_configure = dev_sriov_configure,
 };
-- 
1.7.10.2.552.gaa3bb87


^ permalink raw reply related

* RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang
From: Fujinaka, Todd @ 2012-11-27 18:24 UTC (permalink / raw)
  To: Ben Hutchings, Mary Mcgrath
  Cc: Joe Jin, netdev@vger.kernel.org, e1000-devel@lists.sf.net,
	linux-kernel@vger.kernel.org, linux-pci
In-Reply-To: <1354039840.2701.14.camel@bwh-desktop.uk.solarflarecom.com>

VGhhbmtzIGZvciB0aGUgY2xhcmlmaWNhdGlvbi4gSSB3YXMganVzdCBnb2luZyBieSB0aGUgUENJ
ZSBzcGVjLCB3aGljaCBzYXlzIHRoZSBsb3dlc3QgdmFsdWUgb2YgYm90aCBlbmRzIGlzIHVzZWQs
IGFuZCBJIGZpZ3VyZWQgU09NRVRISU5HIGhhZCB0byBiZSBsb29raW5nIGF0IHRoYXQgYW5kIGRv
aW5nIHNvbWUgc29ydCBvZiBuZWdvdGlhdGlvbi4gSSdtIG5vIEJJT1MgZ3V5LCBzbyBJJ20gbm90
IHN1cmUgd2hhdCdzIGFjdHVhbGx5IGdvaW5nIG9uLCB3aGV0aGVyIHNvbWV0aGluZyB3YWxrcyB0
aGUgUENJZSB0cmVlIG9yIGlmIHRoZSBCSU9TIGp1c3Qgc2V0cyBhbGwgdGhlIHZhbHVlcyB0byB0
aGUgbWluaW11bS4NCg0KVG9kZCBGdWppbmFrYQ0KVGVjaG5pY2FsIE1hcmtldGluZyBFbmdpbmVl
cg0KTEFOIEFjY2VzcyBEaXZpc2lvbiAoTEFEKQ0KSW50ZWwgQ29ycG9yYXRpb24NCnRvZGQuZnVq
aW5ha2FAaW50ZWwuY29tDQooNTAzKSA3MTItNDU2NQ0KDQoNCi0tLS0tT3JpZ2luYWwgTWVzc2Fn
ZS0tLS0tDQpGcm9tOiBCZW4gSHV0Y2hpbmdzIFttYWlsdG86Ymh1dGNoaW5nc0Bzb2xhcmZsYXJl
LmNvbV0gDQpTZW50OiBUdWVzZGF5LCBOb3ZlbWJlciAyNywgMjAxMiAxMDoxMSBBTQ0KVG86IEZ1
amluYWthLCBUb2RkOyBNYXJ5IE1jZ3JhdGgNCkNjOiBKb2UgSmluOyBuZXRkZXZAdmdlci5rZXJu
ZWwub3JnOyBlMTAwMC1kZXZlbEBsaXN0cy5zZi5uZXQ7IGxpbnV4LWtlcm5lbEB2Z2VyLmtlcm5l
bC5vcmc7IGxpbnV4LXBjaQ0KU3ViamVjdDogUkU6IFtFMTAwMC1kZXZlbF0gODI1NzFFQjogRGV0
ZWN0ZWQgSGFyZHdhcmUgVW5pdCBIYW5nDQoNCk9uIFR1ZSwgMjAxMi0xMS0yNyBhdCAxNzozMiAr
MDAwMCwgRnVqaW5ha2EsIFRvZGQgd3JvdGU6DQo+IEZvcmdpdmUgbWUgaWYgSSdtIGJlaW5nIHRv
byByZXBldGl0aW91cyBhcyBJIHRoaW5rIHNvbWUgb2YgdGhpcyBoYXMgDQo+IGJlZW4gbWVudGlv
bmVkIGluIHRoZSBwYXN0Lg0KPiANCj4gV2UgKGFuZCBieSB3ZSBJIG1lYW4gdGhlIEV0aGVybmV0
IHBhcnQgYW5kIGRyaXZlcikgY2FuIG9ubHkgY2hhbmdlIHRoZSANCj4gYWR2ZXJ0aXNlZCBhdmFp
bGFiaWxpdHkgb2YgYSBsYXJnZXIgTWF4UGF5bG9hZFNpemUuIFRoZSBzaXplIGlzIA0KPiBuZWdv
dGlhdGVkIGJ5IGJvdGggc2lkZXMgb2YgdGhlIGxpbmsgd2hlbiB0aGUgbGluayBpcyBlc3RhYmxp
c2hlZC4gVGhlIA0KPiBkcml2ZXIgc2hvdWxkIG5vdCBjaGFuZ2UgdGhlIHNpemUgb2YgdGhlIGxp
bmsgYXMgaXQgd291bGQgYmUgcG9raW5nIGF0IA0KPiByZWdpc3RlcnMgb3V0c2lkZSBvZiBpdHMg
c2NvcGUgYW5kIGlzIGNvbnRyb2xsZWQgYnkgdGhlIHVwc3RyZWFtIA0KPiBicmlkZ2UgKG5vdCB1
cykuDQpbLi4uXQ0KDQpNYXhQYXlsb2FkU2l6ZSAoTVBTKSBpcyBub3QgbmVnb3RpYXRlZCBiZXR3
ZWVuIGRldmljZXMgYnV0IGlzIHByb2dyYW1tZWQgYnkgdGhlIHN5c3RlbSBmaXJtd2FyZSAoYXQg
bGVhc3QgZm9yIGRldmljZXMgcHJlc2VudCBhdCBib290IC0gdGhlIGtlcm5lbCBtYXkgYmUgcmVz
cG9uc2libGUgaW4gY2FzZSBvZiBob3RwbHVnKS4gIFlvdSBjYW4gdXNlIHRoZSBrZXJuZWwgcGFy
YW1ldGVyICdwY2k9cGNpZV9idXNfcGVyZicgKG9yIG9uZSBvZiBzZXZlcmFsIG90aGVycykgdG8g
c2V0IGEgcG9saWN5IHRoYXQgb3ZlcnJpZGVzIHRoaXMsIGJ1dCBubyBwb2xpY3kgd2lsbCBhbGxv
dyBzZXR0aW5nIE1QUyBhYm92ZSB0aGUgZGV2aWNlJ3MgTWF4UGF5bG9hZFNpemVTdXBwb3J0ZWQg
KE1QU1MpLg0KDQooVGhlc2UgcGFyYW1ldGVycyBhcmUgbm90IGRvY3VtZW50ZWQgaW4NCkRvY3Vt
ZW50YXRpb24va2VybmVsLXBhcmFtZXRlcnMudHh0ISAgU29tZW9uZSBvdWdodCB0byBmaXggdGhh
dC4pDQoNCkJlbi4NCg0KLS0NCkJlbiBIdXRjaGluZ3MsIFN0YWZmIEVuZ2luZWVyLCBTb2xhcmZs
YXJlIE5vdCBzcGVha2luZyBmb3IgbXkgZW1wbG95ZXI7IHRoYXQncyB0aGUgbWFya2V0aW5nIGRl
cGFydG1lbnQncyBqb2IuDQpUaGV5IGFza2VkIHVzIHRvIG5vdGUgdGhhdCBTb2xhcmZsYXJlIHBy
b2R1Y3QgbmFtZXMgYXJlIHRyYWRlbWFya2VkLg0KDQo=

^ permalink raw reply

* RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang
From: Ben Hutchings @ 2012-11-27 18:10 UTC (permalink / raw)
  To: Fujinaka, Todd, Mary Mcgrath
  Cc: Joe Jin, netdev@vger.kernel.org, e1000-devel@lists.sf.net,
	linux-kernel@vger.kernel.org, linux-pci
In-Reply-To: <9B4A1B1917080E46B64F07F2989DADD62F2D8070@ORSMSX102.amr.corp.intel.com>

On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
> Forgive me if I'm being too repetitious as I think some of this has
> been mentioned in the past.
> 
> We (and by we I mean the Ethernet part and driver) can only change the
> advertised availability of a larger MaxPayloadSize. The size is
> negotiated by both sides of the link when the link is established. The
> driver should not change the size of the link as it would be poking at
> registers outside of its scope and is controlled by the upstream
> bridge (not us).
[...]

MaxPayloadSize (MPS) is not negotiated between devices but is programmed
by the system firmware (at least for devices present at boot - the
kernel may be responsible in case of hotplug).  You can use the kernel
parameter 'pci=pcie_bus_perf' (or one of several others) to set a policy
that overrides this, but no policy will allow setting MPS above the
device's MaxPayloadSizeSupported (MPSS).

(These parameters are not documented in
Documentation/kernel-parameters.txt!  Someone ought to fix that.)

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH RESEND] PCI: Allow pcie_aspm=force to work even when FADT indicates it is unsupported
From: Colin King @ 2012-11-27 14:09 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci; +Cc: linux-kernel

From: Colin Ian King <colin.king@canonical.com>

BugLink: http://bugs.launchpad.net/bugs/962038

Right now using pcie_aspm=force will not enable ASPM if the FADT indicates
ASPM is unsupported.  However, the semantics of force should probably allow
for this, especially as they did before the ASPM disable rework with commit
3c076351c4027a56d5005a39a0b518a4ba393ce2

This patch just skips the clearing of any ASPM setup that the firmware has
carried out on this bus if pcie_aspm=force is being used.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/pci/pcie/aspm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 213753b..449f257 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -773,6 +773,9 @@ void pcie_clear_aspm(struct pci_bus *bus)
 {
 	struct pci_dev *child;

+	if (aspm_force)
+		return;
+
 	/*
 	 * Clear any ASPM setup that the firmware has carried out on this bus
 	 */
-- 
1.8.0

^ permalink raw reply related

* Re: [PATCH v3 0/3] ACPI: container hot remove support.
From: Hanjun Guo @ 2012-11-27 11:24 UTC (permalink / raw)
  To: Tang Chen
  Cc: Yasuaki Ishimatsu, Rafael J. Wysocki, yinghai, bhelgaas, lenb,
	jiang.liu, izumi.taku, linux-acpi, linux-pci, linux-kernel,
	Wen Congyang, Toshi Kani
In-Reply-To: <50B42798.4030402@cn.fujitsu.com>

On 2012/11/27 10:38, Tang Chen wrote:
> On 11/27/2012 09:08 AM, Hanjun Guo wrote:
>> On 2012/11/26 14:06, Tang Chen wrote:
>>> On 11/26/2012 01:42 PM, Hanjun Guo wrote:
>>>>
>>>> Hi all,
>>>> I think Yasuaki mentioned the key point for the container device remove,
>>>> that is dependency.
>>>>
>>>> Currently, container, processor, and memory hotpulg are managed by different ACPI
>>>> hotplug drivers, the driver works when handle device hotplug individually, but they
>>>> have no idea for each other.
>>>>
>>>> This may introduce some issues, such as Yasuaki mentioned above, that is to say, we
>>>> should remove its child before remove the device itself, and hot add its parent before
>>>> the device itself.
>>>>
>>>> According to the ACPI namespace, we can resolve most of dependency issues. On a typical
>>>> two processor sockets system, the namespace is like this:
>>>>
>>>> /_SB                   ---container device, with HID ACPI0004
>>>>       |_SCK0             ---container device, with HID ACPI0004
>>>>        |_CPU0       ---processor device, with HID ACPI0009 or LNXCPU
>>>>             |_...
>>>>             |_CPUx
>>>>             |_MEM0       ---Memory device, with HID PNP0C80
>>>>       |_SCK1
>>>>        |_CPU0
>>>>             |_...
>>>>             |_CPUx
>>>>             |_MEM1
>>>>       |_PCI0            ---Host bridge, with HID PNP0A03 or PNP0A08
>>>>
>>>> If hot remove the container device, such as SCK0, we can easily know the dependency list
>>>> is CPU0~CPUx and MEM0, but I think the ACPI hotplug driver haven't resolve this now.
>>>>
>>>> And there is another corner case for hotplug devices in the namespace above, that is:
>>>> 1) Remove SCK0. yes, we can remove it with no dependency to the host bridge PCI0;
>>>>
>>>> 2) Remove SCK1 after SCK0. we should remove the host bridge PCI0 first,
>>>>      or the system will crash down. yes, dynamic dependency analysis is needed here.
>>>>      and the ACPI hotplug driver totally have no idea of this.
>>>>
>>>> so, should we do something to settle this down ?
>>>
>>> Hi Guo,
>>>
>>> I am trying to do this too. :)
>>
>> Great， what's your idea of this?
>>
> Hi Guo,
> 
> I noticed they had a lot of discussion on this topic.
> https://lkml.org/lkml/2012/11/15/159
> 
> And Vasilis's latest patches are here:
> https://lkml.org/lkml/2012/11/23/335
> 
> I think we can have a review of this patchset first. :)

Hi Tang,
Thanks for your remind, I will review Vasilis's latest patches,
and reply Vasilis's patch if I have any comments.

> 
> And also, as you said, the new ACPI hotplug framework from Liu Jiang
> will settle this problem more properly.
> So I think any solution now could be temporary. :)
> 
> .
> 



^ permalink raw reply

* Re: [Qemu-devel] [PATCH 0/4] AER-KVM: Error containment of PCI pass-thru devices assigned to KVM guests
From: Gleb Natapov @ 2012-11-27  6:46 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Pandarathil, Vijaymohan R, Stefan Hajnoczi,
	linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	kvm@vger.kernel.org, linux-pci@vger.kernel.org
In-Reply-To: <20121126234612.GC10634@amt.cnet>

On Mon, Nov 26, 2012 at 09:46:12PM -0200, Marcelo Tosatti wrote:
> On Tue, Nov 20, 2012 at 02:09:46PM +0000, Pandarathil, Vijaymohan R wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
> > > Sent: Tuesday, November 20, 2012 5:41 AM
> > > To: Pandarathil, Vijaymohan R
> > > Cc: kvm@vger.kernel.org; linux-pci@vger.kernel.org; qemu-devel@nongnu.org;
> > > linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH 0/4] AER-KVM: Error containment of PCI pass-thru
> > > devices assigned to KVM guests
> > > 
> > > On Tue, Nov 20, 2012 at 06:31:48AM +0000, Pandarathil, Vijaymohan R wrote:
> > > > Add support for error containment when a PCI pass-thru device assigned to
> > > a KVM
> > > > guest encounters an error. This is for PCIe devices/drivers that support
> > > AER
> > > > functionality. When the OS is notified of an error in a device either
> > > > through the firmware first approach or through an interrupt handled by
> > > the AER
> > > > root port driver, concerned subsystems are notified by invoking callbacks
> > > > registered by these subsystems. The device is also marked as tainted till
> > > the
> > > > corresponding driver recovery routines are successful.
> > > >
> > > > KVM module registers for a notification of such errors. In the KVM
> > > callback
> > > > routine, a global counter is incremented to keep track of the error
> > > > notification. Before each CPU enters guest mode to execute guest code,
> > > > appropriate checks are done to see if the impacted device belongs to the
> > > guest
> > > > or not. If the device belongs to the guest, qemu hypervisor for the guest
> > > is
> > > > informed and the guest is immediately brought down, thus preventing or
> > > > minimizing chances of any bad data being written out by the guest driver
> > > > after the device has encountered an error.
> > > 
> > > I'm surprised that the hypervisor would shut down the guest when PCIe
> > > AER kicks in for a pass-through device.  Shouldn't we pass the AER event
> > > into the guest and deal with it there?
> > 
> > Agreed. That would be the ideal behavior and is planned in a future patch.
> > Lack of control over the capabilities/type of the OS/drivers running in 
> > the guest is also a concern in passing along the event to the guest.
> > 
> > My understanding is that in the current implementation of Linux/KVM, these 
> > errors are not handled at all and can potentially cause a guest hang or 
> > crash or even data corruption depending on the implementation of the guest
> > driver for the device. As a first step, these patches make the behavior 
> > better by doing error containment with a predictable behavior when such
> > errors occur. 
> 
> For both ACPI notifications and Linux PCI AER driver there is a way for
> the PCI driver to receive a notification, correct?
> 
> Can just have virt/kvm/assigned-dev.c code register such a notifier (as
> a "PCI driver") and then perform appropriate action?
> 
> Also the semantics of "tainted driver" is not entirely clear.
> 
> Is there any reason for not having this feature for VFIO only, as KVM
> device assigment is being phased out?
> 
Exactly. We shouldn't add checks to guest entry code and introduce new
userspace ABI to add minor feature to deprecated code. New userspace ABI
means that QEMU changes are needed, so the feature will be fully functional
only with latest QEMU which is capable of using VFIO anyway.

--
			Gleb.

^ permalink raw reply

* Re: [PATCH v3 0/3] ACPI: container hot remove support.
From: Tang Chen @ 2012-11-27  2:38 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Yasuaki Ishimatsu, Rafael J. Wysocki, yinghai, bhelgaas, lenb,
	jiang.liu, izumi.taku, linux-acpi, linux-pci, linux-kernel,
	Wen Congyang, Toshi Kani
In-Reply-To: <50B4127F.3080400@huawei.com>

On 11/27/2012 09:08 AM, Hanjun Guo wrote:
> On 2012/11/26 14:06, Tang Chen wrote:
>> On 11/26/2012 01:42 PM, Hanjun Guo wrote:
>>>
>>> Hi all,
>>> I think Yasuaki mentioned the key point for the container device remove,
>>> that is dependency.
>>>
>>> Currently, container, processor, and memory hotpulg are managed by different ACPI
>>> hotplug drivers, the driver works when handle device hotplug individually, but they
>>> have no idea for each other.
>>>
>>> This may introduce some issues, such as Yasuaki mentioned above, that is to say, we
>>> should remove its child before remove the device itself, and hot add its parent before
>>> the device itself.
>>>
>>> According to the ACPI namespace, we can resolve most of dependency issues. On a typical
>>> two processor sockets system, the namespace is like this:
>>>
>>> /_SB                   ---container device, with HID ACPI0004
>>>       |_SCK0             ---container device, with HID ACPI0004
>>>        |_CPU0       ---processor device, with HID ACPI0009 or LNXCPU
>>>             |_...
>>>             |_CPUx
>>>             |_MEM0       ---Memory device, with HID PNP0C80
>>>       |_SCK1
>>>        |_CPU0
>>>             |_...
>>>             |_CPUx
>>>             |_MEM1
>>>       |_PCI0            ---Host bridge, with HID PNP0A03 or PNP0A08
>>>
>>> If hot remove the container device, such as SCK0, we can easily know the dependency list
>>> is CPU0~CPUx and MEM0, but I think the ACPI hotplug driver haven't resolve this now.
>>>
>>> And there is another corner case for hotplug devices in the namespace above, that is:
>>> 1) Remove SCK0. yes, we can remove it with no dependency to the host bridge PCI0;
>>>
>>> 2) Remove SCK1 after SCK0. we should remove the host bridge PCI0 first,
>>>      or the system will crash down. yes, dynamic dependency analysis is needed here.
>>>      and the ACPI hotplug driver totally have no idea of this.
>>>
>>> so, should we do something to settle this down ?
>>
>> Hi Guo,
>>
>> I am trying to do this too. :)
>
> Great， what's your idea of this?
>
Hi Guo,

I noticed they had a lot of discussion on this topic.
https://lkml.org/lkml/2012/11/15/159

And Vasilis's latest patches are here:
https://lkml.org/lkml/2012/11/23/335

I think we can have a review of this patchset first. :)

And also, as you said, the new ACPI hotplug framework from Liu Jiang
will settle this problem more properly.
So I think any solution now could be temporary. :)

^ permalink raw reply

* Re: [PATCH v3 0/3] ACPI: container hot remove support.
From: Hanjun Guo @ 2012-11-27  1:08 UTC (permalink / raw)
  To: Tang Chen
  Cc: Yasuaki Ishimatsu, Rafael J. Wysocki, yinghai, bhelgaas, lenb,
	jiang.liu, izumi.taku, linux-acpi, linux-pci, linux-kernel,
	Wen Congyang, Toshi Kani
In-Reply-To: <50B306E0.3050207@cn.fujitsu.com>

On 2012/11/26 14:06, Tang Chen wrote:
> On 11/26/2012 01:42 PM, Hanjun Guo wrote:
>>
>> Hi all,
>> I think Yasuaki mentioned the key point for the container device remove,
>> that is dependency.
>>
>> Currently, container, processor, and memory hotpulg are managed by different ACPI
>> hotplug drivers, the driver works when handle device hotplug individually, but they
>> have no idea for each other.
>>
>> This may introduce some issues, such as Yasuaki mentioned above, that is to say, we
>> should remove its child before remove the device itself, and hot add its parent before
>> the device itself.
>>
>> According to the ACPI namespace, we can resolve most of dependency issues. On a typical
>> two processor sockets system, the namespace is like this:
>>
>> /_SB                   ---container device, with HID ACPI0004
>>      |_SCK0             ---container device, with HID ACPI0004
>>       |_CPU0       ---processor device, with HID ACPI0009 or LNXCPU
>>            |_...
>>            |_CPUx
>>            |_MEM0       ---Memory device, with HID PNP0C80
>>      |_SCK1
>>       |_CPU0
>>            |_...
>>            |_CPUx
>>            |_MEM1
>>      |_PCI0            ---Host bridge, with HID PNP0A03 or PNP0A08
>>
>> If hot remove the container device, such as SCK0, we can easily know the dependency list
>> is CPU0~CPUx and MEM0, but I think the ACPI hotplug driver haven't resolve this now.
>>
>> And there is another corner case for hotplug devices in the namespace above, that is:
>> 1) Remove SCK0. yes, we can remove it with no dependency to the host bridge PCI0;
>>
>> 2) Remove SCK1 after SCK0. we should remove the host bridge PCI0 first,
>>     or the system will crash down. yes, dynamic dependency analysis is needed here.
>>     and the ACPI hotplug driver totally have no idea of this.
>>
>> so, should we do something to settle this down ?
> 
> Hi Guo,
> 
> I am trying to do this too. :)

Great， what's your idea of this?



^ permalink raw reply

* [PATCH v7 5/5] x86, pci: Enable PCI INTx when MSI is disabled
From: Takao Indoh @ 2012-11-27  0:43 UTC (permalink / raw)
  To: linux-pci, x86, linux-kernel
  Cc: tokunaga.keiich, kexec, hbabu, andi, ddutile, Takao Indoh,
	ishii.hironobu, hpa, bhelgaas, tglx, yinghai, mingo, vgoyal,
	khalid
In-Reply-To: <20121127004144.3604.61708.sendpatchset@tindoh.g01.fujitsu.local>

This patch enables INTx if MSI is disabled in pcibios_enable_device().
In normal case interrupt disable bit in command register is 0b on boot
time, but in case of kdump, this bit may be 1b. It causes problems of
some drivers. At leaset I confirmed mptsas driver does not work in such
a case. This patch fix this problem.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
---
 arch/x86/pci/common.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 720e973..2bb7ecc 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -615,8 +615,10 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
 	if ((err = pci_enable_resources(dev, mask)) < 0)
 		return err;
 
-	if (!pci_dev_msi_enabled(dev))
+	if (!pci_dev_msi_enabled(dev)) {
+		pci_intx(dev, true);
 		return pcibios_enable_irq(dev);
+	}
 	return 0;
 }
 
-- 
1.7.1



^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox