* [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu @ 2012-10-02 7:49 Takao Indoh 2012-10-02 7:50 ` [PATCH v2 1/2] x86, pci: Reset PCIe devices at boot time Takao Indoh ` (3 more replies) 0 siblings, 4 replies; 10+ messages in thread From: Takao Indoh @ 2012-10-02 7:49 UTC (permalink / raw) To: linux-pci, x86, linux-kernel Cc: martin.wilck, kexec, hbabu, mingo, vgoyal, ishii.hironobu, hpa, bhelgaas, tglx, Takao Indoh These patches reset PCIe devices at boot time to address DMA problem on kdump with iommu. When "reset_devices" is specified, a hot reset is triggered on each PCIe root port and downstream port to reset its downstream endpoint. Background: A kdump problem about DMA has been discussed for a long time. That is, when a kernel is switched to the kdump kernel DMA derived from first kernel affects second kernel. Recently this problem surfaces when iommu is used for PCI passthrough on KVM guest. In the case of the machine I use, when intel_iommu=on is specified, DMAR error is detected in kdump kernel and PCI SERR is also detected. Finally kdump fails because some devices does not work correctly. The root cause is that ongoing DMA from first kernel causes DMAR fault because page table of DMAR is initialized while kdump kernel is booting up. Therefore to address this problem DMA needs to be stopped before DMAR is initialized at kdump kernel boot time. By these patches, PCIe devices are reset by hot reset and its DMA is stopped when reset_devices is specified. One problem of this solution is that the monitor blacks out when VGA controller is reset. So this patch does not reset the port whose child endpoint is VGA device. v2: Reset devices in setup_arch() because reset need to be done before interrupt remapping is initialized. v1: https://lkml.org/lkml/2012/8/3/160 Thanks, Takao Indoh ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/2] x86, pci: Reset PCIe devices at boot time 2012-10-02 7:49 [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu Takao Indoh @ 2012-10-02 7:50 ` Takao Indoh 2012-10-02 7:50 ` [PATCH v2 2/2] x86, pci: Enable PCI INTx when MSI is disabled Takao Indoh ` (2 subsequent siblings) 3 siblings, 0 replies; 10+ messages in thread From: Takao Indoh @ 2012-10-02 7:50 UTC (permalink / raw) To: linux-pci, x86, linux-kernel Cc: martin.wilck, kexec, hbabu, mingo, Takao Indoh, ishii.hironobu, hpa, bhelgaas, tglx, vgoyal This patch resets PCIe devices at boot time by hot reset when "reset_devices" is specified. Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com> --- arch/x86/include/asm/pci-direct.h | 1 arch/x86/kernel/setup.c | 3 arch/x86/pci/early.c | 302 ++++++++++++++++++++++++++++ drivers/pci/pci.c | 18 - include/linux/pci.h | 18 + init/main.c | 4 6 files changed, 326 insertions(+), 20 deletions(-) diff --git a/arch/x86/include/asm/pci-direct.h b/arch/x86/include/asm/pci-direct.h index b1e7a45..de30db2 100644 --- a/arch/x86/include/asm/pci-direct.h +++ b/arch/x86/include/asm/pci-direct.h @@ -18,4 +18,5 @@ extern int early_pci_allowed(void); extern unsigned int pci_early_dump_regs; extern void early_dump_pci_device(u8 bus, u8 slot, u8 func); extern void early_dump_pci_devices(void); +extern void early_reset_pcie_devices(void); #endif /* _ASM_X86_PCI_DIRECT_H */ diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index f4b9b80..24b011c 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -988,6 +988,9 @@ void __init setup_arch(char **cmdline_p) generic_apic_probe(); early_quirks(); +#ifdef CONFIG_PCI + early_reset_pcie_devices(); +#endif /* * Read APIC and some other early information from ACPI tables. diff --git a/arch/x86/pci/early.c b/arch/x86/pci/early.c index d1067d5..2ca5f0e 100644 --- a/arch/x86/pci/early.c +++ b/arch/x86/pci/early.c @@ -1,5 +1,6 @@ #include <linux/kernel.h> #include <linux/pci.h> +#include <linux/bootmem.h> #include <asm/pci-direct.h> #include <asm/io.h> #include <asm/pci_x86.h> @@ -109,3 +110,304 @@ void early_dump_pci_devices(void) } } } + +struct save_config { + u32 pci[16]; + u16 pcie[PCI_EXP_SAVE_REGS]; +}; + +struct devinfo { + int pcie_pos; /* position of PCI Express capability */ + int pcie_flags; /* PCI_EXP_FLAGS */ + struct save_config *save; +}; + +static void pci_udelay(int loops) +{ + while (loops--) { + /* Approximately 1 us */ + native_io_delay(); + } +} + +/* Derived from drivers/pci/pci.c */ +#define PCI_FIND_CAP_TTL 48 +static int __init __pci_find_next_cap_ttl(u8 bus, u8 slot, u8 func, + u8 pos, int cap, int *ttl) +{ + u8 id; + + while ((*ttl)--) { + pos = read_pci_config_byte(bus, slot, func, pos); + if (pos < 0x40) + break; + pos &= ~3; + id = read_pci_config_byte(bus, slot, func, + pos + PCI_CAP_LIST_ID); + if (id == 0xff) + break; + if (id == cap) + return pos; + pos += PCI_CAP_LIST_NEXT; + } + return 0; +} + +static int __init __pci_find_next_cap(u8 bus, u8 slot, u8 func, u8 pos, int cap) +{ + int ttl = PCI_FIND_CAP_TTL; + + return __pci_find_next_cap_ttl(bus, slot, func, pos, cap, &ttl); +} + +static int __init __pci_bus_find_cap_start(u8 bus, u8 slot, u8 func, + u8 hdr_type) +{ + u16 status; + + status = read_pci_config_16(bus, slot, func, PCI_STATUS); + if (!(status & PCI_STATUS_CAP_LIST)) + return 0; + + switch (hdr_type) { + case PCI_HEADER_TYPE_NORMAL: + case PCI_HEADER_TYPE_BRIDGE: + return PCI_CAPABILITY_LIST; + case PCI_HEADER_TYPE_CARDBUS: + return PCI_CB_CAPABILITY_LIST; + default: + return 0; + } + + return 0; +} + +static int __init early_pci_find_capability(u8 bus, u8 slot, u8 func, int cap) +{ + int pos; + u8 type = read_pci_config_byte(bus, slot, func, PCI_HEADER_TYPE); + + pos = __pci_bus_find_cap_start(bus, slot, func, type & 0x7f); + if (pos) + pos = __pci_find_next_cap(bus, slot, func, pos, cap); + + return pos; +} + +static void __init do_reset(u8 bus, u8 slot, u8 func) +{ + u16 ctrl; + + printk(KERN_INFO "pci 0000:%02x:%02x.%d reset\n", bus, slot, func); + + /* Assert Secondary Bus Reset */ + ctrl = read_pci_config_16(bus, slot, func, PCI_BRIDGE_CONTROL); + ctrl |= PCI_BRIDGE_CTL_BUS_RESET; + write_pci_config_16(bus, slot, func, PCI_BRIDGE_CONTROL, ctrl); + + pci_udelay(5000); + + /* De-assert Secondary Bus Reset */ + ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET; + write_pci_config_16(bus, slot, func, PCI_BRIDGE_CONTROL, ctrl); + + pci_udelay(500000); +} + +static void __init save_state(unsigned bus, unsigned slot, unsigned func, + struct devinfo *info) +{ + int i; + int pcie, flags, pcie_type; + struct save_config *save; + + pcie = info->pcie_pos; + flags = info->pcie_flags; + pcie_type = (flags & PCI_EXP_FLAGS_TYPE) >> 4; + save = info->save; + + printk(KERN_INFO "pci 0000:%02x:%02x.%d save state\n", bus, slot, func); + + for (i = 0; i < 16; i++) + save->pci[i] = read_pci_config(bus, slot, func, i * 4); + i = 0; + if (pcie_cap_has_devctl(pcie_type, flags)) + save->pcie[i++] = read_pci_config_16(bus, slot, func, + pcie + PCI_EXP_DEVCTL); + if (pcie_cap_has_lnkctl(pcie_type, flags)) + save->pcie[i++] = read_pci_config_16(bus, slot, func, + pcie + PCI_EXP_LNKCTL); + if (pcie_cap_has_sltctl(pcie_type, flags)) + save->pcie[i++] = read_pci_config_16(bus, slot, func, + pcie + PCI_EXP_SLTCTL); + if (pcie_cap_has_rtctl(pcie_type, flags)) + save->pcie[i++] = read_pci_config_16(bus, slot, func, + pcie + PCI_EXP_RTCTL); + + if ((flags & PCI_EXP_FLAGS_VERS) >= 2) { + save->pcie[i++] = read_pci_config_16(bus, slot, func, + pcie + PCI_EXP_DEVCTL2); + save->pcie[i++] = read_pci_config_16(bus, slot, func, + pcie + PCI_EXP_LNKCTL2); + save->pcie[i++] = read_pci_config_16(bus, slot, func, + pcie + PCI_EXP_SLTCTL2); + } +} + +static void __init restore_state(unsigned bus, unsigned slot, unsigned func, + struct devinfo *info) +{ + int i = 0; + int pcie, flags, pcie_type; + struct save_config *save; + + pcie = info->pcie_pos; + flags = info->pcie_flags; + pcie_type = (flags & PCI_EXP_FLAGS_TYPE) >> 4; + save = info->save; + + printk(KERN_INFO "pci 0000:%02x:%02x.%d restore state\n", + bus, slot, func); + + if (pcie_cap_has_devctl(pcie_type, flags)) + write_pci_config_16(bus, slot, func, + pcie + PCI_EXP_DEVCTL, save->pcie[i++]); + if (pcie_cap_has_lnkctl(pcie_type, flags)) + write_pci_config_16(bus, slot, func, + pcie + PCI_EXP_LNKCTL, save->pcie[i++]); + if (pcie_cap_has_sltctl(pcie_type, flags)) + write_pci_config_16(bus, slot, func, + pcie + PCI_EXP_SLTCTL, save->pcie[i++]); + if (pcie_cap_has_rtctl(pcie_type, flags)) + write_pci_config_16(bus, slot, func, + pcie + PCI_EXP_RTCTL, save->pcie[i++]); + + if ((flags & PCI_EXP_FLAGS_VERS) >= 2) { + write_pci_config_16(bus, slot, func, + pcie + PCI_EXP_DEVCTL2, save->pcie[i++]); + write_pci_config_16(bus, slot, func, + pcie + PCI_EXP_LNKCTL2, save->pcie[i++]); + write_pci_config_16(bus, slot, func, + pcie + PCI_EXP_SLTCTL2, save->pcie[i++]); + } + + for (i = 15; i >= 0; i--) + write_pci_config(bus, slot, func, i * 4, save->pci[i]); +} + +static void __init reset_pcie_device(unsigned bus, unsigned slot, unsigned func) +{ + int f, count; + int pcie, pcie_type; + u8 type; + u16 vendor, flags; + u32 class; + int secondary; + struct devinfo child[8]; + struct save_config *save, *pos; + int size; + + pcie = early_pci_find_capability(bus, slot, func, PCI_CAP_ID_EXP); + if (!pcie) + return; + + flags = read_pci_config_16(bus, slot, func, pcie + PCI_EXP_FLAGS); + pcie_type = (flags & PCI_EXP_FLAGS_TYPE) >> 4; + if ((pcie_type != PCI_EXP_TYPE_ROOT_PORT) && + (pcie_type != PCI_EXP_TYPE_DOWNSTREAM)) + return; + + type = read_pci_config_byte(bus, slot, func, PCI_HEADER_TYPE); + if ((type & 0x7f) != PCI_HEADER_TYPE_BRIDGE) + return; + secondary = read_pci_config_byte(bus, slot, func, PCI_SECONDARY_BUS); + memset(child, 0, sizeof(child)); + for (count = 0, f = 0; f < 8; f++) { + vendor = read_pci_config_16(secondary, 0, f, PCI_VENDOR_ID); + if (vendor == 0xffff) + continue; + + pcie = early_pci_find_capability(secondary, 0, f, + PCI_CAP_ID_EXP); + if (!pcie) + continue; + + flags = read_pci_config_16(secondary, 0, f, + pcie + PCI_EXP_FLAGS); + pcie_type = (flags & PCI_EXP_FLAGS_TYPE) >> 4; + if ((pcie_type == PCI_EXP_TYPE_UPSTREAM) || + (pcie_type == PCI_EXP_TYPE_PCI_BRIDGE)) + /* Don't reset switch, bridge */ + return; + + class = read_pci_config(secondary, 0, f, PCI_CLASS_REVISION); + if ((class >> 24) == PCI_BASE_CLASS_DISPLAY) + /* Don't reset VGA device */ + return; + + count++; + child[f].pcie_pos = pcie; + child[f].pcie_flags = flags; + } + + if (!count) + return; + + /* alloc space to save config */ + size = sizeof(struct save_config); + save = (struct save_config *)alloc_bootmem(size*count); + if (save == NULL) { + printk(KERN_ERR "reset_pcie: alloc_bootmem failed\n"); + return; + } + + /* save */ + pos = save; + for (f = 0; f < 8; f++) { + if (child[f].pcie_pos) { + child[f].save = pos++; + save_state(secondary, 0, f, &child[f]); + } + } + + do_reset(bus, slot, func); + + /* restore */ + for (f = 0; f < 8; f++) + if (child[f].pcie_pos) + restore_state(secondary, 0, f, &child[f]); + + free_bootmem(__pa(save), size*count); +} + +void __init early_reset_pcie_devices(void) +{ + unsigned bus, slot, func; + + if (!early_pci_allowed() || !reset_devices) + return; + + for (bus = 0; bus < 256; bus++) { + for (slot = 0; slot < 32; slot++) { + for (func = 0; func < 8; func++) { + u16 vendor; + u8 type; + vendor = read_pci_config_16(bus, slot, func, + PCI_VENDOR_ID); + + if (vendor == 0xffff) + continue; + + reset_pcie_device(bus, slot, func); + + if (func == 0) { + type = read_pci_config_byte(bus, slot, + func, + PCI_HEADER_TYPE); + if (!(type & 0x80)) + break; + } + } + } + } +} diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index ab4bf5a..a7a4125 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -852,24 +852,6 @@ pci_power_t pci_choose_state(struct pci_dev *dev, pm_message_t state) EXPORT_SYMBOL(pci_choose_state); -#define PCI_EXP_SAVE_REGS 7 - -#define pcie_cap_has_devctl(type, flags) 1 -#define pcie_cap_has_lnkctl(type, flags) \ - ((flags & PCI_EXP_FLAGS_VERS) > 1 || \ - (type == PCI_EXP_TYPE_ROOT_PORT || \ - type == PCI_EXP_TYPE_ENDPOINT || \ - type == PCI_EXP_TYPE_LEG_END)) -#define pcie_cap_has_sltctl(type, flags) \ - ((flags & PCI_EXP_FLAGS_VERS) > 1 || \ - ((type == PCI_EXP_TYPE_ROOT_PORT) || \ - (type == PCI_EXP_TYPE_DOWNSTREAM && \ - (flags & PCI_EXP_FLAGS_SLOT)))) -#define pcie_cap_has_rtctl(type, flags) \ - ((flags & PCI_EXP_FLAGS_VERS) > 1 || \ - (type == PCI_EXP_TYPE_ROOT_PORT || \ - type == PCI_EXP_TYPE_RC_EC)) - static struct pci_cap_saved_state *pci_find_saved_cap( struct pci_dev *pci_dev, char cap) { diff --git a/include/linux/pci.h b/include/linux/pci.h index 5faa831..8e10401 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1790,5 +1790,23 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev) */ struct pci_dev *pci_find_upstream_pcie_bridge(struct pci_dev *pdev); +#define PCI_EXP_SAVE_REGS 7 + +#define pcie_cap_has_devctl(type, flags) 1 +#define pcie_cap_has_lnkctl(type, flags) \ + ((flags & PCI_EXP_FLAGS_VERS) > 1 || \ + (type == PCI_EXP_TYPE_ROOT_PORT || \ + type == PCI_EXP_TYPE_ENDPOINT || \ + type == PCI_EXP_TYPE_LEG_END)) +#define pcie_cap_has_sltctl(type, flags) \ + ((flags & PCI_EXP_FLAGS_VERS) > 1 || \ + ((type == PCI_EXP_TYPE_ROOT_PORT) || \ + (type == PCI_EXP_TYPE_DOWNSTREAM && \ + (flags & PCI_EXP_FLAGS_SLOT)))) +#define pcie_cap_has_rtctl(type, flags) \ + ((flags & PCI_EXP_FLAGS_VERS) > 1 || \ + (type == PCI_EXP_TYPE_ROOT_PORT || \ + type == PCI_EXP_TYPE_RC_EC)) + #endif /* __KERNEL__ */ #endif /* LINUX_PCI_H */ diff --git a/init/main.c b/init/main.c index b286730..ebaf067 100644 --- a/init/main.c +++ b/init/main.c @@ -144,10 +144,10 @@ EXPORT_SYMBOL(reset_devices); static int __init set_reset_devices(char *str) { reset_devices = 1; - return 1; + return 0; } -__setup("reset_devices", set_reset_devices); +early_param("reset_devices", set_reset_devices); static const char * argv_init[MAX_INIT_ARGS+2] = { "init", NULL, }; const char * envp_init[MAX_INIT_ENVS+2] = { "HOME=/", "TERM=linux", NULL, }; ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/2] x86, pci: Enable PCI INTx when MSI is disabled 2012-10-02 7:49 [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu Takao Indoh 2012-10-02 7:50 ` [PATCH v2 1/2] x86, pci: Reset PCIe devices at boot time Takao Indoh @ 2012-10-02 7:50 ` Takao Indoh 2012-10-02 19:37 ` [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu Andi Kleen 2012-10-03 13:23 ` Don Dutile 3 siblings, 0 replies; 10+ messages in thread From: Takao Indoh @ 2012-10-02 7:50 UTC (permalink / raw) To: linux-pci, x86, linux-kernel Cc: martin.wilck, kexec, hbabu, mingo, vgoyal, ishii.hironobu, hpa, bhelgaas, tglx, Takao Indoh This patch enables INTx if MSI is disabled in pcibios_enable_device(). In normal case interrupt disable bit in command register is 0b on boot time, but in case of kdump, this bit may be 1b. It causes problems of some drivers. At leaset I confirmed mptsas driver does not work in such a case. This patch fix this problem. Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com> --- arch/x86/pci/common.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c index 720e973..2bb7ecc 100644 --- a/arch/x86/pci/common.c +++ b/arch/x86/pci/common.c @@ -615,8 +615,10 @@ int pcibios_enable_device(struct pci_dev *dev, int mask) if ((err = pci_enable_resources(dev, mask)) < 0) return err; - if (!pci_dev_msi_enabled(dev)) + if (!pci_dev_msi_enabled(dev)) { + pci_intx(dev, true); return pcibios_enable_irq(dev); + } return 0; } ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu 2012-10-02 7:49 [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu Takao Indoh 2012-10-02 7:50 ` [PATCH v2 1/2] x86, pci: Reset PCIe devices at boot time Takao Indoh 2012-10-02 7:50 ` [PATCH v2 2/2] x86, pci: Enable PCI INTx when MSI is disabled Takao Indoh @ 2012-10-02 19:37 ` Andi Kleen 2012-10-02 19:44 ` Vivek Goyal 2012-10-03 0:57 ` Takao Indoh 2012-10-03 13:23 ` Don Dutile 3 siblings, 2 replies; 10+ messages in thread From: Andi Kleen @ 2012-10-02 19:37 UTC (permalink / raw) To: Takao Indoh Cc: linux-pci, x86, linux-kernel, martin.wilck, kexec, hbabu, mingo, vgoyal, ishii.hironobu, hpa, bhelgaas, tglx Takao Indoh <indou.takao@jp.fujitsu.com> writes: > These patches reset PCIe devices at boot time to address DMA problem on > kdump with iommu. When "reset_devices" is specified, a hot reset is > triggered on each PCIe root port and downstream port to reset its > downstream endpoint. Great. I've been pondering this for a long time, but you did finally implement it. I hope this will make kdump a lot more reliable at least on the systems that support per port reset. Now the only question is: why make it a option and not default? -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu 2012-10-02 19:37 ` [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu Andi Kleen @ 2012-10-02 19:44 ` Vivek Goyal 2012-10-03 0:57 ` Takao Indoh 1 sibling, 0 replies; 10+ messages in thread From: Vivek Goyal @ 2012-10-02 19:44 UTC (permalink / raw) To: Andi Kleen Cc: Takao Indoh, linux-pci, x86, linux-kernel, martin.wilck, kexec, hbabu, mingo, ishii.hironobu, hpa, bhelgaas, tglx On Tue, Oct 02, 2012 at 12:37:44PM -0700, Andi Kleen wrote: > Takao Indoh <indou.takao@jp.fujitsu.com> writes: > > > These patches reset PCIe devices at boot time to address DMA problem on > > kdump with iommu. When "reset_devices" is specified, a hot reset is > > triggered on each PCIe root port and downstream port to reset its > > downstream endpoint. > > Great. I've been pondering this for a long time, but you did finally > implement it. I hope this will make kdump a lot more reliable at least > on the systems that support per port reset. > > Now the only question is: why make it a option and not default? Do you mean why to hide it behind "reset_devices" kernel parameter? One thing still is concerning is how to avoid double resets. We probably will not want first bus resets and then driver loads and does another reset after seeing "reset_devices". Thanks Vivek ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu 2012-10-02 19:37 ` [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu Andi Kleen 2012-10-02 19:44 ` Vivek Goyal @ 2012-10-03 0:57 ` Takao Indoh 1 sibling, 0 replies; 10+ messages in thread From: Takao Indoh @ 2012-10-03 0:57 UTC (permalink / raw) To: andi Cc: linux-pci, x86, linux-kernel, martin.wilck, kexec, hbabu, mingo, vgoyal, ishii.hironobu, hpa, bhelgaas, tglx (2012/10/03 4:37), Andi Kleen wrote: > Takao Indoh <indou.takao@jp.fujitsu.com> writes: > >> These patches reset PCIe devices at boot time to address DMA problem on >> kdump with iommu. When "reset_devices" is specified, a hot reset is >> triggered on each PCIe root port and downstream port to reset its >> downstream endpoint. > > Great. I've been pondering this for a long time, but you did finally > implement it. I hope this will make kdump a lot more reliable at least > on the systems that support per port reset. Actually I got the idea of this patch from your comment below;-) http://permalink.gmane.org/gmane.linux.network/162414 > > Now the only question is: why make it a option and not default? Except kdump I don't know the situation where this reset is useful, so I introduced it as option so that it works only during kdump. Thanks, Takao Indoh ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu 2012-10-02 7:49 [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu Takao Indoh ` (2 preceding siblings ...) 2012-10-02 19:37 ` [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu Andi Kleen @ 2012-10-03 13:23 ` Don Dutile 2012-10-09 9:03 ` Takao Indoh 3 siblings, 1 reply; 10+ messages in thread From: Don Dutile @ 2012-10-03 13:23 UTC (permalink / raw) To: Takao Indoh Cc: linux-pci, x86, linux-kernel, martin.wilck, kexec, hbabu, mingo, vgoyal, ishii.hironobu, hpa, bhelgaas, tglx On 10/02/2012 03:49 AM, Takao Indoh wrote: > These patches reset PCIe devices at boot time to address DMA problem on > kdump with iommu. When "reset_devices" is specified, a hot reset is > triggered on each PCIe root port and downstream port to reset its > downstream endpoint. > > Background: > A kdump problem about DMA has been discussed for a long time. That is, > when a kernel is switched to the kdump kernel DMA derived from first > kernel affects second kernel. Recently this problem surfaces when iommu > is used for PCI passthrough on KVM guest. In the case of the machine I > use, when intel_iommu=on is specified, DMAR error is detected in kdump > kernel and PCI SERR is also detected. Finally kdump fails because some > devices does not work correctly. > > The root cause is that ongoing DMA from first kernel causes DMAR fault > because page table of DMAR is initialized while kdump kernel is booting > up. Therefore to address this problem DMA needs to be stopped before > DMAR is initialized at kdump kernel boot time. By these patches, PCIe > devices are reset by hot reset and its DMA is stopped when reset_devices > is specified. One problem of this solution is that the monitor blacks > out when VGA controller is reset. So this patch does not reset the port > whose child endpoint is VGA device. > > v2: > Reset devices in setup_arch() because reset need to be done before > interrupt remapping is initialized. > > v1: > https://lkml.org/lkml/2012/8/3/160 > > Thanks, > Takao Indoh > > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Maybe you've tried the following, and I missed a thread on it, but instead of a somewhat-large, reset hammer, did any one try just reading all the endpoint-only device CMD register, flip the MasterEnable bit off, and write it back? .... that would stop all DMA (should stop all MSI writes as well since they are just another DMA), and then restart the system? May also have to do PCI INT Disable as well... and note, that's a PCI 2.3 optional feature.... so devices using INT signalling vs MSI is just borked on IOMMU/intr-remapping systems... which I would expect are few. Then again, if this is foolishly done, then reset legacy PCI busses as the fallback. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu 2012-10-03 13:23 ` Don Dutile @ 2012-10-09 9:03 ` Takao Indoh 2012-10-09 16:05 ` Don Dutile 0 siblings, 1 reply; 10+ messages in thread From: Takao Indoh @ 2012-10-09 9:03 UTC (permalink / raw) To: ddutile Cc: linux-pci, x86, linux-kernel, martin.wilck, kexec, hbabu, mingo, vgoyal, ishii.hironobu, hpa, bhelgaas, tglx (2012/10/03 22:23), Don Dutile wrote: > On 10/02/2012 03:49 AM, Takao Indoh wrote: >> These patches reset PCIe devices at boot time to address DMA problem on >> kdump with iommu. When "reset_devices" is specified, a hot reset is >> triggered on each PCIe root port and downstream port to reset its >> downstream endpoint. >> >> Background: >> A kdump problem about DMA has been discussed for a long time. That is, >> when a kernel is switched to the kdump kernel DMA derived from first >> kernel affects second kernel. Recently this problem surfaces when iommu >> is used for PCI passthrough on KVM guest. In the case of the machine I >> use, when intel_iommu=on is specified, DMAR error is detected in kdump >> kernel and PCI SERR is also detected. Finally kdump fails because some >> devices does not work correctly. >> >> The root cause is that ongoing DMA from first kernel causes DMAR fault >> because page table of DMAR is initialized while kdump kernel is booting >> up. Therefore to address this problem DMA needs to be stopped before >> DMAR is initialized at kdump kernel boot time. By these patches, PCIe >> devices are reset by hot reset and its DMA is stopped when reset_devices >> is specified. One problem of this solution is that the monitor blacks >> out when VGA controller is reset. So this patch does not reset the port >> whose child endpoint is VGA device. >> >> v2: >> Reset devices in setup_arch() because reset need to be done before >> interrupt remapping is initialized. >> >> v1: >> https://lkml.org/lkml/2012/8/3/160 >> >> Thanks, >> Takao Indoh >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-pci" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > Maybe you've tried the following, and I missed a thread on it, > but instead of a somewhat-large, reset hammer, did any one try > just reading all the endpoint-only device CMD register, flip the MasterEnable > bit off, and write it back? .... that would stop all DMA (should stop > all MSI writes as well since they are just another DMA), and then > restart the system? > May also have to do PCI INT Disable as well... and note, that's a PCI 2.3 > optional feature.... so devices using INT signalling vs MSI > is just borked on IOMMU/intr-remapping systems... which I would expect > are few. Then again, if this is foolishly done, then reset legacy PCI > busses as the fallback. Just clearing bus master bit and INTx disable bit in setup_arch() did not solve this problem. I still got DMAR error on devices(for exmaple, igb and megaraid_sas). Clearing bus master in setup_arch() and resetting devices in fixup_final like v1 patch is better, DMAR error was not detected. But on certain machine kdump kernel hung up when resetting devices. It seems to be a problem specific to the platform. And, resetting devices in setup_arch() like v2 patch solves all problems I found so far. Thanks, Takao Indoh ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu 2012-10-09 9:03 ` Takao Indoh @ 2012-10-09 16:05 ` Don Dutile 2012-10-10 4:22 ` Takao Indoh 0 siblings, 1 reply; 10+ messages in thread From: Don Dutile @ 2012-10-09 16:05 UTC (permalink / raw) To: Takao Indoh Cc: linux-pci, x86, linux-kernel, martin.wilck, kexec, hbabu, mingo, vgoyal, ishii.hironobu, hpa, bhelgaas, tglx On 10/09/2012 05:03 AM, Takao Indoh wrote: > (2012/10/03 22:23), Don Dutile wrote: >> On 10/02/2012 03:49 AM, Takao Indoh wrote: >>> These patches reset PCIe devices at boot time to address DMA problem on >>> kdump with iommu. When "reset_devices" is specified, a hot reset is >>> triggered on each PCIe root port and downstream port to reset its >>> downstream endpoint. >>> >>> Background: >>> A kdump problem about DMA has been discussed for a long time. That is, >>> when a kernel is switched to the kdump kernel DMA derived from first >>> kernel affects second kernel. Recently this problem surfaces when iommu >>> is used for PCI passthrough on KVM guest. In the case of the machine I >>> use, when intel_iommu=on is specified, DMAR error is detected in kdump >>> kernel and PCI SERR is also detected. Finally kdump fails because some >>> devices does not work correctly. >>> >>> The root cause is that ongoing DMA from first kernel causes DMAR fault >>> because page table of DMAR is initialized while kdump kernel is booting >>> up. Therefore to address this problem DMA needs to be stopped before >>> DMAR is initialized at kdump kernel boot time. By these patches, PCIe >>> devices are reset by hot reset and its DMA is stopped when reset_devices >>> is specified. One problem of this solution is that the monitor blacks >>> out when VGA controller is reset. So this patch does not reset the port >>> whose child endpoint is VGA device. >>> >>> v2: >>> Reset devices in setup_arch() because reset need to be done before >>> interrupt remapping is initialized. >>> >>> v1: >>> https://lkml.org/lkml/2012/8/3/160 >>> >>> Thanks, >>> Takao Indoh >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> Maybe you've tried the following, and I missed a thread on it, >> but instead of a somewhat-large, reset hammer, did any one try >> just reading all the endpoint-only device CMD register, flip the MasterEnable >> bit off, and write it back? .... that would stop all DMA (should stop >> all MSI writes as well since they are just another DMA), and then >> restart the system? >> May also have to do PCI INT Disable as well... and note, that's a PCI 2.3 >> optional feature.... so devices using INT signalling vs MSI >> is just borked on IOMMU/intr-remapping systems... which I would expect >> are few. Then again, if this is foolishly done, then reset legacy PCI >> busses as the fallback. > > Just clearing bus master bit and INTx disable bit in setup_arch() did > not solve this problem. I still got DMAR error on devices(for exmaple, > igb and megaraid_sas). > > Clearing bus master in setup_arch() and resetting devices in fixup_final > like v1 patch is better, DMAR error was not detected. But on certain > machine kdump kernel hung up when resetting devices. It seems to be a > problem specific to the platform. > > And, resetting devices in setup_arch() like v2 patch solves all problems > I found so far. > > Thanks, > Takao Indoh > this summary should be in the patch set, so others know/learn what was attempted, what failed, and how you reached this working conclusion. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu 2012-10-09 16:05 ` Don Dutile @ 2012-10-10 4:22 ` Takao Indoh 0 siblings, 0 replies; 10+ messages in thread From: Takao Indoh @ 2012-10-10 4:22 UTC (permalink / raw) To: ddutile Cc: linux-pci, x86, linux-kernel, martin.wilck, kexec, hbabu, mingo, vgoyal, ishii.hironobu, hpa, bhelgaas, tglx (2012/10/10 1:05), Don Dutile wrote: > On 10/09/2012 05:03 AM, Takao Indoh wrote: >> (2012/10/03 22:23), Don Dutile wrote: >>> On 10/02/2012 03:49 AM, Takao Indoh wrote: >>>> These patches reset PCIe devices at boot time to address DMA problem on >>>> kdump with iommu. When "reset_devices" is specified, a hot reset is >>>> triggered on each PCIe root port and downstream port to reset its >>>> downstream endpoint. >>>> >>>> Background: >>>> A kdump problem about DMA has been discussed for a long time. That is, >>>> when a kernel is switched to the kdump kernel DMA derived from first >>>> kernel affects second kernel. Recently this problem surfaces when iommu >>>> is used for PCI passthrough on KVM guest. In the case of the machine I >>>> use, when intel_iommu=on is specified, DMAR error is detected in kdump >>>> kernel and PCI SERR is also detected. Finally kdump fails because some >>>> devices does not work correctly. >>>> >>>> The root cause is that ongoing DMA from first kernel causes DMAR fault >>>> because page table of DMAR is initialized while kdump kernel is booting >>>> up. Therefore to address this problem DMA needs to be stopped before >>>> DMAR is initialized at kdump kernel boot time. By these patches, PCIe >>>> devices are reset by hot reset and its DMA is stopped when reset_devices >>>> is specified. One problem of this solution is that the monitor blacks >>>> out when VGA controller is reset. So this patch does not reset the port >>>> whose child endpoint is VGA device. >>>> >>>> v2: >>>> Reset devices in setup_arch() because reset need to be done before >>>> interrupt remapping is initialized. >>>> >>>> v1: >>>> https://lkml.org/lkml/2012/8/3/160 >>>> >>>> Thanks, >>>> Takao Indoh >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >>> Maybe you've tried the following, and I missed a thread on it, >>> but instead of a somewhat-large, reset hammer, did any one try >>> just reading all the endpoint-only device CMD register, flip the MasterEnable >>> bit off, and write it back? .... that would stop all DMA (should stop >>> all MSI writes as well since they are just another DMA), and then >>> restart the system? >>> May also have to do PCI INT Disable as well... and note, that's a PCI 2.3 >>> optional feature.... so devices using INT signalling vs MSI >>> is just borked on IOMMU/intr-remapping systems... which I would expect >>> are few. Then again, if this is foolishly done, then reset legacy PCI >>> busses as the fallback. >> >> Just clearing bus master bit and INTx disable bit in setup_arch() did >> not solve this problem. I still got DMAR error on devices(for exmaple, >> igb and megaraid_sas). >> >> Clearing bus master in setup_arch() and resetting devices in fixup_final >> like v1 patch is better, DMAR error was not detected. But on certain >> machine kdump kernel hung up when resetting devices. It seems to be a >> problem specific to the platform. >> >> And, resetting devices in setup_arch() like v2 patch solves all problems >> I found so far. >> >> Thanks, >> Takao Indoh >> > this summary should be in the patch set, so others know/learn > what was attempted, what failed, and how you reached this working conclusion. Ok, I'll post new patch with this information. Thanks, Takao Indoh ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-10-10 4:22 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-10-02 7:49 [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu Takao Indoh 2012-10-02 7:50 ` [PATCH v2 1/2] x86, pci: Reset PCIe devices at boot time Takao Indoh 2012-10-02 7:50 ` [PATCH v2 2/2] x86, pci: Enable PCI INTx when MSI is disabled Takao Indoh 2012-10-02 19:37 ` [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu Andi Kleen 2012-10-02 19:44 ` Vivek Goyal 2012-10-03 0:57 ` Takao Indoh 2012-10-03 13:23 ` Don Dutile 2012-10-09 9:03 ` Takao Indoh 2012-10-09 16:05 ` Don Dutile 2012-10-10 4:22 ` Takao Indoh
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).