From: Nathan Lynch <nathanl@linux.ibm.com>
To: Sam Bobroff <sbobroff@linux.ibm.com>
Cc: aik@ozlabs.ru, linuxppc-dev@lists.ozlabs.org, oohall@gmail.com,
	tyreld@linux.vnet.ibm.com
Subject: Re: [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug
Date: Thu, 19 Sep 2019 15:28:40 -0500
Message-ID: <871rwcqbd3.fsf@linux.ibm.com>
In-Reply-To: <72ae8ae9c54097158894a52de23690448de38ea9.1565930772.git.sbobroff@linux.ibm.com>

Hello Sam,

Sam Bobroff <sbobroff@linux.ibm.com> writes:
> On PowerNV and pSeries, devices currently acquire EEH support from
> several different places: Boot-time devices from eeh_probe_devices()
> and eeh_addr_cache_build(), Virtual Function devices from the pcibios
> bus add device hooks and hot plugged devices from pci_hp_add_devices()
> (with other platforms using other methods as well).  Unfortunately,
> pSeries machines currently discover hot plugged devices using
> pci_rescan_bus(), not pci_hp_add_devices(), and so those devices do
> not receive EEH support.
>
> Rather than adding another case for pci_rescan_bus(), this change
> widens the scope of the pcibios bus add device hooks so that they can
> handle all devices. As a side effect this also supports devices
> discovered after manually rescanning via /sys/bus/pci/rescan.
>
> Note that on PowerNV, this change allows the EEH subsystem to become
> enabled after boot as long as it has not been forced off, which was
> not previously possible (it was already possible on pSeries).

With this change, I get a crash (a use-after-free by the looks of it) when
I remove and then re-add a PCI device in QEMU:

$ qemu-system-ppc64 -M pseries -append 'debug console=hvc0' \
  -nographic -vga none -m 1G,slots=32,maxmem=1024G -smp 2 \
  -kernel vmlinux -initrd ~/b/br/ppc64le-initramfs/images/rootfs.cpio \
  -nic model=e1000

...

# echo 1 > /sys/devices/pci0000:00/0000:00:00.0/remove ; \
  echo 1 > /sys/devices/pci0000:00/pci_bus/0000:00/rescan

pci 0000:00:00.0: Removing from iommu group 0
pci 0000:00:00.0: [8086:100e] type 00 class 0x020000
pci 0000:00:00.0: reg 0x10: [mem 0x200080000000-0x20008001ffff]
pci 0000:00:00.0: reg 0x14: [io  0x10040-0x1007f]
pci 0000:00:00.0: reg 0x30: [mem 0x200080040000-0x20008007ffff pref]
pci 0000:00:00.0: Adding to iommu group 0
pci 0000:00:00.0: BAR 6: assigned [mem 0x200080000000-0x20008003ffff pref]
pci 0000:00:00.0: BAR 0: assigned [mem 0x200080040000-0x20008005ffff]
pci 0000:00:00.0: BAR 1: assigned [io  0x10000-0x1003f]
e1000 0000:00:00.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
e1000 0000:00:00.0 eth0: Intel(R) PRO/1000 Network Connection
pci 0000:00:00.0: Removing from iommu group 0
pci 0000:00:00.0: [8086:100e] type 00 class 0x020000
pci 0000:00:00.0: reg 0x10: [mem 0x200080040000-0x20008005ffff]
pci 0000:00:00.0: reg 0x14: [io  0x10000-0x1003f]
pci 0000:00:00.0: reg 0x30: [mem 0x200080040000-0x20008007ffff pref]
pci 0000:00:00.0: BAR 6: assigned [mem 0x200080000000-0x20008003ffff pref]
pci 0000:00:00.0: BAR 0: assigned [mem 0x200080040000-0x20008005ffff]
pci 0000:00:00.0: BAR 1: assigned [io  0x10000-0x1003f]
BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6bfb
Faulting instruction address: 0xc000000000597270
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 2464 Comm: pci-probe-vs-cp Not tainted 5.3.0-rc2-00092-gf381d5711f09 #76
NIP:  c000000000597270 LR: c000000000599470 CTR: c0000000002030b0
REGS: c00000003ee4f650 TRAP: 0380   Not tainted  (5.3.0-rc2-00092-gf381d5711f09)
MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002442  XER: 00000000
CFAR: c00000000059946c IRQMASK: 0 
GPR00: c000000000599470 c00000003ee4f8e0 c000000003317a00 6b6b6b6b6b6b6b6b 
GPR04: c000000001d0fa38 0000000000000000 0000000000000000 221a64979a66f870 
GPR08: c00000000347b398 0000000000000000 c00000000336e070 ffffffffffffffff 
GPR12: 0000000000002000 c000000004060000 0000000000000000 0000000000000000 
GPR16: 00000000100a78d8 00007fffe9fdff96 00000000100a7898 0000000000000000 
GPR20: 0000000000000000 00000000100e0ff0 0000000000000000 00000000100e0fe8 
GPR24: 0000000000000000 000001002ae50260 c000000001d0fa38 6b6b6b6b6b6b6b6b 
GPR28: fffffffffffffff2 c000000001d0fa38 0000000000000000 c000000003118c18 
NIP [c000000000597270] kernfs_find_ns+0x50/0x3d0
LR [c000000000599470] kernfs_remove_by_name_ns+0x60/0xe0
Call Trace:
[c00000003ee4f8e0] [c00000000020950c] lockdep_hardirqs_on+0x10c/0x210 (unreliable)
[c00000003ee4f970] [c000000000599470] kernfs_remove_by_name_ns+0x60/0xe0
[c00000003ee4fa00] [c00000000059ca08] sysfs_remove_file_ns+0x28/0x40
[c00000003ee4fa20] [c000000000cbd70c] device_remove_file+0x2c/0x40
[c00000003ee4fa40] [c000000000051480] eeh_sysfs_remove_device+0x50/0xf0
[c00000003ee4fa80] [c00000000004a594] eeh_add_device_late.part.7+0x84/0x220
[c00000003ee4fb00] [c0000000000e94f0] pseries_pcibios_bus_add_device+0x60/0xb0
[c00000003ee4fb70] [c00000000006fc40] pcibios_bus_add_device+0x40/0x60
[c00000003ee4fb90] [c000000000bc5220] pci_bus_add_device+0x30/0x100
[c00000003ee4fc00] [c000000000bc5344] pci_bus_add_devices+0x54/0xb0
[c00000003ee4fc40] [c000000000bca058] pci_rescan_bus+0x48/0x70
[c00000003ee4fc70] [c000000000bd9adc] dev_bus_rescan_store+0xcc/0x100
[c00000003ee4fcb0] [c000000000cbc9d8] dev_attr_store+0x38/0x60
[c00000003ee4fcd0] [c00000000059c460] sysfs_kf_write+0x70/0xb0
[c00000003ee4fd10] [c00000000059aa98] kernfs_fop_write+0xf8/0x280
[c00000003ee4fd60] [c0000000004b3e5c] __vfs_write+0x3c/0x70
[c00000003ee4fd80] [c0000000004b81f0] vfs_write+0xd0/0x220
[c00000003ee4fdd0] [c0000000004b85ac] ksys_write+0x7c/0x140
[c00000003ee4fe20] [c00000000000bc6c] system_call+0x5c/0x70

FWIW during boot the EEH core reports:

  EEH: No capable adapters found: recovery disabled.

> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index ca8b0c58a6a7..87edac6f2fd9 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1272,7 +1272,7 @@ void eeh_add_device_late(struct pci_dev *dev)
>  	struct pci_dn *pdn;
>  	struct eeh_dev *edev;
>  
> -	if (!dev || !eeh_enabled())
> +	if (!dev)
>  		return;
>  
>  	pr_debug("EEH: Adding device %s\n", pci_name(dev));

Reverting this hunk works around (fixes?) it.

Thread overview: 21+ messages
2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 01/12] powerpc/64: Adjust order in pcibios_init() Sam Bobroff
2019-08-28  4:24   ` Michael Ellerman
2019-08-16  4:48 ` [PATCH v5 02/12] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 03/12] powerpc/eeh: Improve debug messages around device addition Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 04/12] powerpc/eeh: Initialize EEH address cache earlier Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug Sam Bobroff
2019-08-21  3:28   ` Michael Ellerman
2019-08-22  6:17     ` [PATCH] powerpc/eeh: Fixup EEH for pSeries hotplug Sam Bobroff
2019-09-19 20:28   ` Nathan Lynch [this message]
2019-09-19 23:27     ` [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug Oliver O'Halloran
2019-09-19 23:44       ` Nathan Lynch
2019-09-23  5:00     ` Sam Bobroff
2019-09-23 18:01       ` Nathan Lynch
2019-08-16  4:48 ` [PATCH v5 06/12] powerpc/eeh: Refactor around eeh_probe_devices() Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 07/12] powerpc/eeh: Add bdfn field to eeh_dev Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 08/12] powerpc/eeh: Introduce EEH edev logging macros Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 09/12] powerpc/eeh: Convert log messages to eeh_edev_* macros Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 10/12] powerpc/eeh: Fix crash when edev->pdev changes Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 11/12] powerpc/eeh: Remove unused return path from eeh_pe_dev_traverse() Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 12/12] powerpc/eeh: Slightly simplify eeh_add_to_parent_pe() Sam Bobroff
