linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Sam Bobroff <sbobroff@linux.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	Alistair Popple <alistair@popple.id.au>,
	linuxppc-dev@lists.ozlabs.org,
	Piotr Jaroszynski <pjaroszynski@nvidia.com>,
	Oliver O'Halloran <oohall@gmail.com>,
	Reza Arbab <arbab@linux.ibm.com>
Subject: Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
Date: Tue, 20 Nov 2018 14:51:24 +1100	[thread overview]
Message-ID: <20181120035123.GB9175@tungsten.ozlabs.ibm.com> (raw)
In-Reply-To: <877eh831id.fsf@concordia.ellerman.id.au>

[-- Attachment #1: Type: text/plain, Size: 7692 bytes --]

On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
> > The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
> > skiboot's NPU driver does not touch the pci_error_type parameter so
> > it might have garbage but the powernv code analyzes it nevertheless.
> >
> > This initializes pcierr and fstate to zero in all call sites.
> >
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> 
> Can we tag this with a Fixes? And seems like it should probably go to
> stable, or can we not trigger this path on older kernels?
> 
> cheers

Hmm, it's triggered by use on an NPU PE so that would be any kernel that
can run on P8 or later (AFAIK).

It looks like the issue was present earlier, but the code was last
touched when it was moved, in...

40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()")

... which was back in v4.1.

Sam.

> > Without this, this happens:
> >
> > pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
> > EEH: PHB#6 failure detected, location: N/A
> > CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106
> > Call Trace:
> > [c000003fea9df9c0] [c000000000a990ec] dump_stack+0xb0/0xf4 (unreliable)
> > [c000003fea9dfa00] [c000000000038480] eeh_dev_check_failure+0x1f0/0x5f0
> > [c000003fea9dfaa0] [c0000000000a2768] pnv_pci_read_config+0x128/0x160
> > [c000003fea9dfae0] [c0000000005d2b0c] pci_bus_read_config_dword+0x9c/0xf0
> > [c000003fea9dfb40] [c0000000005df3d4] pci_save_state+0x64/0x250
> > [c000003fea9dfbc0] [c0000000005e0730] pci_dev_save_and_disable+0x70/0xa0
> > [c000003fea9dfbf0] [c0000000005e4078] pci_try_reset_function+0x48/0xc0
> > [c000003fea9dfc20] [c00800001cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
> > [c000003fea9dfcf0] [c00800001ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
> > [c000003fea9dfd10] [c00000000039fd84] do_vfs_ioctl+0xd4/0xa00
> > [c000003fea9dfdb0] [c0000000003a07b4] ksys_ioctl+0x104/0x120
> > [c000003fea9dfe00] [c0000000003a07f8] sys_ioctl+0x28/0x80
> > [c000003fea9dfe20] [c00000000000b3a4] system_call+0x5c/0x70
> > EEH: Detected error on PHB#6
> > EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 fail
> > ures.
> > EEH: Notify device drivers to shutdown
> > EEH: Beginning: 'error_detected(IO frozen)'
> > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover'
> > EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover'
> > EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover'
> > EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover'
> > EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'can recover'
> > EEH: Collect temporary log
> > pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
> > EEH: Reset without hotplug activity
> > iommu: Removing device 0006:00:01.0 from group 4
> > iommu: Removing device 0006:00:01.1 from group 4
> > iommu: Removing device 0006:00:02.0 from group 4
> > iommu: Removing device 0006:00:02.1 from group 4
> > pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> > EEH: Sleep 5s ahead of partial hotplug
> > pci 0004:04     : [PE# 00] Setting up window#0 0..3fffffff pg=1000
> > pci 0004:05     : [PE# 18] Setting up window#0 0..3fffffff pg=1000
> > pci 0004:06     : [PE# 30] Setting up window#0 0..3fffffff pg=1000
> > pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fffffff pg=1000
> > pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fffffff pg=1000
> > pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fffffff pg=1000
> > EEH: Beginning: 'slot_reset'
> > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > EEH: Finished:'slot_reset' with aggregate recovery state:'none'
> > EEH: Notify device driver to resume
> > EEH: Beginning: 'resume'
> > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > EEH: Finished:'resume'
> > EEH: Recovery successful.
> > ---
> >  arch/powerpc/platforms/powernv/eeh-powernv.c | 8 ++++----
> >  arch/powerpc/platforms/powernv/pci-ioda.c    | 4 ++--
> >  arch/powerpc/platforms/powernv/pci.c         | 4 ++--
> >  3 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> > index abc0be7..f380789 100644
> > --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> > +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> > @@ -564,8 +564,8 @@ static void pnv_eeh_get_phb_diag(struct eeh_pe *pe)
> >  static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
> >  {
> >  	struct pnv_phb *phb = pe->phb->private_data;
> > -	u8 fstate;
> > -	__be16 pcierr;
> > +	u8 fstate = 0;
> > +	__be16 pcierr = 0;
> >  	s64 rc;
> >  	int result = 0;
> >  
> > @@ -603,8 +603,8 @@ static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
> >  static int pnv_eeh_get_pe_state(struct eeh_pe *pe)
> >  {
> >  	struct pnv_phb *phb = pe->phb->private_data;
> > -	u8 fstate;
> > -	__be16 pcierr;
> > +	u8 fstate = 0;
> > +	__be16 pcierr = 0;
> >  	s64 rc;
> >  	int result;
> >  
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index dd80744..72b5cc0 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -604,8 +604,8 @@ static int pnv_ioda_unfreeze_pe(struct pnv_phb *phb, int pe_no, int opt)
> >  static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
> >  {
> >  	struct pnv_ioda_pe *slave, *pe;
> > -	u8 fstate, state;
> > -	__be16 pcierr;
> > +	u8 fstate = 0, state;
> > +	__be16 pcierr = 0;
> >  	s64 rc;
> >  
> >  	/* Sanity check on PE number */
> > diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> > index 13aef23..db230a35 100644
> > --- a/arch/powerpc/platforms/powernv/pci.c
> > +++ b/arch/powerpc/platforms/powernv/pci.c
> > @@ -602,8 +602,8 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
> >  static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
> >  {
> >  	struct pnv_phb *phb = pdn->phb->private_data;
> > -	u8	fstate;
> > -	__be16	pcierr;
> > +	u8	fstate = 0;
> > +	__be16	pcierr = 0;
> >  	unsigned int pe_no;
> >  	s64	rc;
> >  
> > -- 
> > 2.17.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2018-11-20  3:54 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-19  4:25 [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status Alexey Kardashevskiy
2018-11-19 23:28 ` Sam Bobroff
2018-11-20  2:51 ` Michael Ellerman
2018-11-20  3:51   ` Sam Bobroff [this message]
2018-11-20  6:55     ` Alexey Kardashevskiy
2018-11-20  6:59       ` Russell Currey
2018-11-20 10:58         ` Michael Ellerman
2018-12-23 13:28 ` [kernel] " Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181120035123.GB9175@tungsten.ozlabs.ibm.com \
    --to=sbobroff@linux.ibm.com \
    --cc=aik@ozlabs.ru \
    --cc=alistair@popple.id.au \
    --cc=arbab@linux.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=oohall@gmail.com \
    --cc=pjaroszynski@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).