linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Russell Currey <ruscur@russell.cc>
To: Alexey Kardashevskiy <aik@ozlabs.ru>,
	Sam Bobroff <sbobroff@linux.ibm.com>,
	 Michael Ellerman <mpe@ellerman.id.au>
Cc: Alistair Popple <alistair@popple.id.au>,
	linuxppc-dev@lists.ozlabs.org,
	Piotr Jaroszynski <pjaroszynski@nvidia.com>,
	Oliver O'Halloran <oohall@gmail.com>,
	Reza Arbab <arbab@linux.ibm.com>
Subject: Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
Date: Tue, 20 Nov 2018 17:59:02 +1100	[thread overview]
Message-ID: <54e863e23af0fbc5f83b663fe7d6c95f86024c3c.camel@russell.cc> (raw)
In-Reply-To: <1c7660de-68fd-60b7-087d-b013479154e0@ozlabs.ru>

On Tue, 2018-11-20 at 17:55 +1100, Alexey Kardashevskiy wrote:
> 
> On 20/11/2018 14:51, Sam Bobroff wrote:
> > On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote:
> > > Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> > > 
> > > > The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS
> > > > call in
> > > > skiboot's NPU driver does not touch the pci_error_type
> > > > parameter so
> > > > it might have garbage but the powernv code analyzes it
> > > > nevertheless.
> > > > 
> > > > This initializes pcierr and fstate to zero in all call sites.
> > > > 
> > > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > > > ---
> > > 
> > > Can we tag this with a Fixes? And seems like it should probably
> > > go to
> > > stable, or can we not trigger this path on older kernels?
> > > 
> > > cheers
> > 
> > Hmm, it's triggered by use on an NPU PE so that would be any kernel
> > that
> > can run on P8 or later (AFAIK).
> > 
> > It looks like the issue was present earlier, but the code was last
> > touched when it was moved, in...
> > 
> > 40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()")
> > 
> > ... which was back in v4.1.
> 
> The original commit is
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c41a7f3f7593fe57
> 
> 2013-06-20 13:21:09 +0800
> ===
> powerpc/eeh: I/O chip EEH state retrieval
> 
> The patch adds I/O chip backend to retrieve the state for the
> indicated PE. While the PE state is temperarily unavailable,
> the upper layer (powernv platform) should return default delay
> (1 second).
> ===
> 
> This was 3.10-rc5 (this is what that sha1's Makefile has).
> 
> But this was not the issue till skiboot decided not to zero these
> parameters so "Fixes:" should point to what?

It should still be that commit, it's perfectly reasonable (however
unlikely) that someone could be running a 3.10 kernel with a modern
skiboot.

> 
> 
> > Sam.
> > 
> > > > Without this, this happens:
> > > > 
> > > > pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
> > > > EEH: PHB#6 failure detected, location: N/A
> > > > CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 4.19.0-
> > > > le_f5a7bb7_aikATfstn1-p1 torvalds#106
> > > > Call Trace:
> > > > [c000003fea9df9c0] [c000000000a990ec] dump_stack+0xb0/0xf4
> > > > (unreliable)
> > > > [c000003fea9dfa00] [c000000000038480]
> > > > eeh_dev_check_failure+0x1f0/0x5f0
> > > > [c000003fea9dfaa0] [c0000000000a2768]
> > > > pnv_pci_read_config+0x128/0x160
> > > > [c000003fea9dfae0] [c0000000005d2b0c]
> > > > pci_bus_read_config_dword+0x9c/0xf0
> > > > [c000003fea9dfb40] [c0000000005df3d4] pci_save_state+0x64/0x250
> > > > [c000003fea9dfbc0] [c0000000005e0730]
> > > > pci_dev_save_and_disable+0x70/0xa0
> > > > [c000003fea9dfbf0] [c0000000005e4078]
> > > > pci_try_reset_function+0x48/0xc0
> > > > [c000003fea9dfc20] [c00800001cbc1b1c]
> > > > vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
> > > > [c000003fea9dfcf0] [c00800001ca9046c]
> > > > vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
> > > > [c000003fea9dfd10] [c00000000039fd84] do_vfs_ioctl+0xd4/0xa00
> > > > [c000003fea9dfdb0] [c0000000003a07b4] ksys_ioctl+0x104/0x120
> > > > [c000003fea9dfe00] [c0000000003a07f8] sys_ioctl+0x28/0x80
> > > > [c000003fea9dfe20] [c00000000000b3a4] system_call+0x5c/0x70
> > > > EEH: Detected error on PHB#6
> > > > EEH: This PCI device has failed 1 times in the last hour and
> > > > will be permanently disabled after 5 fail
> > > > ures.
> > > > EEH: Notify device drivers to shutdown
> > > > EEH: Beginning: 'error_detected(IO frozen)'
> > > > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > > > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > > > EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci-
> > > > >error_detected(IO frozen)
> > > > EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can
> > > > recover'
> > > > EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci-
> > > > >error_detected(IO frozen)
> > > > EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can
> > > > recover'
> > > > EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci-
> > > > >error_detected(IO frozen)
> > > > EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can
> > > > recover'
> > > > EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci-
> > > > >error_detected(IO frozen)
> > > > EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can
> > > > recover'
> > > > EEH: Finished:'error_detected(IO frozen)' with aggregate
> > > > recovery state:'can recover'
> > > > EEH: Collect temporary log
> > > > pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
> > > > EEH: Reset without hotplug activity
> > > > iommu: Removing device 0006:00:01.0 from group 4
> > > > iommu: Removing device 0006:00:01.1 from group 4
> > > > iommu: Removing device 0006:00:02.0 from group 4
> > > > iommu: Removing device 0006:00:02.1 from group 4
> > > > pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> > > > EEH: Sleep 5s ahead of partial hotplug
> > > > pci 0004:04     : [PE# 00] Setting up window#0 0..3fffffff
> > > > pg=1000
> > > > pci 0004:05     : [PE# 18] Setting up window#0 0..3fffffff
> > > > pg=1000
> > > > pci 0004:06     : [PE# 30] Setting up window#0 0..3fffffff
> > > > pg=1000
> > > > pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fffffff
> > > > pg=1000
> > > > pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fffffff
> > > > pg=1000
> > > > pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fffffff
> > > > pg=1000
> > > > EEH: Beginning: 'slot_reset'
> > > > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > > > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > > > EEH: Finished:'slot_reset' with aggregate recovery state:'none'
> > > > EEH: Notify device driver to resume
> > > > EEH: Beginning: 'resume'
> > > > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > > > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > > > EEH: Finished:'resume'
> > > > EEH: Recovery successful.
> > > > ---
> > > >  arch/powerpc/platforms/powernv/eeh-powernv.c | 8 ++++----
> > > >  arch/powerpc/platforms/powernv/pci-ioda.c    | 4 ++--
> > > >  arch/powerpc/platforms/powernv/pci.c         | 4 ++--
> > > >  3 files changed, 8 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c
> > > > b/arch/powerpc/platforms/powernv/eeh-powernv.c
> > > > index abc0be7..f380789 100644
> > > > --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> > > > +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> > > > @@ -564,8 +564,8 @@ static void pnv_eeh_get_phb_diag(struct
> > > > eeh_pe *pe)
> > > >  static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
> > > >  {
> > > >  	struct pnv_phb *phb = pe->phb->private_data;
> > > > -	u8 fstate;
> > > > -	__be16 pcierr;
> > > > +	u8 fstate = 0;
> > > > +	__be16 pcierr = 0;
> > > >  	s64 rc;
> > > >  	int result = 0;
> > > >  
> > > > @@ -603,8 +603,8 @@ static int pnv_eeh_get_phb_state(struct
> > > > eeh_pe *pe)
> > > >  static int pnv_eeh_get_pe_state(struct eeh_pe *pe)
> > > >  {
> > > >  	struct pnv_phb *phb = pe->phb->private_data;
> > > > -	u8 fstate;
> > > > -	__be16 pcierr;
> > > > +	u8 fstate = 0;
> > > > +	__be16 pcierr = 0;
> > > >  	s64 rc;
> > > >  	int result;
> > > >  
> > > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > index dd80744..72b5cc0 100644
> > > > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > @@ -604,8 +604,8 @@ static int pnv_ioda_unfreeze_pe(struct
> > > > pnv_phb *phb, int pe_no, int opt)
> > > >  static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int
> > > > pe_no)
> > > >  {
> > > >  	struct pnv_ioda_pe *slave, *pe;
> > > > -	u8 fstate, state;
> > > > -	__be16 pcierr;
> > > > +	u8 fstate = 0, state;
> > > > +	__be16 pcierr = 0;
> > > >  	s64 rc;
> > > >  
> > > >  	/* Sanity check on PE number */
> > > > diff --git a/arch/powerpc/platforms/powernv/pci.c
> > > > b/arch/powerpc/platforms/powernv/pci.c
> > > > index 13aef23..db230a35 100644
> > > > --- a/arch/powerpc/platforms/powernv/pci.c
> > > > +++ b/arch/powerpc/platforms/powernv/pci.c
> > > > @@ -602,8 +602,8 @@ static void
> > > > pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
> > > >  static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
> > > >  {
> > > >  	struct pnv_phb *phb = pdn->phb->private_data;
> > > > -	u8	fstate;
> > > > -	__be16	pcierr;
> > > > +	u8	fstate = 0;
> > > > +	__be16	pcierr = 0;
> > > >  	unsigned int pe_no;
> > > >  	s64	rc;
> > > >  
> > > > -- 
> > > > 2.17.1


  reply	other threads:[~2018-11-20  7:09 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-19  4:25 [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status Alexey Kardashevskiy
2018-11-19 23:28 ` Sam Bobroff
2018-11-20  2:51 ` Michael Ellerman
2018-11-20  3:51   ` Sam Bobroff
2018-11-20  6:55     ` Alexey Kardashevskiy
2018-11-20  6:59       ` Russell Currey [this message]
2018-11-20 10:58         ` Michael Ellerman
2018-12-23 13:28 ` [kernel] " Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54e863e23af0fbc5f83b663fe7d6c95f86024c3c.camel@russell.cc \
    --to=ruscur@russell.cc \
    --cc=aik@ozlabs.ru \
    --cc=alistair@popple.id.au \
    --cc=arbab@linux.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=oohall@gmail.com \
    --cc=pjaroszynski@nvidia.com \
    --cc=sbobroff@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).