linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc/powernv: Bump opal_init initcall priority
@ 2015-06-11  9:25 Alistair Popple
  2015-06-12  9:47 ` Michael Ellerman
  2015-06-14  0:27 ` [PATCH] " Daniel Axtens
  0 siblings, 2 replies; 5+ messages in thread
From: Alistair Popple @ 2015-06-11  9:25 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: Gavin Shan, Alistair Popple

opal_init() is called via a machine_subsys_initcall(). Due to a hack
in the eeh code the eeh driver is initialised with at the same
initcall level. This means depending on link ordering the following
error can occur because the opal irchip has not been initialised:

irq: XICS didn't like hwirq-0x9 to VIRQ17 mapping (rc=-22)
pnv_eeh_post_init: Can't request OPAL event interrupt (0)

This patch solves the issue by making sure opal_init is called prior
to the subsystems that may need it.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reported-by: Daniel Axtens <dja@axtens.net>
---

Michael,

This fixes a problem in your next tree introduced by my opal irqchip
series. I encountered similar problems when moving some of the other
subsystems across to the new interface which was the motivation behind
the first patch in that series (powerpc/powernv: Reorder OPAL
subsystem initialisation). I missed this one due to a quirk in the
generic eeh code.

It seems that initcall dependencies are not well defined (or at least
my understanding of the dependencies isn't), so I'm not overly
comfortable with this fix. To be honest I don't have a clear
understanding of what side effects bumping this up a level might have
- it boots, everything seems to "work", and a quick grep reveals
nothing obvious. However I'm open to suggestions for a better fix...

In the meantime this should fix the problem in your branch until I get
a chance to dig further.

- Alistair

 arch/powerpc/platforms/powernv/opal.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 9e9c483..cc3ba5f 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -718,7 +718,7 @@ static int __init opal_init(void)

 	return 0;
 }
-machine_subsys_initcall(powernv, opal_init);
+machine_arch_initcall(powernv, opal_init);

 void opal_shutdown(void)
 {
--
1.7.10.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: powerpc/powernv: Bump opal_init initcall priority
  2015-06-11  9:25 [PATCH] powerpc/powernv: Bump opal_init initcall priority Alistair Popple
@ 2015-06-12  9:47 ` Michael Ellerman
  2015-06-15  3:44   ` Alistair Popple
  2015-06-14  0:27 ` [PATCH] " Daniel Axtens
  1 sibling, 1 reply; 5+ messages in thread
From: Michael Ellerman @ 2015-06-12  9:47 UTC (permalink / raw)
  To: Alistair Popple, linuxppc-dev; +Cc: Alistair Popple, Gavin Shan

On Thu, 2015-11-06 at 09:25:29 UTC, Alistair Popple wrote:
> opal_init() is called via a machine_subsys_initcall(). Due to a hack
> in the eeh code the eeh driver is initialised with at the same
> initcall level. This means depending on link ordering the following
> error can occur because the opal irchip has not been initialised:
> 
> irq: XICS didn't like hwirq-0x9 to VIRQ17 mapping (rc=-22)
> pnv_eeh_post_init: Can't request OPAL event interrupt (0)
> 
> This patch solves the issue by making sure opal_init is called prior
> to the subsystems that may need it.

What is the hack in the eeh code?

I'm seeing eeh_ops->post_init() called from eeh_init() which is
core_initcall_sync(), is that what you're talking about?

Then pnv_eeh_post_init() is the powernv version of eeh_ops->post_init, and it
calls opal_event_request().

So you'd need the irq_chip setup by then I think?

So I guess I'm missing the hack you're talking about.

Regardless of which level it needs to be at, the only thing that needs to run
early is opal_event_init() am I right? Not all of opal_init().

cheers

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc/powernv: Bump opal_init initcall priority
  2015-06-11  9:25 [PATCH] powerpc/powernv: Bump opal_init initcall priority Alistair Popple
  2015-06-12  9:47 ` Michael Ellerman
@ 2015-06-14  0:27 ` Daniel Axtens
  1 sibling, 0 replies; 5+ messages in thread
From: Daniel Axtens @ 2015-06-14  0:27 UTC (permalink / raw)
  To: Alistair Popple; +Cc: Michael Ellerman, linuxppc-dev, Gavin Shan

On Thu, 2015-06-11 at 19:25 +1000, Alistair Popple wrote:
> opal_init() is called via a machine_subsys_initcall(). Due to a hack
> in the eeh code the eeh driver is initialised with at the same
> initcall level. This means depending on link ordering the following
> error can occur because the opal irchip has not been initialised:
> 
> irq: XICS didn't like hwirq-0x9 to VIRQ17 mapping (rc=-22)
> pnv_eeh_post_init: Can't request OPAL event interrupt (0)
> 
> This patch solves the issue by making sure opal_init is called prior
> to the subsystems that may need it.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> Reported-by: Daniel Axtens <dja@axtens.net>

This fixes the bug I was experiencing.
Tested-by: Daniel Axtens <dja@axtens.net>

-- 
Regards,
Daniel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: powerpc/powernv: Bump opal_init initcall priority
  2015-06-12  9:47 ` Michael Ellerman
@ 2015-06-15  3:44   ` Alistair Popple
  2015-06-15  7:13     ` Michael Ellerman
  0 siblings, 1 reply; 5+ messages in thread
From: Alistair Popple @ 2015-06-15  3:44 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Gavin Shan

On Fri, 12 Jun 2015 19:47:35 Michael Ellerman wrote:
> On Thu, 2015-11-06 at 09:25:29 UTC, Alistair Popple wrote:
> > opal_init() is called via a machine_subsys_initcall(). Due to a hack
> > in the eeh code the eeh driver is initialised with at the same
> > initcall level. This means depending on link ordering the following
> > error can occur because the opal irchip has not been initialised:
> > 
> > irq: XICS didn't like hwirq-0x9 to VIRQ17 mapping (rc=-22)
> > pnv_eeh_post_init: Can't request OPAL event interrupt (0)
> > 
> > This patch solves the issue by making sure opal_init is called prior
> > to the subsystems that may need it.
> 
> What is the hack in the eeh code?
> 
> I'm seeing eeh_ops->post_init() called from eeh_init() which is
> core_initcall_sync(), is that what you're talking about?

Yes, but there is this at the start of eeh_init():

	/*
	 * We have to delay the initialization on PowerNV after
	 * the PCI hierarchy tree has been built because the PEs
	 * are figured out based on PCI devices instead of device
	 * tree nodes
	 */
	if (machine_is(powernv) && cnt++ <= 0)
		return ret;

Which basically stops eeh_init() running from the core initcall level on 
PowerNV. Instead on PowerNV eeh_init() gets called from pnv_pci_ioda_fixup() 
which itself is called via the PCI layer from a subsys_initcall (ie. the same 
level as opal_init() is called at without this patch). This is why I missed 
this during my testing - apparently I got lucky during my compilations and 
opal_init() must have been called prior to the pci subsystem initialisation.

Ideally I'd like to change the way eeh_init() is called and make it an 
explicit call for each machine type, however that is a more invasive change 
which will take some time as there seem to be a bunch of subtle dependencies 
between initcalls for different platforms that aren't well documented and I'm 
not convinced I understand them all yet. Hence this "quick" fix.

> Then pnv_eeh_post_init() is the powernv version of eeh_ops->post_init, and 
it
> calls opal_event_request().
> 
> So you'd need the irq_chip setup by then I think?

Yeah, it needs to be setup prior to eeh_post_init() being called. 

> So I guess I'm missing the hack you're talking about.
> 
> Regardless of which level it needs to be at, the only thing that needs to 
run
> early is opal_event_init() am I right? Not all of opal_init().

Yes, that is correct. But I was trying to avoid doing the thing which seemed 
to have got us in this mess in the first place (ie. just making everything 
that needs to run "early" run from a core_initcall). However I guess just 
running opal_event_init() earlier is less likely to have unintended side 
effects than running all of opal_init() earlier. Let me know what you'd 
prefer.

Thanks.

- Alistair

> cheers

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: powerpc/powernv: Bump opal_init initcall priority
  2015-06-15  3:44   ` Alistair Popple
@ 2015-06-15  7:13     ` Michael Ellerman
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Ellerman @ 2015-06-15  7:13 UTC (permalink / raw)
  To: Alistair Popple; +Cc: linuxppc-dev, Gavin Shan

On Mon, 2015-06-15 at 13:44 +1000, Alistair Popple wrote:
> On Fri, 12 Jun 2015 19:47:35 Michael Ellerman wrote:
> > On Thu, 2015-11-06 at 09:25:29 UTC, Alistair Popple wrote:
> > > opal_init() is called via a machine_subsys_initcall(). Due to a hack
> > > in the eeh code the eeh driver is initialised with at the same
> > > initcall level. This means depending on link ordering the following
> > > error can occur because the opal irchip has not been initialised:
> > > 
> > > irq: XICS didn't like hwirq-0x9 to VIRQ17 mapping (rc=-22)
> > > pnv_eeh_post_init: Can't request OPAL event interrupt (0)
> > > 
> > > This patch solves the issue by making sure opal_init is called prior
> > > to the subsystems that may need it.
> > 
> > What is the hack in the eeh code?
> > 
> > I'm seeing eeh_ops->post_init() called from eeh_init() which is
> > core_initcall_sync(), is that what you're talking about?
> 
> Yes, but there is this at the start of eeh_init():
> 
> 	/*
> 	 * We have to delay the initialization on PowerNV after
> 	 * the PCI hierarchy tree has been built because the PEs
> 	 * are figured out based on PCI devices instead of device
> 	 * tree nodes
> 	 */
> 	if (machine_is(powernv) && cnt++ <= 0)
> 		return ret;
> 
> Which basically stops eeh_init() running from the core initcall level on 
> PowerNV. Instead on PowerNV eeh_init() gets called from pnv_pci_ioda_fixup() 
> which itself is called via the PCI layer from a subsys_initcall (ie. the same 
> level as opal_init() is called at without this patch). This is why I missed 
> this during my testing - apparently I got lucky during my compilations and 
> opal_init() must have been called prior to the pci subsystem initialisation.
> 
> Ideally I'd like to change the way eeh_init() is called and make it an 
> explicit call for each machine type, however that is a more invasive change 
> which will take some time as there seem to be a bunch of subtle dependencies 
> between initcalls for different platforms that aren't well documented and I'm 
> not convinced I understand them all yet. Hence this "quick" fix.

Yeah right, that's pretty gross. I don't grok why EEH needs to run as early as
it does, but there's probably a reason.

> > Regardless of which level it needs to be at, the only thing that needs to 
> run
> > early is opal_event_init() am I right? Not all of opal_init().
> 
> Yes, that is correct. But I was trying to avoid doing the thing which seemed 
> to have got us in this mess in the first place (ie. just making everything 
> that needs to run "early" run from a core_initcall). However I guess just 
> running opal_event_init() earlier is less likely to have unintended side 
> effects than running all of opal_init() earlier. Let me know what you'd 
> prefer.

Moving opal_event_init() earlier seems like the safest fix. It should only need
to be arch_initcall which is reasonable.


It might also be worth something like this for safety:

diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c b/arch/powerpc/platforms/powernv/opal-irqchip.c
index 841135f48981..c40e424fcde2 100644
--- a/arch/powerpc/platforms/powernv/opal-irqchip.c
+++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
@@ -244,6 +244,9 @@ out:
  */
 int opal_event_request(unsigned int opal_event_nr)
 {
+       if (WARN_ON_ONCE(!opal_event_irqchip.domain))
+               return NO_IRQ;
+
        return irq_create_mapping(opal_event_irqchip.domain, opal_event_nr);
 }
 EXPORT_SYMBOL(opal_event_request);



cheers

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-06-15  7:13 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-11  9:25 [PATCH] powerpc/powernv: Bump opal_init initcall priority Alistair Popple
2015-06-12  9:47 ` Michael Ellerman
2015-06-15  3:44   ` Alistair Popple
2015-06-15  7:13     ` Michael Ellerman
2015-06-14  0:27 ` [PATCH] " Daniel Axtens

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).