linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* Regression with legacy IRQ numbers caused by 9a1091ef0017
@ 2015-01-14 22:14 Tony Lindgren
  2015-01-15 10:50 ` Russell King - ARM Linux
  2015-01-15 13:42 ` Marc Zyngier
  0 siblings, 2 replies; 21+ messages in thread
From: Tony Lindgren @ 2015-01-14 22:14 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

Looks like the legacy IRQ numbers are now all wrong at least for omap4
since commit 9a1091ef0017 ("irqchip: gic: Support hierarchy irq domain.").

Instead of this:

# cat /proc/interrupts 
            CPU0       CPU1       
 29:       1124        981       GIC  29  twd
 39:          0          0       GIC  39  TWL6030-PIH
 41:          0          0       GIC  41  l3-dbg-irq
 42:          0          0       GIC  42  l3-app-irq
 44:          0          0       GIC  44  DMA
 45:       7854          0       GIC  45  omap-dma-engine
 52:          0          0       GIC  52  gpmc
...


We now have:

# cat /proc/interrupts 
            CPU0       CPU1       
 16:        343          0       GIC  69  gp_timer
 17:       1160       1017       GIC  29  twd
 18:          0          0       GIC  41  l3-dbg-irq
 19:          1          0       GIC  42  l3-app-irq
 22:       7850          0       GIC  45  omap-dma-engine
 44:          0          0  4a310000.gpio  18  DMA
 61:       2730          0  48055000.gpio   2  eth0
223:          0          0       GIC  52  gpmc
...

So the DMA interrupt using the legacy mapping with something like
irq = 12 + OMAP44XX_IRQ_GIC_START now is wrong and unfortunately
at least omaps still have a bunch of the legacy interrupts still
around.

And that naturally produces all kinds of strange errors like:

WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x214/0x340()
44000000.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Idle): Data Access in Supervisor mode during Functional access
...
[<c05f21e4>] (__irq_svc) from [<c05f1974>] (_raw_spin_unlock_irqrestore+0x34/0x44)
[<c05f1974>] (_raw_spin_unlock_irqrestore) from [<c00914a8>] (__setup_irq+0x244/0x530)
[<c00914a8>] (__setup_irq) from [<c00917d4>] (setup_irq+0x40/0x8c)
[<c00917d4>] (setup_irq) from [<c0039c8c>] (omap_system_dma_probe+0x1d4/0x2b4)
[<c0039c8c>] (omap_system_dma_probe) from [<c03b2200>] (platform_drv_probe+0x44/0xa4)
...

Looks like the logic changed from:

if (of_property_read_u32(node, "arm,routable-irqs", &nr_routable_irqs))

to just

if (node)

Which now causes irq_domain_add_linear() to be called instead of
irq_domain_add_legacy(), which causes the breakage.

Anybody got a sane fix in mind for the -rc series, or should we just
revert it for now?

Regards,

Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-14 22:14 Regression with legacy IRQ numbers caused by 9a1091ef0017 Tony Lindgren
@ 2015-01-15 10:50 ` Russell King - ARM Linux
  2015-01-15 15:28   ` Tony Lindgren
  2015-01-15 13:42 ` Marc Zyngier
  1 sibling, 1 reply; 21+ messages in thread
From: Russell King - ARM Linux @ 2015-01-15 10:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 14, 2015 at 02:14:08PM -0800, Tony Lindgren wrote:
> Hi all,
> 
> Looks like the legacy IRQ numbers are now all wrong at least for omap4
> since commit 9a1091ef0017 ("irqchip: gic: Support hierarchy irq domain.").
> 
> Instead of this:
> 
> # cat /proc/interrupts 
>             CPU0       CPU1       
>  29:       1124        981       GIC  29  twd
>  39:          0          0       GIC  39  TWL6030-PIH
>  41:          0          0       GIC  41  l3-dbg-irq
>  42:          0          0       GIC  42  l3-app-irq
>  44:          0          0       GIC  44  DMA
>  45:       7854          0       GIC  45  omap-dma-engine
>  52:          0          0       GIC  52  gpmc
> ...
> 
> 
> We now have:
> 
> # cat /proc/interrupts 
>             CPU0       CPU1       
>  16:        343          0       GIC  69  gp_timer
>  17:       1160       1017       GIC  29  twd
>  18:          0          0       GIC  41  l3-dbg-irq
>  19:          1          0       GIC  42  l3-app-irq
>  22:       7850          0       GIC  45  omap-dma-engine
>  44:          0          0  4a310000.gpio  18  DMA
>  61:       2730          0  48055000.gpio   2  eth0
> 223:          0          0       GIC  52  gpmc
> ...
> 
> So the DMA interrupt using the legacy mapping with something like
> irq = 12 + OMAP44XX_IRQ_GIC_START now is wrong and unfortunately
> at least omaps still have a bunch of the legacy interrupts still
> around.
> 
> And that naturally produces all kinds of strange errors like:
> 
> WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x214/0x340()
> 44000000.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Idle): Data Access in Supervisor mode during Functional access
> ...
> [<c05f21e4>] (__irq_svc) from [<c05f1974>] (_raw_spin_unlock_irqrestore+0x34/0x44)
> [<c05f1974>] (_raw_spin_unlock_irqrestore) from [<c00914a8>] (__setup_irq+0x244/0x530)
> [<c00914a8>] (__setup_irq) from [<c00917d4>] (setup_irq+0x40/0x8c)
> [<c00917d4>] (setup_irq) from [<c0039c8c>] (omap_system_dma_probe+0x1d4/0x2b4)
> [<c0039c8c>] (omap_system_dma_probe) from [<c03b2200>] (platform_drv_probe+0x44/0xa4)
> ...

I don't think we've proven a link there.  While you're right that it
causes the wrong interrupt to be claimed, I have two kernels here,
both claim the same interrupt, one which is multi-platform and issues
that strange warning, and one which targets only OMAP4 which doesn't.

There's something else going on which causes the bus errors which we
haven't found.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-14 22:14 Regression with legacy IRQ numbers caused by 9a1091ef0017 Tony Lindgren
  2015-01-15 10:50 ` Russell King - ARM Linux
@ 2015-01-15 13:42 ` Marc Zyngier
  2015-01-15 14:27   ` Arnd Bergmann
  1 sibling, 1 reply; 21+ messages in thread
From: Marc Zyngier @ 2015-01-15 13:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 14 2015 at 10:14:08 pm GMT, Tony Lindgren <tony@atomide.com> wrote:
> Hi all,
>
> Looks like the legacy IRQ numbers are now all wrong at least for omap4
> since commit 9a1091ef0017 ("irqchip: gic: Support hierarchy irq domain.").
>
> Instead of this:
>
> # cat /proc/interrupts 
>             CPU0       CPU1       
>  29:       1124        981       GIC  29  twd
>  39:          0          0       GIC  39  TWL6030-PIH
>  41:          0          0       GIC  41  l3-dbg-irq
>  42:          0          0       GIC  42  l3-app-irq
>  44:          0          0       GIC  44  DMA
>  45:       7854          0       GIC  45  omap-dma-engine
>  52:          0          0       GIC  52  gpmc
> ...
>
>
> We now have:
>
> # cat /proc/interrupts 
>             CPU0       CPU1       
>  16:        343          0       GIC  69  gp_timer
>  17:       1160       1017       GIC  29  twd
>  18:          0          0       GIC  41  l3-dbg-irq
>  19:          1          0       GIC  42  l3-app-irq
>  22:       7850          0       GIC  45  omap-dma-engine
>  44:          0          0  4a310000.gpio  18  DMA
>  61:       2730          0  48055000.gpio   2  eth0
> 223:          0          0       GIC  52  gpmc
> ...
>
> So the DMA interrupt using the legacy mapping with something like
> irq = 12 + OMAP44XX_IRQ_GIC_START now is wrong and unfortunately
> at least omaps still have a bunch of the legacy interrupts still
> around.

Holy crap. How much of this do we have hanging around?

> And that naturally produces all kinds of strange errors like:
>
> WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x214/0x340()
> 44000000.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Idle): Data Access in Supervisor mode during Functional access
> ...
> [<c05f21e4>] (__irq_svc) from [<c05f1974>] (_raw_spin_unlock_irqrestore+0x34/0x44)
> [<c05f1974>] (_raw_spin_unlock_irqrestore) from [<c00914a8>] (__setup_irq+0x244/0x530)
> [<c00914a8>] (__setup_irq) from [<c00917d4>] (setup_irq+0x40/0x8c)
> [<c00917d4>] (setup_irq) from [<c0039c8c>] (omap_system_dma_probe+0x1d4/0x2b4)
> [<c0039c8c>] (omap_system_dma_probe) from [<c03b2200>] (platform_drv_probe+0x44/0xa4)
> ...
>
> Looks like the logic changed from:
>
> if (of_property_read_u32(node, "arm,routable-irqs", &nr_routable_irqs))
>
> to just
>
> if (node)
>
> Which now causes irq_domain_add_linear() to be called instead of
> irq_domain_add_legacy(), which causes the breakage.
>
> Anybody got a sane fix in mind for the -rc series, or should we just
> revert it for now?

Reverting it is going to kill other platforms, and I'd rather have a
workaround, short of fixing it for good (which seems ambitious at -rc4).

How about something along these lines:

diff --git a/arch/arm/mach-omap2/common.h b/arch/arm/mach-omap2/common.h
index 377eea8..b664494 100644
--- a/arch/arm/mach-omap2/common.h
+++ b/arch/arm/mach-omap2/common.h
@@ -211,6 +211,7 @@ extern struct device *omap2_get_iva_device(void);
 extern struct device *omap2_get_l3_device(void);
 extern struct device *omap4_get_dsp_device(void);
 
+unsigned int omap4_xlate_irq(unsigned int hwirq);
 void omap_gic_of_init(void);
 
 #ifdef CONFIG_CACHE_L2X0
diff --git a/arch/arm/mach-omap2/omap4-common.c b/arch/arm/mach-omap2/omap4-common.c
index b7cb44a..cc30e49 100644
--- a/arch/arm/mach-omap2/omap4-common.c
+++ b/arch/arm/mach-omap2/omap4-common.c
@@ -256,6 +256,38 @@ static int __init omap4_sar_ram_init(void)
 }
 omap_early_initcall(omap4_sar_ram_init);
 
+static struct of_device_id gic_match[] = {
+	{ .compatible = "arm,cortex-a9-gic", },
+	{ .compatible = "arm,cortex-a15-gic", },
+	{ },
+};
+
+static struct device_node *gic_node;
+
+unsigned int omap4_xlate_irq(unsigned int hwirq)
+{
+	struct of_phandle_args irq_data;
+	unsigned int irq;
+
+	if (!gic_node)
+		gic_node = of_find_matching_node(NULL, gic_match);
+
+	if (WARN_ON(!gic_node))
+		return hwirq;
+
+	irq_data.np = gic_node;
+	irq_data.args_count = 3;
+	irq_data.args[0] = 0;
+	irq_data.args[1] = hwirq - OMAP44XX_IRQ_GIC_START;
+	irq_data.args[2] = IRQ_TYPE_LEVEL_HIGH;
+
+	irq = irq_create_of_mapping(&irq_data);
+	if (WARN_ON(!irq))
+		irq = hwirq;
+
+	return irq;
+}
+
 void __init omap_gic_of_init(void)
 {
 	struct device_node *np;
diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
index cbb908d..9025fff 100644
--- a/arch/arm/mach-omap2/omap_hwmod.c
+++ b/arch/arm/mach-omap2/omap_hwmod.c
@@ -3534,9 +3534,15 @@ int omap_hwmod_fill_resources(struct omap_hwmod *oh, struct resource *res)
 
 	mpu_irqs_cnt = _count_mpu_irqs(oh);
 	for (i = 0; i < mpu_irqs_cnt; i++) {
+		unsigned int irq;
+
+		if (oh->xlate_irq)
+			irq = oh->xlate_irq((oh->mpu_irqs + i)->irq);
+		else
+			irq = (oh->mpu_irqs + i)->irq;
 		(res + r)->name = (oh->mpu_irqs + i)->name;
-		(res + r)->start = (oh->mpu_irqs + i)->irq;
-		(res + r)->end = (oh->mpu_irqs + i)->irq;
+		(res + r)->start = irq;
+		(res + r)->end = irq;
 		(res + r)->flags = IORESOURCE_IRQ;
 		r++;
 	}
diff --git a/arch/arm/mach-omap2/omap_hwmod.h b/arch/arm/mach-omap2/omap_hwmod.h
index 35ca6ef..5b42faf 100644
--- a/arch/arm/mach-omap2/omap_hwmod.h
+++ b/arch/arm/mach-omap2/omap_hwmod.h
@@ -676,6 +676,7 @@ struct omap_hwmod {
 	spinlock_t			_lock;
 	struct list_head		node;
 	struct omap_hwmod_ocp_if	*_mpu_port;
+	unsigned int			(*xlate_irq)(unsigned int);
 	u16				flags;
 	u8				mpu_rt_idx;
 	u8				response_lat;
diff --git a/arch/arm/mach-omap2/omap_hwmod_44xx_data.c b/arch/arm/mach-omap2/omap_hwmod_44xx_data.c
index c314b3c..f5e68a7 100644
--- a/arch/arm/mach-omap2/omap_hwmod_44xx_data.c
+++ b/arch/arm/mach-omap2/omap_hwmod_44xx_data.c
@@ -479,6 +479,7 @@ static struct omap_hwmod omap44xx_dma_system_hwmod = {
 	.class		= &omap44xx_dma_hwmod_class,
 	.clkdm_name	= "l3_dma_clkdm",
 	.mpu_irqs	= omap44xx_dma_system_irqs,
+	.xlate_irq	= omap4_xlate_irq,
 	.main_clk	= "l3_div_ck",
 	.prcm = {
 		.omap4 = {
@@ -640,6 +641,7 @@ static struct omap_hwmod omap44xx_dss_dispc_hwmod = {
 	.class		= &omap44xx_dispc_hwmod_class,
 	.clkdm_name	= "l3_dss_clkdm",
 	.mpu_irqs	= omap44xx_dss_dispc_irqs,
+	.xlate_irq	= omap4_xlate_irq,
 	.sdma_reqs	= omap44xx_dss_dispc_sdma_reqs,
 	.main_clk	= "dss_dss_clk",
 	.prcm = {
@@ -693,6 +695,7 @@ static struct omap_hwmod omap44xx_dss_dsi1_hwmod = {
 	.class		= &omap44xx_dsi_hwmod_class,
 	.clkdm_name	= "l3_dss_clkdm",
 	.mpu_irqs	= omap44xx_dss_dsi1_irqs,
+	.xlate_irq	= omap4_xlate_irq,
 	.sdma_reqs	= omap44xx_dss_dsi1_sdma_reqs,
 	.main_clk	= "dss_dss_clk",
 	.prcm = {
@@ -726,6 +729,7 @@ static struct omap_hwmod omap44xx_dss_dsi2_hwmod = {
 	.class		= &omap44xx_dsi_hwmod_class,
 	.clkdm_name	= "l3_dss_clkdm",
 	.mpu_irqs	= omap44xx_dss_dsi2_irqs,
+	.xlate_irq	= omap4_xlate_irq,
 	.sdma_reqs	= omap44xx_dss_dsi2_sdma_reqs,
 	.main_clk	= "dss_dss_clk",
 	.prcm = {
@@ -784,6 +788,7 @@ static struct omap_hwmod omap44xx_dss_hdmi_hwmod = {
 	 */
 	.flags		= HWMOD_SWSUP_SIDLE,
 	.mpu_irqs	= omap44xx_dss_hdmi_irqs,
+	.xlate_irq	= omap4_xlate_irq,
 	.sdma_reqs	= omap44xx_dss_hdmi_sdma_reqs,
 	.main_clk	= "dss_48mhz_clk",
 	.prcm = {

It seems to make my Panda-ES happy enough, but there is of course some
more to fix (prm44xx.c and twl_common.c). OMAP5 can be worked around the
same way.

Of course, this is in no way a proper fix, but I suppose the OMAP DT is
still missing a few bits...

Thoughts?

	M.
-- 
Jazz is not dead. It just smells funny.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-15 13:42 ` Marc Zyngier
@ 2015-01-15 14:27   ` Arnd Bergmann
  2015-01-15 14:43     ` Marc Zyngier
  0 siblings, 1 reply; 21+ messages in thread
From: Arnd Bergmann @ 2015-01-15 14:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
> Of course, this is in no way a proper fix, but I suppose the OMAP DT is
> still missing a few bits...

I must be missing something here, but all the interrupts are listed
correctly in the DT, so what is the omap_hwmod_irq_info actually
achieving on omap4 and omap5?

Would it work if we just remove the incorrect copy of the resource
and use the one that comes from DT?

	Arnd

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-15 14:27   ` Arnd Bergmann
@ 2015-01-15 14:43     ` Marc Zyngier
  2015-01-15 15:37       ` Tony Lindgren
  2015-01-15 16:37       ` Arnd Bergmann
  0 siblings, 2 replies; 21+ messages in thread
From: Marc Zyngier @ 2015-01-15 14:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 15 2015 at  2:27:56 pm GMT, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
>> Of course, this is in no way a proper fix, but I suppose the OMAP DT is
>> still missing a few bits...
>
> I must be missing something here, but all the interrupts are listed
> correctly in the DT, so what is the omap_hwmod_irq_info actually
> achieving on omap4 and omap5?
>
> Would it work if we just remove the incorrect copy of the resource
> and use the one that comes from DT?

By the look of it, omap_hwmod_irq_info serves multiple purposes:
- low level configuration (pads, probably more stuff)
- interrupt description for some drivers, using resources.

It should be fairly easy to do the latter, but the former looks more
tricky (it would push the pad configuration down to the drivers, which
is avoided at the moment).

Probably there is a workable strategy, but my knowledge about OMAP is
close to *nothing*...

	M.
-- 
Jazz is not dead. It just smells funny.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-15 10:50 ` Russell King - ARM Linux
@ 2015-01-15 15:28   ` Tony Lindgren
  2015-01-15 17:19     ` Russell King - ARM Linux
  0 siblings, 1 reply; 21+ messages in thread
From: Tony Lindgren @ 2015-01-15 15:28 UTC (permalink / raw)
  To: linux-arm-kernel

* Russell King - ARM Linux <linux@arm.linux.org.uk> [150115 02:53]:
> On Wed, Jan 14, 2015 at 02:14:08PM -0800, Tony Lindgren wrote:
> > Hi all,
> > 
> > Looks like the legacy IRQ numbers are now all wrong at least for omap4
> > since commit 9a1091ef0017 ("irqchip: gic: Support hierarchy irq domain.").
> > 
> > Instead of this:
> > 
> > # cat /proc/interrupts 
> >             CPU0       CPU1       
> >  29:       1124        981       GIC  29  twd
> >  39:          0          0       GIC  39  TWL6030-PIH
> >  41:          0          0       GIC  41  l3-dbg-irq
> >  42:          0          0       GIC  42  l3-app-irq
> >  44:          0          0       GIC  44  DMA
> >  45:       7854          0       GIC  45  omap-dma-engine
> >  52:          0          0       GIC  52  gpmc
> > ...
> > 
> > 
> > We now have:
> > 
> > # cat /proc/interrupts 
> >             CPU0       CPU1       
> >  16:        343          0       GIC  69  gp_timer
> >  17:       1160       1017       GIC  29  twd
> >  18:          0          0       GIC  41  l3-dbg-irq
> >  19:          1          0       GIC  42  l3-app-irq
> >  22:       7850          0       GIC  45  omap-dma-engine
> >  44:          0          0  4a310000.gpio  18  DMA
> >  61:       2730          0  48055000.gpio   2  eth0
> > 223:          0          0       GIC  52  gpmc
> > ...
> > 
> > So the DMA interrupt using the legacy mapping with something like
> > irq = 12 + OMAP44XX_IRQ_GIC_START now is wrong and unfortunately
> > at least omaps still have a bunch of the legacy interrupts still
> > around.
> > 
> > And that naturally produces all kinds of strange errors like:
> > 
> > WARNING: CPU: 0 PID: 1 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x214/0x340()
> > 44000000.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Idle): Data Access in Supervisor mode during Functional access
> > ...
> > [<c05f21e4>] (__irq_svc) from [<c05f1974>] (_raw_spin_unlock_irqrestore+0x34/0x44)
> > [<c05f1974>] (_raw_spin_unlock_irqrestore) from [<c00914a8>] (__setup_irq+0x244/0x530)
> > [<c00914a8>] (__setup_irq) from [<c00917d4>] (setup_irq+0x40/0x8c)
> > [<c00917d4>] (setup_irq) from [<c0039c8c>] (omap_system_dma_probe+0x1d4/0x2b4)
> > [<c0039c8c>] (omap_system_dma_probe) from [<c03b2200>] (platform_drv_probe+0x44/0xa4)
> > ...
> 
> I don't think we've proven a link there.  While you're right that it
> causes the wrong interrupt to be claimed, I have two kernels here,
> both claim the same interrupt, one which is multi-platform and issues
> that strange warning, and one which targets only OMAP4 which doesn't.
> 
> There's something else going on which causes the bus errors which we
> haven't found.

I think it gets triggered if you enable PREEMPT.

Regards,

Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-15 14:43     ` Marc Zyngier
@ 2015-01-15 15:37       ` Tony Lindgren
  2015-01-16 16:56         ` Arnd Bergmann
  2015-01-15 16:37       ` Arnd Bergmann
  1 sibling, 1 reply; 21+ messages in thread
From: Tony Lindgren @ 2015-01-15 15:37 UTC (permalink / raw)
  To: linux-arm-kernel

* Marc Zyngier <marc.zyngier@arm.com> [150115 06:46]:
> On Thu, Jan 15 2015 at  2:27:56 pm GMT, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
> >> Of course, this is in no way a proper fix, but I suppose the OMAP DT is
> >> still missing a few bits...
> >
> > I must be missing something here, but all the interrupts are listed
> > correctly in the DT, so what is the omap_hwmod_irq_info actually
> > achieving on omap4 and omap5?
> >
> > Would it work if we just remove the incorrect copy of the resource
> > and use the one that comes from DT?
> 
> By the look of it, omap_hwmod_irq_info serves multiple purposes:
> - low level configuration (pads, probably more stuff)

The muxing is only done for omap3 in legacy booting mode.

> - interrupt description for some drivers, using resources.

That's still used to create legacy platform_device entries on omap4
for legacy DMA, DSS, PRM. The twl6040 entries are already unused
and I have a patch queued to remove them.
 
> It should be fairly easy to do the latter, but the former looks more
> tricky (it would push the pad configuration down to the drivers, which
> is avoided at the moment).

The pad configuration is already done with pinctrl-single.
 
> Probably there is a workable strategy, but my knowledge about OMAP is
> close to *nothing*...

I have a feeling this might bite other platforms too and we just have
not noticed it yet..

Regards,

Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-15 14:43     ` Marc Zyngier
  2015-01-15 15:37       ` Tony Lindgren
@ 2015-01-15 16:37       ` Arnd Bergmann
  1 sibling, 0 replies; 21+ messages in thread
From: Arnd Bergmann @ 2015-01-15 16:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 15 January 2015 14:43:35 Marc Zyngier wrote:
> On Thu, Jan 15 2015 at  2:27:56 pm GMT, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
> >> Of course, this is in no way a proper fix, but I suppose the OMAP DT is
> >> still missing a few bits...
> >
> > I must be missing something here, but all the interrupts are listed
> > correctly in the DT, so what is the omap_hwmod_irq_info actually
> > achieving on omap4 and omap5?
> >
> > Would it work if we just remove the incorrect copy of the resource
> > and use the one that comes from DT?
> 
> By the look of it, omap_hwmod_irq_info serves multiple purposes:
> - low level configuration (pads, probably more stuff)
> - interrupt description for some drivers, using resources.
> 
> It should be fairly easy to do the latter, but the former looks more
> tricky (it would push the pad configuration down to the drivers, which
> is avoided at the moment).
> 
> Probably there is a workable strategy, but my knowledge about OMAP is
> close to *nothing*...

I don't know much about OMAP either so maybe it's better to let someone
comment who understands this better. ;-)

A number of commits have in the past removed omap_hwmod_irq_info entries
here, the last one was  09182ab11b49 ("ARM: OMAP4: hwmod data: Remove
irq entries from mcspi, mmc hwmods").

>From what I can tell, we could drop the omap44xx_dss_dispc_irqs,
omap44xx_dss_dsi1_irqs, omap44xx_dss_dsi2_irqs, and omap44xx_dss_hdmi_irqs
in the samem way, after this data got added to DT as part of
cfe86fcf2d0079f03 ("ARM: omap4.dtsi: add omapdss information").

Unfortunately, the omap-dma driver once again gets in the way, since
there are still users of this that are not converted to use dmaengine,
and unlike the proper driver (drivers/dma/omap-dma.c), the legacy
driver (arch/arm/mach-omap2/dma.c) does not get probed from DT and
instead gets the wrong irq numbers from hwmod now.

	Arnd

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-15 15:28   ` Tony Lindgren
@ 2015-01-15 17:19     ` Russell King - ARM Linux
  2015-01-16 16:21       ` Tony Lindgren
  0 siblings, 1 reply; 21+ messages in thread
From: Russell King - ARM Linux @ 2015-01-15 17:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote:
> * Russell King - ARM Linux <linux@arm.linux.org.uk> [150115 02:53]:
> > I don't think we've proven a link there.  While you're right that it
> > causes the wrong interrupt to be claimed, I have two kernels here,
> > both claim the same interrupt, one which is multi-platform and issues
> > that strange warning, and one which targets only OMAP4 which doesn't.
> > 
> > There's something else going on which causes the bus errors which we
> > haven't found.
> 
> I think it gets triggered if you enable PREEMPT.

That's something which we can try to prove... build running now with
CONFIG_PREEMPT=y

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-15 17:19     ` Russell King - ARM Linux
@ 2015-01-16 16:21       ` Tony Lindgren
  2015-01-16 16:30         ` Russell King - ARM Linux
  0 siblings, 1 reply; 21+ messages in thread
From: Tony Lindgren @ 2015-01-16 16:21 UTC (permalink / raw)
  To: linux-arm-kernel

* Russell King - ARM Linux <linux@arm.linux.org.uk> [150115 09:22]:
> On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote:
> > * Russell King - ARM Linux <linux@arm.linux.org.uk> [150115 02:53]:
> > > I don't think we've proven a link there.  While you're right that it
> > > causes the wrong interrupt to be claimed, I have two kernels here,
> > > both claim the same interrupt, one which is multi-platform and issues
> > > that strange warning, and one which targets only OMAP4 which doesn't.
> > > 
> > > There's something else going on which causes the bus errors which we
> > > haven't found.
> > 
> > I think it gets triggered if you enable PREEMPT.
> 
> That's something which we can try to prove... build running now with
> CONFIG_PREEMPT=y

Looks like you now have the omap_l3_noc error appear for sdp4430 in
your logs after enabling PREEMPT.

I guess that means case closed for this one?

Regards,

Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-16 16:21       ` Tony Lindgren
@ 2015-01-16 16:30         ` Russell King - ARM Linux
  2015-01-16 16:41           ` Tony Lindgren
  0 siblings, 1 reply; 21+ messages in thread
From: Russell King - ARM Linux @ 2015-01-16 16:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 16, 2015 at 08:21:20AM -0800, Tony Lindgren wrote:
> * Russell King - ARM Linux <linux@arm.linux.org.uk> [150115 09:22]:
> > On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote:
> > > * Russell King - ARM Linux <linux@arm.linux.org.uk> [150115 02:53]:
> > > > I don't think we've proven a link there.  While you're right that it
> > > > causes the wrong interrupt to be claimed, I have two kernels here,
> > > > both claim the same interrupt, one which is multi-platform and issues
> > > > that strange warning, and one which targets only OMAP4 which doesn't.
> > > > 
> > > > There's something else going on which causes the bus errors which we
> > > > haven't found.
> > > 
> > > I think it gets triggered if you enable PREEMPT.
> > 
> > That's something which we can try to prove... build running now with
> > CONFIG_PREEMPT=y
> 
> Looks like you now have the omap_l3_noc error appear for sdp4430 in
> your logs after enabling PREEMPT.
> 
> I guess that means case closed for this one?

I would still like to understand /why/ enabling preempt causes the error.
Changing the preempt configuration really should not change what happens
on the bus.  (Think about it.)  It's an indication that there is some
other error present.

Unfortunately, the OMAP hardware appears to make it impossible to
determine what the access that caused the error was, so it looks like
it's pretty much undebuggable.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-16 16:30         ` Russell King - ARM Linux
@ 2015-01-16 16:41           ` Tony Lindgren
  2015-01-16 16:46             ` Felipe Balbi
  2015-01-16 17:22             ` Russell King - ARM Linux
  0 siblings, 2 replies; 21+ messages in thread
From: Tony Lindgren @ 2015-01-16 16:41 UTC (permalink / raw)
  To: linux-arm-kernel

* Russell King - ARM Linux <linux@arm.linux.org.uk> [150116 08:33]:
> On Fri, Jan 16, 2015 at 08:21:20AM -0800, Tony Lindgren wrote:
> > * Russell King - ARM Linux <linux@arm.linux.org.uk> [150115 09:22]:
> > > On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote:
> > > > * Russell King - ARM Linux <linux@arm.linux.org.uk> [150115 02:53]:
> > > > > I don't think we've proven a link there.  While you're right that it
> > > > > causes the wrong interrupt to be claimed, I have two kernels here,
> > > > > both claim the same interrupt, one which is multi-platform and issues
> > > > > that strange warning, and one which targets only OMAP4 which doesn't.
> > > > > 
> > > > > There's something else going on which causes the bus errors which we
> > > > > haven't found.
> > > > 
> > > > I think it gets triggered if you enable PREEMPT.
> > > 
> > > That's something which we can try to prove... build running now with
> > > CONFIG_PREEMPT=y
> > 
> > Looks like you now have the omap_l3_noc error appear for sdp4430 in
> > your logs after enabling PREEMPT.
> > 
> > I guess that means case closed for this one?
> 
> I would still like to understand /why/ enabling preempt causes the error.
> Changing the preempt configuration really should not change what happens
> on the bus.  (Think about it.)  It's an indication that there is some
> other error present.

We have a wrong irq number caused by $subject. And the wrong irq
gets triggered before the dma hardware is configured during dma
init. And then we get the invalid access error from omap_l3_noc.

> Unfortunately, the OMAP hardware appears to make it impossible to
> determine what the access that caused the error was, so it looks like
> it's pretty much undebuggable.

Yeah would be nice to have more info from omap_l3_noc.

Regards,

Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-16 16:41           ` Tony Lindgren
@ 2015-01-16 16:46             ` Felipe Balbi
  2015-01-16 17:22             ` Russell King - ARM Linux
  1 sibling, 0 replies; 21+ messages in thread
From: Felipe Balbi @ 2015-01-16 16:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote:
> * Russell King - ARM Linux <linux@arm.linux.org.uk> [150116 08:33]:
> > On Fri, Jan 16, 2015 at 08:21:20AM -0800, Tony Lindgren wrote:
> > > * Russell King - ARM Linux <linux@arm.linux.org.uk> [150115 09:22]:
> > > > On Thu, Jan 15, 2015 at 07:28:39AM -0800, Tony Lindgren wrote:
> > > > > * Russell King - ARM Linux <linux@arm.linux.org.uk> [150115 02:53]:
> > > > > > I don't think we've proven a link there.  While you're right that it
> > > > > > causes the wrong interrupt to be claimed, I have two kernels here,
> > > > > > both claim the same interrupt, one which is multi-platform and issues
> > > > > > that strange warning, and one which targets only OMAP4 which doesn't.
> > > > > > 
> > > > > > There's something else going on which causes the bus errors which we
> > > > > > haven't found.
> > > > > 
> > > > > I think it gets triggered if you enable PREEMPT.
> > > > 
> > > > That's something which we can try to prove... build running now with
> > > > CONFIG_PREEMPT=y
> > > 
> > > Looks like you now have the omap_l3_noc error appear for sdp4430 in
> > > your logs after enabling PREEMPT.
> > > 
> > > I guess that means case closed for this one?
> > 
> > I would still like to understand /why/ enabling preempt causes the error.
> > Changing the preempt configuration really should not change what happens
> > on the bus.  (Think about it.)  It's an indication that there is some
> > other error present.
> 
> We have a wrong irq number caused by $subject. And the wrong irq
> gets triggered before the dma hardware is configured during dma
> init. And then we get the invalid access error from omap_l3_noc.
> 
> > Unfortunately, the OMAP hardware appears to make it impossible to
> > determine what the access that caused the error was, so it looks like
> > it's pretty much undebuggable.
> 
> Yeah would be nice to have more info from omap_l3_noc.

you can probably get more info by decoding all L4 instance errors. It's
just not implemented anywhere. In this case, we should decode l4cfg
which is who generated the error.

-- 
balbi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20150116/691a076c/attachment.sig>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-15 15:37       ` Tony Lindgren
@ 2015-01-16 16:56         ` Arnd Bergmann
  2015-01-16 17:23           ` Marc Zyngier
  0 siblings, 1 reply; 21+ messages in thread
From: Arnd Bergmann @ 2015-01-16 16:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 15 January 2015 07:37:48 Tony Lindgren wrote:
> * Marc Zyngier <marc.zyngier@arm.com> [150115 06:46]:
> > On Thu, Jan 15 2015 at  2:27:56 pm GMT, Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
> > Probably there is a workable strategy, but my knowledge about OMAP is
> > close to *nothing*...
> 
> I have a feeling this might bite other platforms too and we just have
> not noticed it yet..

I'm looking through the entire tree now, scanning for machines that have
GIC and use IORESOURCE_IRQ or DEFINE_RES_IRQ in their platform code.
Most platforms using GIC are completely converted to DT and have no
hardcoded legacy IRQs.

I have checked that cns3xxx and realview are both fine by inspection.

The only one I'm not sure about is shmobile, which looks like it might
suffer from the same problem. Simon/Magnus, could you verify this
with a multiplatform kernel on any SoC that has GIC and uses devices
that have interrupts defined in setup-*.c or board-*.c?

	Arnd

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-16 16:41           ` Tony Lindgren
  2015-01-16 16:46             ` Felipe Balbi
@ 2015-01-16 17:22             ` Russell King - ARM Linux
  2015-01-16 17:29               ` Tony Lindgren
  1 sibling, 1 reply; 21+ messages in thread
From: Russell King - ARM Linux @ 2015-01-16 17:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote:
> * Russell King - ARM Linux <linux@arm.linux.org.uk> [150116 08:33]:
> > I would still like to understand /why/ enabling preempt causes the error.
> > Changing the preempt configuration really should not change what happens
> > on the bus.  (Think about it.)  It's an indication that there is some
> > other error present.
> 
> We have a wrong irq number caused by $subject. And the wrong irq
> gets triggered before the dma hardware is configured during dma
> init. And then we get the invalid access error from omap_l3_noc.

... which should happen whether or not preempt is enabled, which is
really my point.

We know tha the wrong IRQ gets requested by the driver - and that wrong
IRQ is requested whether or not we have preempt enabled.  Yet we get
the warning whether or not preempt is enabled.

The DMA handler is not registered as a threaded handler, so it's not
depending on a context switch to execute omap2_dma_irq_handler().

Another reason why I don't agree with your explanation is that by the
time setup_irq() is called, we have already poked at the DMA hardware
several times - omap_clear_dma() and omap2_disable_irq_lch() will have
been called for each DMA channel - and both will write to the hardware.

What's more is that the only things left after setup_irq() has been
called is to possibly reserve the first two DMA channels and print
the DMA message (via show_dma_caps).  So I see nothing after setup_irq()
which would "finish" any unfinished hardware initialisation.

The final reason I don't agree is that I've put a printk() in
omap2_dma_irq_handler(), and this does not trigger.

So, I think this has nothing to do with the DMA hardware /at all/,
but more to do with the GPIO code, and it suggests that the GPIO code
publishes IRQs before it is safe for those IRQs to be used.

Maybe it has to do with omap_gpio_irq_handler() being called... added
printk(), nope, that's not called either.  So it's not an IRQ which
gets triggered at all.

What is called are (in order):

omap_gpio_unmask_irq()
omap_set_gpio_irqenable()
omap_enable_gpio_irqbank()

and this reveals where the problem is, especially when you then add
instrumentation into the runtime PM functions - and this reveals that
when a GPIO IRQ is requested, these functions are called while the
GPIO is runtime suspended.

_That_ is where the *real* problem lies - requesting a GPIO interrupt
results in the kernel touching possibly runtime-suspended hardware.

The reason it happens with preempt is that preempt introduces scheduling
points during the kernel boot which would not otherwise be there (with
preempt disabled, you have to hit an explicit context switch due to
contention on some lock or a wait in order for some other thread to run.)

So, the GPIO driver really needs fixing - and I'd suggest fixing it
first, before fixing the DMA problem, because the DMA problem allows
us to see the GPIO problem.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-16 16:56         ` Arnd Bergmann
@ 2015-01-16 17:23           ` Marc Zyngier
  2015-01-17  0:48             ` Simon Horman
  0 siblings, 1 reply; 21+ messages in thread
From: Marc Zyngier @ 2015-01-16 17:23 UTC (permalink / raw)
  To: linux-arm-kernel

On 16/01/15 16:56, Arnd Bergmann wrote:
> On Thursday 15 January 2015 07:37:48 Tony Lindgren wrote:
>> * Marc Zyngier <marc.zyngier@arm.com> [150115 06:46]:
>>> On Thu, Jan 15 2015 at  2:27:56 pm GMT, Arnd Bergmann <arnd@arndb.de> wrote:
>>>> On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
>>> Probably there is a workable strategy, but my knowledge about OMAP is
>>> close to *nothing*...
>>
>> I have a feeling this might bite other platforms too and we just have
>> not noticed it yet..
> 
> I'm looking through the entire tree now, scanning for machines that have
> GIC and use IORESOURCE_IRQ or DEFINE_RES_IRQ in their platform code.
> Most platforms using GIC are completely converted to DT and have no
> hardcoded legacy IRQs.
> 
> I have checked that cns3xxx and realview are both fine by inspection.
> 
> The only one I'm not sure about is shmobile, which looks like it might
> suffer from the same problem. Simon/Magnus, could you verify this
> with a multiplatform kernel on any SoC that has GIC and uses devices
> that have interrupts defined in setup-*.c or board-*.c?

There are 3 patches floating around for shmobile, converting their
non-DT support to directly initializing the GIC instead of relying on
irqchip_init(). That's assuming their DT implementation doesn't use any
of these device declarations.

If they do, we could use a hack similar to the one I implemented for
OMAP, populating the virtual IRQ in the resource at boot time, just
after the irqchip initialization.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-16 17:22             ` Russell King - ARM Linux
@ 2015-01-16 17:29               ` Tony Lindgren
  2015-01-16 22:52                 ` Tony Lindgren
  0 siblings, 1 reply; 21+ messages in thread
From: Tony Lindgren @ 2015-01-16 17:29 UTC (permalink / raw)
  To: linux-arm-kernel

* Russell King - ARM Linux <linux@arm.linux.org.uk> [150116 09:25]:
> On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote:
> > * Russell King - ARM Linux <linux@arm.linux.org.uk> [150116 08:33]:
> > > I would still like to understand /why/ enabling preempt causes the error.
> > > Changing the preempt configuration really should not change what happens
> > > on the bus.  (Think about it.)  It's an indication that there is some
> > > other error present.
> > 
> > We have a wrong irq number caused by $subject. And the wrong irq
> > gets triggered before the dma hardware is configured during dma
> > init. And then we get the invalid access error from omap_l3_noc.
> 
> ... which should happen whether or not preempt is enabled, which is
> really my point.
> 
> We know tha the wrong IRQ gets requested by the driver - and that wrong
> IRQ is requested whether or not we have preempt enabled.  Yet we get
> the warning whether or not preempt is enabled.
> 
> The DMA handler is not registered as a threaded handler, so it's not
> depending on a context switch to execute omap2_dma_irq_handler().
> 
> Another reason why I don't agree with your explanation is that by the
> time setup_irq() is called, we have already poked at the DMA hardware
> several times - omap_clear_dma() and omap2_disable_irq_lch() will have
> been called for each DMA channel - and both will write to the hardware.
> 
> What's more is that the only things left after setup_irq() has been
> called is to possibly reserve the first two DMA channels and print
> the DMA message (via show_dma_caps).  So I see nothing after setup_irq()
> which would "finish" any unfinished hardware initialisation.
> 
> The final reason I don't agree is that I've put a printk() in
> omap2_dma_irq_handler(), and this does not trigger.

Oh, yes that blows my theory completely then.
 
> So, I think this has nothing to do with the DMA hardware /at all/,
> but more to do with the GPIO code, and it suggests that the GPIO code
> publishes IRQs before it is safe for those IRQs to be used.
> 
> Maybe it has to do with omap_gpio_irq_handler() being called... added
> printk(), nope, that's not called either.  So it's not an IRQ which
> gets triggered at all.
> 
> What is called are (in order):
> 
> omap_gpio_unmask_irq()
> omap_set_gpio_irqenable()
> omap_enable_gpio_irqbank()
> 
> and this reveals where the problem is, especially when you then add
> instrumentation into the runtime PM functions - and this reveals that
> when a GPIO IRQ is requested, these functions are called while the
> GPIO is runtime suspended.
> 
> _That_ is where the *real* problem lies - requesting a GPIO interrupt
> results in the kernel touching possibly runtime-suspended hardware.
> 
> The reason it happens with preempt is that preempt introduces scheduling
> points during the kernel boot which would not otherwise be there (with
> preempt disabled, you have to hit an explicit context switch due to
> contention on some lock or a wait in order for some other thread to run.)

OK makes sense.
 
> So, the GPIO driver really needs fixing - and I'd suggest fixing it
> first, before fixing the DMA problem, because the DMA problem allows
> us to see the GPIO problem.

Yes we need to fix that.

Regards,

Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-16 17:29               ` Tony Lindgren
@ 2015-01-16 22:52                 ` Tony Lindgren
  2015-01-16 22:57                   ` Russell King - ARM Linux
  0 siblings, 1 reply; 21+ messages in thread
From: Tony Lindgren @ 2015-01-16 22:52 UTC (permalink / raw)
  To: linux-arm-kernel

* Tony Lindgren <tony@atomide.com> [150116 09:36]:
> * Russell King - ARM Linux <linux@arm.linux.org.uk> [150116 09:25]:
> > On Fri, Jan 16, 2015 at 08:41:06AM -0800, Tony Lindgren wrote:
> > > * Russell King - ARM Linux <linux@arm.linux.org.uk> [150116 08:33]:
> > > > I would still like to understand /why/ enabling preempt causes the error.
> > > > Changing the preempt configuration really should not change what happens
> > > > on the bus.  (Think about it.)  It's an indication that there is some
> > > > other error present.
> > > 
> > > We have a wrong irq number caused by $subject. And the wrong irq
> > > gets triggered before the dma hardware is configured during dma
> > > init. And then we get the invalid access error from omap_l3_noc.
> > 
> > ... which should happen whether or not preempt is enabled, which is
> > really my point.
> > 
> > We know tha the wrong IRQ gets requested by the driver - and that wrong
> > IRQ is requested whether or not we have preempt enabled.  Yet we get
> > the warning whether or not preempt is enabled.
> > 
> > The DMA handler is not registered as a threaded handler, so it's not
> > depending on a context switch to execute omap2_dma_irq_handler().
> > 
> > Another reason why I don't agree with your explanation is that by the
> > time setup_irq() is called, we have already poked at the DMA hardware
> > several times - omap_clear_dma() and omap2_disable_irq_lch() will have
> > been called for each DMA channel - and both will write to the hardware.
> > 
> > What's more is that the only things left after setup_irq() has been
> > called is to possibly reserve the first two DMA channels and print
> > the DMA message (via show_dma_caps).  So I see nothing after setup_irq()
> > which would "finish" any unfinished hardware initialisation.
> > 
> > The final reason I don't agree is that I've put a printk() in
> > omap2_dma_irq_handler(), and this does not trigger.
> 
> Oh, yes that blows my theory completely then.
>  
> > So, I think this has nothing to do with the DMA hardware /at all/,
> > but more to do with the GPIO code, and it suggests that the GPIO code
> > publishes IRQs before it is safe for those IRQs to be used.
> > 
> > Maybe it has to do with omap_gpio_irq_handler() being called... added
> > printk(), nope, that's not called either.  So it's not an IRQ which
> > gets triggered at all.
> > 
> > What is called are (in order):
> > 
> > omap_gpio_unmask_irq()
> > omap_set_gpio_irqenable()
> > omap_enable_gpio_irqbank()
> > 
> > and this reveals where the problem is, especially when you then add
> > instrumentation into the runtime PM functions - and this reveals that
> > when a GPIO IRQ is requested, these functions are called while the
> > GPIO is runtime suspended.
> > 
> > _That_ is where the *real* problem lies - requesting a GPIO interrupt
> > results in the kernel touching possibly runtime-suspended hardware.
> > 
> > The reason it happens with preempt is that preempt introduces scheduling
> > points during the kernel boot which would not otherwise be there (with
> > preempt disabled, you have to hit an explicit context switch due to
> > contention on some lock or a wait in order for some other thread to run.)
> 
> OK makes sense.
>  
> > So, the GPIO driver really needs fixing - and I'd suggest fixing it
> > first, before fixing the DMA problem, because the DMA problem allows
> > us to see the GPIO problem.
> 
> Yes we need to fix that.

Posted a minimal fix for that one as a separate thread:

[PATCH 1/1] gpio: omap: Fix bad device access with setup_irq()

Regards,

Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-16 22:57                   ` Russell King - ARM Linux
@ 2015-01-16 22:57                     ` Tony Lindgren
  0 siblings, 0 replies; 21+ messages in thread
From: Tony Lindgren @ 2015-01-16 22:57 UTC (permalink / raw)
  To: linux-arm-kernel

* Russell King - ARM Linux <linux@arm.linux.org.uk> [150116 15:00]:
> On Fri, Jan 16, 2015 at 02:52:44PM -0800, Tony Lindgren wrote:
> > * Tony Lindgren <tony@atomide.com> [150116 09:36]:
> > > * Russell King - ARM Linux <linux@arm.linux.org.uk> [150116 09:25]:
> > > > So, the GPIO driver really needs fixing - and I'd suggest fixing it
> > > > first, before fixing the DMA problem, because the DMA problem allows
> > > > us to see the GPIO problem.
> > > 
> > > Yes we need to fix that.
> > 
> > Posted a minimal fix for that one as a separate thread:
> > 
> > [PATCH 1/1] gpio: omap: Fix bad device access with setup_irq()
> 
> Thanks, I'll throw that onto the build tree for tonights build.

Great thanks!

Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-16 22:52                 ` Tony Lindgren
@ 2015-01-16 22:57                   ` Russell King - ARM Linux
  2015-01-16 22:57                     ` Tony Lindgren
  0 siblings, 1 reply; 21+ messages in thread
From: Russell King - ARM Linux @ 2015-01-16 22:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 16, 2015 at 02:52:44PM -0800, Tony Lindgren wrote:
> * Tony Lindgren <tony@atomide.com> [150116 09:36]:
> > * Russell King - ARM Linux <linux@arm.linux.org.uk> [150116 09:25]:
> > > So, the GPIO driver really needs fixing - and I'd suggest fixing it
> > > first, before fixing the DMA problem, because the DMA problem allows
> > > us to see the GPIO problem.
> > 
> > Yes we need to fix that.
> 
> Posted a minimal fix for that one as a separate thread:
> 
> [PATCH 1/1] gpio: omap: Fix bad device access with setup_irq()

Thanks, I'll throw that onto the build tree for tonights build.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Regression with legacy IRQ numbers caused by 9a1091ef0017
  2015-01-16 17:23           ` Marc Zyngier
@ 2015-01-17  0:48             ` Simon Horman
  0 siblings, 0 replies; 21+ messages in thread
From: Simon Horman @ 2015-01-17  0:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 16, 2015 at 05:23:05PM +0000, Marc Zyngier wrote:
> On 16/01/15 16:56, Arnd Bergmann wrote:
> > On Thursday 15 January 2015 07:37:48 Tony Lindgren wrote:
> >> * Marc Zyngier <marc.zyngier@arm.com> [150115 06:46]:
> >>> On Thu, Jan 15 2015 at  2:27:56 pm GMT, Arnd Bergmann <arnd@arndb.de> wrote:
> >>>> On Thursday 15 January 2015 13:42:57 Marc Zyngier wrote:
> >>> Probably there is a workable strategy, but my knowledge about OMAP is
> >>> close to *nothing*...
> >>
> >> I have a feeling this might bite other platforms too and we just have
> >> not noticed it yet..
> > 
> > I'm looking through the entire tree now, scanning for machines that have
> > GIC and use IORESOURCE_IRQ or DEFINE_RES_IRQ in their platform code.
> > Most platforms using GIC are completely converted to DT and have no
> > hardcoded legacy IRQs.
> > 
> > I have checked that cns3xxx and realview are both fine by inspection.
> > 
> > The only one I'm not sure about is shmobile, which looks like it might
> > suffer from the same problem. Simon/Magnus, could you verify this
> > with a multiplatform kernel on any SoC that has GIC and uses devices
> > that have interrupts defined in setup-*.c or board-*.c?
> 
> There are 3 patches floating around for shmobile, converting their
> non-DT support to directly initializing the GIC instead of relying on
> irqchip_init().

There is also a fourth patch pending to fix one last SoC, the r8a73a4.
My understanding is that should be the end of the problems that
we have been seeing in this area.

> That's assuming their DT implementation doesn't use any
> of these device declarations.

I believe that assumption is correct. shmobile does not use any
devices that have interrupts defined in setup-*.c or board-*.c when
booting using multiplatform.

> If they do, we could use a hack similar to the one I implemented for
> OMAP, populating the virtual IRQ in the resource at boot time, just
> after the irqchip initialization.
> 
> Thanks,
> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-01-17  0:48 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-14 22:14 Regression with legacy IRQ numbers caused by 9a1091ef0017 Tony Lindgren
2015-01-15 10:50 ` Russell King - ARM Linux
2015-01-15 15:28   ` Tony Lindgren
2015-01-15 17:19     ` Russell King - ARM Linux
2015-01-16 16:21       ` Tony Lindgren
2015-01-16 16:30         ` Russell King - ARM Linux
2015-01-16 16:41           ` Tony Lindgren
2015-01-16 16:46             ` Felipe Balbi
2015-01-16 17:22             ` Russell King - ARM Linux
2015-01-16 17:29               ` Tony Lindgren
2015-01-16 22:52                 ` Tony Lindgren
2015-01-16 22:57                   ` Russell King - ARM Linux
2015-01-16 22:57                     ` Tony Lindgren
2015-01-15 13:42 ` Marc Zyngier
2015-01-15 14:27   ` Arnd Bergmann
2015-01-15 14:43     ` Marc Zyngier
2015-01-15 15:37       ` Tony Lindgren
2015-01-16 16:56         ` Arnd Bergmann
2015-01-16 17:23           ` Marc Zyngier
2015-01-17  0:48             ` Simon Horman
2015-01-15 16:37       ` Arnd Bergmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).