iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* Re: 3.16rc3 multiplatform, Armada 370 and IOMMU: unbootable kernel
       [not found]     ` <53B5CA26.7050405-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
@ 2014-07-04  8:47       ` Laurent Pinchart
  2014-07-05 15:03         ` Ezequiel Garcia
  0 siblings, 1 reply; 10+ messages in thread
From: Laurent Pinchart @ 2014-07-04  8:47 UTC (permalink / raw)
  To: Gregory CLEMENT
  Cc: Thomas Petazzoni, Andrew Lunn, Jason Cooper,
	Hiroshi.DOYU-xNZwKgViW5gAvxtiuMwx3w,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Ezequiel Garcia, Paolo Pisati, Florian Vaussard,
	Sebastian Hesselbarth

Hi Gregory,

(CC'ing the IOMMU mailing list)

On Thursday 03 July 2014 23:24:54 Gregory CLEMENT wrote:
> On 03/07/2014 23:07, Gregory CLEMENT wrote:
> > On 03/07/2014 23:01, Thomas Petazzoni wrote:
> >> Hello,
> >> 
> >> If you have touched the OMAP IOMMU driver recently, please read on.
> >> 
> >> On Thu, 03 Jul 2014 22:57:38 +0200, Gregory CLEMENT wrote:
> >>>> So it calls bus_set_iommu() unconditionally, without caring at all
> >>>> whether it is running on a platform that actually cares about OMAP
> >>>> IOMMU. And then later on, a bus notifier of the IOMMU subsystem gets
> >>>> called, and some NULL pointer gets dereferenced. I'm pretty sure that
> >>>> if you comment out this subsys_initcall(), you won't see the problem
> >>>> anymore.

The OMAP IOMMU driver isn't the only one registering itself with the platform 
bus at probe time, regardless of the system the kernel runs on. Some IOMMU 
drivers push the bus_set_iommu() call to the probe function, but that's not a 
good solution either, as we could have several instances of the same IOMMU, 
and all of them should be probed before the bus_set_iommu() call.

We need a quick fix for v3.16, but we also need to fix bus_set_iommu(). That's 
on my to-do list for <insert some fuzzy future date here>, but if you want to 
give it a go, please feel free to do so. In all cases, this is a good occasion 
to start discussing the problem. Any opinion on what the perfect API would 
look like ?

> >>> Indeed I comment it, and I didn't see the problem anymore.
> >>> 
> >>>> However, this code has been around since a while, so I don't know if
> >>>> it's actually the change that makes it visible. Maybe some other IOMMU
> >>>> core internal change makes it actually visible. But this
> >>>> subsys_initcall() that does random stuff without caring about the
> >>>> platform it runs on anyway looks incorrect.
> >>> 
> >>> I think that nobody until now have run this configuration.
> >> 
> >> Hum, indeed, I was assuming multi_v7_defconfig would have that enabled,
> >> but it only has the Tegra IOMMU enabled, and not the OMAP IOMMU one.
> >> 
> >> So, the bug clearly belongs to the developers of the OMAP IOMMU driver,
> >> so I've added a bunch of people who touched this driver recently in Cc.
> > 
> > To add more information: I thought the problem was here from a long time
> > so I tested a 3.14 kernel, and in this case the kernel booted without any
> > problem. So the bug is pretty recent. I will do a last test with 3.15 and
> > I will keep you inform.
> 
> So it also boot well in 3.15 and then failed in 3.16-rc3. I hope it will
> help the developers of the OMAP IOMMU driver to fix it.

Thank you. I've had a look at the OMAP IOMMU driver changes between v3.15 and 
v3.16-rc3, and didn't find at first sight any change that could explain the 
crash.

286f600 iommu/omap: Fix map protection value handling
67b779d iommu/omap: Remove comment about supporting single page mappings only
f7129a0 iommu/omap: Fix 'no page for' debug message in flush_iotlb_page()
5acc97d iommu/omap: Move to_iommu definition from omap-iopgtable.h
2ac6133 iommu/omap: Remove omap_iommu_domain_has_cap() function
d760e3e iommu/omap: Correct init value of iotlb_entry valid field

Could you try reverting those changes and retest ? If the problem doesn't 
disappear, we'll need to look somewhere else.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 3.16rc3 multiplatform, Armada 370 and IOMMU: unbootable kernel
  2014-07-04  8:47       ` 3.16rc3 multiplatform, Armada 370 and IOMMU: unbootable kernel Laurent Pinchart
@ 2014-07-05 15:03         ` Ezequiel Garcia
       [not found]           ` <20140705150308.GA28791-nAQHv47ARr+vIlHkl8J1cg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ezequiel Garcia @ 2014-07-05 15:03 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Greg Kroah-Hartman, linux-kernel, Gregory CLEMENT,
	Thomas Petazzoni, Andrew Lunn, Jason Cooper, Joerg Roedel,
	Hiroshi.DOYU, Florian Vaussard, Paolo Pisati, linux-arm-kernel,
	Sebastian Hesselbarth, iommu

After following Gregory's stacktrace (also reproduced here):

[<c02451f8>] (iommu_bus_notifier) from [<c00512e8>] (notifier_call_chain+0x64/0x9c)
[<c00512e8>] (notifier_call_chain) from [<c00514cc>] (__blocking_notifier_call_chain+0x40/0x58)
[<c00514cc>] (__blocking_notifier_call_chain) from [<c00514f8>] (blocking_notifier_call_chain+0x14/0x1c)
[<c00514f8>] (blocking_notifier_call_chain) from [<c01d225c>] (device_add+0x424/0x524)
[<c01d225c>] (device_add) from [<c0186d90>] (pci_device_add+0xec/0x110)
[<c0186d90>] (pci_device_add) from [<c0186e54>] (pci_scan_single_device+0xa0/0xac)

I added a few printks and found that the problem is that the iommu_bus_notifier is
called for the 'pci' bus type, which has a null iommu_ops.

On 04 Jul 10:47 AM, Laurent Pinchart wrote:
[..]
> 
> We need a quick fix for v3.16, ...

Therefore, a quick fix would be to simply check for that:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e5555fc..b712cb2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -536,6 +536,9 @@ static int iommu_bus_notifier(struct notifier_block *nb,
        struct iommu_group *group;
        unsigned long group_action = 0;
 
+       if (!ops)
+               return 0;
+
        /*
         * ADD/DEL call into iommu driver ops if provided, which may
         * result in ADD/DEL notifiers to group->notifier

This (nasty workaround?) patch makes the problem go away.

[..]
> > So it also boot well in 3.15 and then failed in 3.16-rc3. I hope it will
> > help the developers of the OMAP IOMMU driver to fix it.
> 
> Thank you. I've had a look at the OMAP IOMMU driver changes between v3.15 and 
> v3.16-rc3, and didn't find at first sight any change that could explain the 
> crash.
> 
> 286f600 iommu/omap: Fix map protection value handling
> 67b779d iommu/omap: Remove comment about supporting single page mappings only
> f7129a0 iommu/omap: Fix 'no page for' debug message in flush_iotlb_page()
> 5acc97d iommu/omap: Move to_iommu definition from omap-iopgtable.h
> 2ac6133 iommu/omap: Remove omap_iommu_domain_has_cap() function
> d760e3e iommu/omap: Correct init value of iotlb_entry valid field
> 
> Could you try reverting those changes and retest ? If the problem doesn't 
> disappear, we'll need to look somewhere else.
> 

I reverted the above commits but nothing changed. I'm far from being an expert,
but it sounds odd to have this bus notifier (that got registered for the
platform bus type) called by a pci bus type.

This doesn't look like a mvebu-specific problem, other platforms should be
affected as well. I'm Ccing GregKH, in case he has an idea what's going on.
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: 3.16rc3 multiplatform, Armada 370 and IOMMU: unbootable kernel
       [not found]           ` <20140705150308.GA28791-nAQHv47ARr+vIlHkl8J1cg@public.gmane.org>
@ 2014-07-05 20:59             ` Greg Kroah-Hartman
  2014-07-07 10:58               ` Ezequiel Garcia
  0 siblings, 1 reply; 10+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-05 20:59 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Thomas Petazzoni, Andrew Lunn, Jason Cooper,
	Hiroshi.DOYU-xNZwKgViW5gAvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Laurent Pinchart, Paolo Pisati, Gregory CLEMENT,
	Sebastian Hesselbarth, Florian Vaussard,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Sat, Jul 05, 2014 at 12:03:08PM -0300, Ezequiel Garcia wrote:
> After following Gregory's stacktrace (also reproduced here):
> 
> [<c02451f8>] (iommu_bus_notifier) from [<c00512e8>] (notifier_call_chain+0x64/0x9c)
> [<c00512e8>] (notifier_call_chain) from [<c00514cc>] (__blocking_notifier_call_chain+0x40/0x58)
> [<c00514cc>] (__blocking_notifier_call_chain) from [<c00514f8>] (blocking_notifier_call_chain+0x14/0x1c)
> [<c00514f8>] (blocking_notifier_call_chain) from [<c01d225c>] (device_add+0x424/0x524)
> [<c01d225c>] (device_add) from [<c0186d90>] (pci_device_add+0xec/0x110)
> [<c0186d90>] (pci_device_add) from [<c0186e54>] (pci_scan_single_device+0xa0/0xac)
> 
> I added a few printks and found that the problem is that the iommu_bus_notifier is
> called for the 'pci' bus type, which has a null iommu_ops.
> 
> On 04 Jul 10:47 AM, Laurent Pinchart wrote:
> [..]
> > 
> > We need a quick fix for v3.16, ...
> 
> Therefore, a quick fix would be to simply check for that:
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index e5555fc..b712cb2 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -536,6 +536,9 @@ static int iommu_bus_notifier(struct notifier_block *nb,
>         struct iommu_group *group;
>         unsigned long group_action = 0;
>  
> +       if (!ops)
> +               return 0;
> +
>         /*
>          * ADD/DEL call into iommu driver ops if provided, which may
>          * result in ADD/DEL notifiers to group->notifier
> 
> This (nasty workaround?) patch makes the problem go away.
> 
> [..]
> > > So it also boot well in 3.15 and then failed in 3.16-rc3. I hope it will
> > > help the developers of the OMAP IOMMU driver to fix it.
> > 
> > Thank you. I've had a look at the OMAP IOMMU driver changes between v3.15 and 
> > v3.16-rc3, and didn't find at first sight any change that could explain the 
> > crash.
> > 
> > 286f600 iommu/omap: Fix map protection value handling
> > 67b779d iommu/omap: Remove comment about supporting single page mappings only
> > f7129a0 iommu/omap: Fix 'no page for' debug message in flush_iotlb_page()
> > 5acc97d iommu/omap: Move to_iommu definition from omap-iopgtable.h
> > 2ac6133 iommu/omap: Remove omap_iommu_domain_has_cap() function
> > d760e3e iommu/omap: Correct init value of iotlb_entry valid field
> > 
> > Could you try reverting those changes and retest ? If the problem doesn't 
> > disappear, we'll need to look somewhere else.
> > 
> 
> I reverted the above commits but nothing changed. I'm far from being an expert,
> but it sounds odd to have this bus notifier (that got registered for the
> platform bus type) called by a pci bus type.

Why wouldn't the PCI bus set this up for its devices?  Are you
"assuming" you know the bus type and that's the issue?

I see the a number of different places this is being initialized for the
pci bus.

Ah, look at drivers/iommu/fsl_pamu_domain.c, odds are, it shouldn't be
doing that logic in the pamu_domain_init() code, using the same bus ops
for different bus types, that's ripe for major problems...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 3.16rc3 multiplatform, Armada 370 and IOMMU: unbootable kernel
  2014-07-05 20:59             ` Greg Kroah-Hartman
@ 2014-07-07 10:58               ` Ezequiel Garcia
       [not found]                 ` <20140707105818.GA1101-nAQHv47ARr+vIlHkl8J1cg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ezequiel Garcia @ 2014-07-07 10:58 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Thomas Petazzoni, Andrew Lunn, Jason Cooper, Joerg Roedel,
	Hiroshi.DOYU, linux-kernel, iommu, Laurent Pinchart, Paolo Pisati,
	Gregory CLEMENT, Sebastian Hesselbarth, Florian Vaussard,
	linux-arm-kernel

On 05 Jul 01:59 PM, Greg Kroah-Hartman wrote:
> On Sat, Jul 05, 2014 at 12:03:08PM -0300, Ezequiel Garcia wrote:
> > After following Gregory's stacktrace (also reproduced here):
> > 
> > [<c02451f8>] (iommu_bus_notifier) from [<c00512e8>] (notifier_call_chain+0x64/0x9c)
> > [<c00512e8>] (notifier_call_chain) from [<c00514cc>] (__blocking_notifier_call_chain+0x40/0x58)
> > [<c00514cc>] (__blocking_notifier_call_chain) from [<c00514f8>] (blocking_notifier_call_chain+0x14/0x1c)
> > [<c00514f8>] (blocking_notifier_call_chain) from [<c01d225c>] (device_add+0x424/0x524)
> > [<c01d225c>] (device_add) from [<c0186d90>] (pci_device_add+0xec/0x110)
> > [<c0186d90>] (pci_device_add) from [<c0186e54>] (pci_scan_single_device+0xa0/0xac)
> > 
> > I added a few printks and found that the problem is that the iommu_bus_notifier is
> > called for the 'pci' bus type, which has a null iommu_ops.
> > 
> > On 04 Jul 10:47 AM, Laurent Pinchart wrote:
> > [..]
> > > 
> > > We need a quick fix for v3.16, ...
> > 
> > Therefore, a quick fix would be to simply check for that:
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index e5555fc..b712cb2 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -536,6 +536,9 @@ static int iommu_bus_notifier(struct notifier_block *nb,
> >         struct iommu_group *group;
> >         unsigned long group_action = 0;
> >  
> > +       if (!ops)
> > +               return 0;
> > +
> >         /*
> >          * ADD/DEL call into iommu driver ops if provided, which may
> >          * result in ADD/DEL notifiers to group->notifier
> > 
> > This (nasty workaround?) patch makes the problem go away.
> > 
> > [..]
> > > > So it also boot well in 3.15 and then failed in 3.16-rc3. I hope it will
> > > > help the developers of the OMAP IOMMU driver to fix it.
> > > 
> > > Thank you. I've had a look at the OMAP IOMMU driver changes between v3.15 and 
> > > v3.16-rc3, and didn't find at first sight any change that could explain the 
> > > crash.
> > > 
> > > 286f600 iommu/omap: Fix map protection value handling
> > > 67b779d iommu/omap: Remove comment about supporting single page mappings only
> > > f7129a0 iommu/omap: Fix 'no page for' debug message in flush_iotlb_page()
> > > 5acc97d iommu/omap: Move to_iommu definition from omap-iopgtable.h
> > > 2ac6133 iommu/omap: Remove omap_iommu_domain_has_cap() function
> > > d760e3e iommu/omap: Correct init value of iotlb_entry valid field
> > > 
> > > Could you try reverting those changes and retest ? If the problem doesn't 
> > > disappear, we'll need to look somewhere else.
> > > 
> > 
> > I reverted the above commits but nothing changed. I'm far from being an expert,
> > but it sounds odd to have this bus notifier (that got registered for the
> > platform bus type) called by a pci bus type.
> 
> Why wouldn't the PCI bus set this up for its devices?  Are you
> "assuming" you know the bus type and that's the issue?
> 

Thanks for looking at this.

I guess I snipped the thread and lost most of the information about the panic.
Here's the original bug report:

http://www.spinics.net/lists/arm-kernel/msg344059.html

The problem reported involves enabling OMAP IOMMU driver and not any other IOMMU
driver. Doing some tracing and adding a few prints, we found that
omap_iommu_init() sets a bus notifier for the platform bus type:

omap_iommu_init -> bus_set_iommu -> iommu_bus_init:

static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
{
        bus_register_notifier(bus, &iommu_bus_nb);
        bus_for_each_dev(bus, NULL, ops, add_iommu_group);
}

But the iommu bus notifier gets called for the 'pci' bus type, which
has the iommu_ops field NULL (since it hasn't been set for iommu).

The panic is here when the NULL field is dereferenced:

static int iommu_bus_notifier(struct notifier_block *nb,
                              unsigned long action, void *data)
{
        struct device *dev = data;
        struct iommu_ops *ops = dev->bus->iommu_ops;


> I see the a number of different places this is being initialized for the
> pci bus.
> 
> Ah, look at drivers/iommu/fsl_pamu_domain.c, odds are, it shouldn't be
> doing that logic in the pamu_domain_init() code, using the same bus ops
> for different bus types, that's ripe for major problems...
> 

Maybe I'm missing something, but since the fsl driver is not enabled it has
nothing to do. And since the OMAP IOMMU doesn't set the iommu bus notifier
for the 'pci' bus type, I was wondering if there's a problem with the bus
notifier itself.

I hope the above makes sense.

Thanks again for helping out,
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 3.16rc3 multiplatform, Armada 370 and IOMMU: unbootable kernel
       [not found]                 ` <20140707105818.GA1101-nAQHv47ARr+vIlHkl8J1cg@public.gmane.org>
@ 2014-07-07 18:30                   ` Greg Kroah-Hartman
  2014-07-07 23:37                     ` Ezequiel Garcia
  0 siblings, 1 reply; 10+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-07 18:30 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Thomas Petazzoni, Andrew Lunn, Jason Cooper,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Hiroshi.DOYU-xNZwKgViW5gAvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Laurent Pinchart, Paolo Pisati, Gregory CLEMENT, Florian Vaussard,
	Sebastian Hesselbarth

On Mon, Jul 07, 2014 at 07:58:18AM -0300, Ezequiel Garcia wrote:
> On 05 Jul 01:59 PM, Greg Kroah-Hartman wrote:
> > On Sat, Jul 05, 2014 at 12:03:08PM -0300, Ezequiel Garcia wrote:
> > > After following Gregory's stacktrace (also reproduced here):
> > > 
> > > [<c02451f8>] (iommu_bus_notifier) from [<c00512e8>] (notifier_call_chain+0x64/0x9c)
> > > [<c00512e8>] (notifier_call_chain) from [<c00514cc>] (__blocking_notifier_call_chain+0x40/0x58)
> > > [<c00514cc>] (__blocking_notifier_call_chain) from [<c00514f8>] (blocking_notifier_call_chain+0x14/0x1c)
> > > [<c00514f8>] (blocking_notifier_call_chain) from [<c01d225c>] (device_add+0x424/0x524)
> > > [<c01d225c>] (device_add) from [<c0186d90>] (pci_device_add+0xec/0x110)
> > > [<c0186d90>] (pci_device_add) from [<c0186e54>] (pci_scan_single_device+0xa0/0xac)
> > > 
> > > I added a few printks and found that the problem is that the iommu_bus_notifier is
> > > called for the 'pci' bus type, which has a null iommu_ops.
> > > 
> > > On 04 Jul 10:47 AM, Laurent Pinchart wrote:
> > > [..]
> > > > 
> > > > We need a quick fix for v3.16, ...
> > > 
> > > Therefore, a quick fix would be to simply check for that:
> > > 
> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > index e5555fc..b712cb2 100644
> > > --- a/drivers/iommu/iommu.c
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -536,6 +536,9 @@ static int iommu_bus_notifier(struct notifier_block *nb,
> > >         struct iommu_group *group;
> > >         unsigned long group_action = 0;
> > >  
> > > +       if (!ops)
> > > +               return 0;
> > > +
> > >         /*
> > >          * ADD/DEL call into iommu driver ops if provided, which may
> > >          * result in ADD/DEL notifiers to group->notifier
> > > 
> > > This (nasty workaround?) patch makes the problem go away.
> > > 
> > > [..]
> > > > > So it also boot well in 3.15 and then failed in 3.16-rc3. I hope it will
> > > > > help the developers of the OMAP IOMMU driver to fix it.
> > > > 
> > > > Thank you. I've had a look at the OMAP IOMMU driver changes between v3.15 and 
> > > > v3.16-rc3, and didn't find at first sight any change that could explain the 
> > > > crash.
> > > > 
> > > > 286f600 iommu/omap: Fix map protection value handling
> > > > 67b779d iommu/omap: Remove comment about supporting single page mappings only
> > > > f7129a0 iommu/omap: Fix 'no page for' debug message in flush_iotlb_page()
> > > > 5acc97d iommu/omap: Move to_iommu definition from omap-iopgtable.h
> > > > 2ac6133 iommu/omap: Remove omap_iommu_domain_has_cap() function
> > > > d760e3e iommu/omap: Correct init value of iotlb_entry valid field
> > > > 
> > > > Could you try reverting those changes and retest ? If the problem doesn't 
> > > > disappear, we'll need to look somewhere else.
> > > > 
> > > 
> > > I reverted the above commits but nothing changed. I'm far from being an expert,
> > > but it sounds odd to have this bus notifier (that got registered for the
> > > platform bus type) called by a pci bus type.
> > 
> > Why wouldn't the PCI bus set this up for its devices?  Are you
> > "assuming" you know the bus type and that's the issue?
> > 
> 
> Thanks for looking at this.
> 
> I guess I snipped the thread and lost most of the information about the panic.
> Here's the original bug report:
> 
> http://www.spinics.net/lists/arm-kernel/msg344059.html
> 
> The problem reported involves enabling OMAP IOMMU driver and not any other IOMMU
> driver. Doing some tracing and adding a few prints, we found that
> omap_iommu_init() sets a bus notifier for the platform bus type:
> 
> omap_iommu_init -> bus_set_iommu -> iommu_bus_init:
> 
> static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
> {
>         bus_register_notifier(bus, &iommu_bus_nb);
>         bus_for_each_dev(bus, NULL, ops, add_iommu_group);
> }
> 
> But the iommu bus notifier gets called for the 'pci' bus type, which
> has the iommu_ops field NULL (since it hasn't been set for iommu).

So this is what needs to be figured out, how is the notifier being
called with a PCI device?  Who else called iommu_bus_init() for the PCI
bus?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 3.16rc3 multiplatform, Armada 370 and IOMMU: unbootable kernel
  2014-07-07 18:30                   ` Greg Kroah-Hartman
@ 2014-07-07 23:37                     ` Ezequiel Garcia
       [not found]                       ` <20140707233758.GA1456-nAQHv47ARr+vIlHkl8J1cg@public.gmane.org>
                                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ezequiel Garcia @ 2014-07-07 23:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Paolo Pisati
  Cc: Thomas Petazzoni, Andrew Lunn, Jason Cooper, Joerg Roedel,
	Hiroshi.DOYU, linux-kernel, iommu, Laurent Pinchart,
	Gregory CLEMENT, Sebastian Hesselbarth, Florian Vaussard,
	linux-arm-kernel

On 07 Jul 11:30 AM, Greg Kroah-Hartman wrote:
> On Mon, Jul 07, 2014 at 07:58:18AM -0300, Ezequiel Garcia wrote:
[..]
> > 
> > I guess I snipped the thread and lost most of the information about the panic.
> > Here's the original bug report:
> > 
> > http://www.spinics.net/lists/arm-kernel/msg344059.html
> > 
> > The problem reported involves enabling OMAP IOMMU driver and not any other IOMMU
> > driver. Doing some tracing and adding a few prints, we found that
> > omap_iommu_init() sets a bus notifier for the platform bus type:
> > 
> > omap_iommu_init -> bus_set_iommu -> iommu_bus_init:
> > 
> > static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
> > {
> >         bus_register_notifier(bus, &iommu_bus_nb);
> >         bus_for_each_dev(bus, NULL, ops, add_iommu_group);
> > }
> > 
> > But the iommu bus notifier gets called for the 'pci' bus type, which
> > has the iommu_ops field NULL (since it hasn't been set for iommu).
> 
> So this is what needs to be figured out, how is the notifier being
> called with a PCI device?  Who else called iommu_bus_init() for the PCI
> bus?
> 

Nobody. The only caller of iommu_bus_init() the above OMAP IOMMU.

However, I found something interesting. Tried to bisect this without much
luck; I did two bisects and ended up in the same merge commit:

# good: [0f16aa3c24a216d14d7f0587e1cbd2c1b51a38f3] Merge tag 'samsung-dt-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/dt
# first bad commit: [755a9ba7bf24a45b6dbf8bb15a5a56c8ed12461a] Merge tag 'dt-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc into next

So, after doing a few diff's between that good and bad and searching for
"bus_notifier" changes, saw something strange in arch/arm/mach-mvebu/coherency.c.

It seems bus_register_notifier() is been called for platform and pci devices
with the *same* notifier block. Haven't looked close enough, but you mentioned
that could cause trouble?

This patch fixes the issue here:

diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
index 477202f..2bdc323 100644
--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -292,6 +292,10 @@ static struct notifier_block mvebu_hwcc_nb = {
 	.notifier_call = mvebu_hwcc_notifier,
 };
 
+static struct notifier_block mvebu_hwcc_pci_nb = {
+	.notifier_call = mvebu_hwcc_notifier,
+};
+
 static void __init armada_370_coherency_init(struct device_node *np)
 {
 	struct resource res;
@@ -427,7 +431,7 @@ static int __init coherency_pci_init(void)
 {
 	if (coherency_available())
 		bus_register_notifier(&pci_bus_type,
-				       &mvebu_hwcc_nb);
+				       &mvebu_hwcc_pci_nb);
 	return 0;
 }
 
Paolo, can you apply it and confirm it fixes the problem?

Greg, can you confirm using the same notifier block pointer
for two different bus types makes the bus notifier go nuts?

Thanks for the help,
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: 3.16rc3 multiplatform, Armada 370 and IOMMU: unbootable kernel
       [not found]                       ` <20140707233758.GA1456-nAQHv47ARr+vIlHkl8J1cg@public.gmane.org>
@ 2014-07-07 23:47                         ` Greg Kroah-Hartman
  0 siblings, 0 replies; 10+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-07 23:47 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Thomas Petazzoni, Andrew Lunn, Jason Cooper,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Hiroshi.DOYU-xNZwKgViW5gAvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Laurent Pinchart, Paolo Pisati, Gregory CLEMENT, Florian Vaussard,
	Sebastian Hesselbarth

On Mon, Jul 07, 2014 at 08:37:58PM -0300, Ezequiel Garcia wrote:
> On 07 Jul 11:30 AM, Greg Kroah-Hartman wrote:
> > On Mon, Jul 07, 2014 at 07:58:18AM -0300, Ezequiel Garcia wrote:
> [..]
> > > 
> > > I guess I snipped the thread and lost most of the information about the panic.
> > > Here's the original bug report:
> > > 
> > > http://www.spinics.net/lists/arm-kernel/msg344059.html
> > > 
> > > The problem reported involves enabling OMAP IOMMU driver and not any other IOMMU
> > > driver. Doing some tracing and adding a few prints, we found that
> > > omap_iommu_init() sets a bus notifier for the platform bus type:
> > > 
> > > omap_iommu_init -> bus_set_iommu -> iommu_bus_init:
> > > 
> > > static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
> > > {
> > >         bus_register_notifier(bus, &iommu_bus_nb);
> > >         bus_for_each_dev(bus, NULL, ops, add_iommu_group);
> > > }
> > > 
> > > But the iommu bus notifier gets called for the 'pci' bus type, which
> > > has the iommu_ops field NULL (since it hasn't been set for iommu).
> > 
> > So this is what needs to be figured out, how is the notifier being
> > called with a PCI device?  Who else called iommu_bus_init() for the PCI
> > bus?
> > 
> 
> Nobody. The only caller of iommu_bus_init() the above OMAP IOMMU.
> 
> However, I found something interesting. Tried to bisect this without much
> luck; I did two bisects and ended up in the same merge commit:
> 
> # good: [0f16aa3c24a216d14d7f0587e1cbd2c1b51a38f3] Merge tag 'samsung-dt-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/dt
> # first bad commit: [755a9ba7bf24a45b6dbf8bb15a5a56c8ed12461a] Merge tag 'dt-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc into next
> 
> So, after doing a few diff's between that good and bad and searching for
> "bus_notifier" changes, saw something strange in arch/arm/mach-mvebu/coherency.c.
> 
> It seems bus_register_notifier() is been called for platform and pci devices
> with the *same* notifier block. Haven't looked close enough, but you mentioned
> that could cause trouble?
> 
> This patch fixes the issue here:
> 
> diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
> index 477202f..2bdc323 100644
> --- a/arch/arm/mach-mvebu/coherency.c
> +++ b/arch/arm/mach-mvebu/coherency.c
> @@ -292,6 +292,10 @@ static struct notifier_block mvebu_hwcc_nb = {
>  	.notifier_call = mvebu_hwcc_notifier,
>  };
>  
> +static struct notifier_block mvebu_hwcc_pci_nb = {
> +	.notifier_call = mvebu_hwcc_notifier,
> +};
> +
>  static void __init armada_370_coherency_init(struct device_node *np)
>  {
>  	struct resource res;
> @@ -427,7 +431,7 @@ static int __init coherency_pci_init(void)
>  {
>  	if (coherency_available())
>  		bus_register_notifier(&pci_bus_type,
> -				       &mvebu_hwcc_nb);
> +				       &mvebu_hwcc_pci_nb);
>  	return 0;
>  }
>  
> Paolo, can you apply it and confirm it fixes the problem?
> 
> Greg, can you confirm using the same notifier block pointer
> for two different bus types makes the bus notifier go nuts?

The bus notifier doesn't "go nuts", but the call back functions must be
able to handle the devices for that specific bus being passed to the
function for it.

Again, you shouldn't ever mix bus types with the same notifier unless
you _really_ know what you are doing, and even then, I wouldn't
recommend it.

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 3.16rc3 multiplatform, Armada 370 and IOMMU: unbootable kernel
  2014-07-07 23:37                     ` Ezequiel Garcia
       [not found]                       ` <20140707233758.GA1456-nAQHv47ARr+vIlHkl8J1cg@public.gmane.org>
@ 2014-07-08  7:41                       ` Thomas Petazzoni
  2014-07-08  9:20                         ` Paolo Pisati
  2014-07-08 12:01                       ` Jason Cooper
  2 siblings, 1 reply; 10+ messages in thread
From: Thomas Petazzoni @ 2014-07-08  7:41 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Greg Kroah-Hartman, Paolo Pisati, Andrew Lunn, Jason Cooper,
	Joerg Roedel, Hiroshi.DOYU, linux-kernel, iommu, Laurent Pinchart,
	Gregory CLEMENT, Sebastian Hesselbarth, Florian Vaussard,
	linux-arm-kernel

Dear Ezequiel Garcia,

On Mon, 7 Jul 2014 20:37:58 -0300, Ezequiel Garcia wrote:

> It seems bus_register_notifier() is been called for platform and pci devices
> with the *same* notifier block. Haven't looked close enough, but you mentioned
> that could cause trouble?
> 
> This patch fixes the issue here:
> 
> diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
> index 477202f..2bdc323 100644
> --- a/arch/arm/mach-mvebu/coherency.c
> +++ b/arch/arm/mach-mvebu/coherency.c
> @@ -292,6 +292,10 @@ static struct notifier_block mvebu_hwcc_nb = {
>  	.notifier_call = mvebu_hwcc_notifier,
>  };
>  
> +static struct notifier_block mvebu_hwcc_pci_nb = {
> +	.notifier_call = mvebu_hwcc_notifier,
> +};
> +
>  static void __init armada_370_coherency_init(struct device_node *np)
>  {
>  	struct resource res;
> @@ -427,7 +431,7 @@ static int __init coherency_pci_init(void)
>  {
>  	if (coherency_available())
>  		bus_register_notifier(&pci_bus_type,
> -				       &mvebu_hwcc_nb);
> +				       &mvebu_hwcc_pci_nb);
>  	return 0;
>  }
>  
> Paolo, can you apply it and confirm it fixes the problem?
> 
> Greg, can you confirm using the same notifier block pointer
> for two different bus types makes the bus notifier go nuts?

Looking at how notifier_chain_register() is implemented (which gets
called by bus_register_notifier() ->
blocking_notifier_chain_register()), I indeed don't see how a single
'struct notifier_block' can be registered on multiple notifier chains,
so I believe your patch is correct.

Thanks for the investigation!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 3.16rc3 multiplatform, Armada 370 and IOMMU: unbootable kernel
  2014-07-08  7:41                       ` Thomas Petazzoni
@ 2014-07-08  9:20                         ` Paolo Pisati
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Pisati @ 2014-07-08  9:20 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Ezequiel Garcia, Greg Kroah-Hartman, Andrew Lunn, Jason Cooper,
	Joerg Roedel, Hiroshi.DOYU, linux-kernel, iommu, Laurent Pinchart,
	Gregory CLEMENT, Sebastian Hesselbarth, Florian Vaussard,
	linux-arm-kernel@lists.infradead.org

On Tue, Jul 8, 2014 at 9:41 AM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> Dear Ezequiel Garcia,
>
> On Mon, 7 Jul 2014 20:37:58 -0300, Ezequiel Garcia wrote:
>
>> It seems bus_register_notifier() is been called for platform and pci devices
>> with the *same* notifier block. Haven't looked close enough, but you mentioned
>> that could cause trouble?
>>
>> This patch fixes the issue here:
>>
>> diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
>> index 477202f..2bdc323 100644
>> --- a/arch/arm/mach-mvebu/coherency.c
>> +++ b/arch/arm/mach-mvebu/coherency.c
>> @@ -292,6 +292,10 @@ static struct notifier_block mvebu_hwcc_nb = {
>>       .notifier_call = mvebu_hwcc_notifier,
>>  };
>>
>> +static struct notifier_block mvebu_hwcc_pci_nb = {
>> +     .notifier_call = mvebu_hwcc_notifier,
>> +};
>> +
>>  static void __init armada_370_coherency_init(struct device_node *np)
>>  {
>>       struct resource res;
>> @@ -427,7 +431,7 @@ static int __init coherency_pci_init(void)
>>  {
>>       if (coherency_available())
>>               bus_register_notifier(&pci_bus_type,
>> -                                    &mvebu_hwcc_nb);
>> +                                    &mvebu_hwcc_pci_nb);
>>       return 0;
>>  }
>>
>> Paolo, can you apply it and confirm it fixes the problem?
>>
>> Greg, can you confirm using the same notifier block pointer
>> for two different bus types makes the bus notifier go nuts?
>
> Looking at how notifier_chain_register() is implemented (which gets
> called by bus_register_notifier() ->
> blocking_notifier_chain_register()), I indeed don't see how a single
> 'struct notifier_block' can be registered on multiple notifier chains,
> so I believe your patch is correct.
>

yes, i confim this patch fixes my issue, thanks!

-- 
bye,
p.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 3.16rc3 multiplatform, Armada 370 and IOMMU: unbootable kernel
  2014-07-07 23:37                     ` Ezequiel Garcia
       [not found]                       ` <20140707233758.GA1456-nAQHv47ARr+vIlHkl8J1cg@public.gmane.org>
  2014-07-08  7:41                       ` Thomas Petazzoni
@ 2014-07-08 12:01                       ` Jason Cooper
  2 siblings, 0 replies; 10+ messages in thread
From: Jason Cooper @ 2014-07-08 12:01 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Greg Kroah-Hartman, Paolo Pisati, Thomas Petazzoni, Andrew Lunn,
	Joerg Roedel, Hiroshi.DOYU, linux-kernel, iommu, Laurent Pinchart,
	Gregory CLEMENT, Sebastian Hesselbarth, Florian Vaussard,
	linux-arm-kernel

Ezequiel,

On Mon, Jul 07, 2014 at 08:37:58PM -0300, Ezequiel Garcia wrote:
> On 07 Jul 11:30 AM, Greg Kroah-Hartman wrote:
> > On Mon, Jul 07, 2014 at 07:58:18AM -0300, Ezequiel Garcia wrote:
> [..]
> > > 
> > > I guess I snipped the thread and lost most of the information about the panic.
> > > Here's the original bug report:
> > > 
> > > http://www.spinics.net/lists/arm-kernel/msg344059.html
> > > 
> > > The problem reported involves enabling OMAP IOMMU driver and not any other IOMMU
> > > driver. Doing some tracing and adding a few prints, we found that
> > > omap_iommu_init() sets a bus notifier for the platform bus type:
> > > 
> > > omap_iommu_init -> bus_set_iommu -> iommu_bus_init:
> > > 
> > > static void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops)
> > > {
> > >         bus_register_notifier(bus, &iommu_bus_nb);
> > >         bus_for_each_dev(bus, NULL, ops, add_iommu_group);
> > > }
> > > 
> > > But the iommu bus notifier gets called for the 'pci' bus type, which
> > > has the iommu_ops field NULL (since it hasn't been set for iommu).
> > 
> > So this is what needs to be figured out, how is the notifier being
> > called with a PCI device?  Who else called iommu_bus_init() for the PCI
> > bus?
> > 
> 
> Nobody. The only caller of iommu_bus_init() the above OMAP IOMMU.
> 
> However, I found something interesting. Tried to bisect this without much
> luck; I did two bisects and ended up in the same merge commit:
> 
> # good: [0f16aa3c24a216d14d7f0587e1cbd2c1b51a38f3] Merge tag 'samsung-dt-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/dt
> # first bad commit: [755a9ba7bf24a45b6dbf8bb15a5a56c8ed12461a] Merge tag 'dt-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc into next
> 
> So, after doing a few diff's between that good and bad and searching for
> "bus_notifier" changes, saw something strange in arch/arm/mach-mvebu/coherency.c.
> 
> It seems bus_register_notifier() is been called for platform and pci devices
> with the *same* notifier block. Haven't looked close enough, but you mentioned
> that could cause trouble?
> 
> This patch fixes the issue here:

Thanks for digging in to this!  Please submit this as patch on it's own
and I'll merge it in.

thx,

Jason.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-07-08 12:01 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20140703135146.GA6898@luxor.wired.org>
     [not found] ` <53B5C61D.2060308@free-electrons.com>
     [not found]   ` <53B5CA26.7050405@free-electrons.com>
     [not found]     ` <53B5CA26.7050405-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
2014-07-04  8:47       ` 3.16rc3 multiplatform, Armada 370 and IOMMU: unbootable kernel Laurent Pinchart
2014-07-05 15:03         ` Ezequiel Garcia
     [not found]           ` <20140705150308.GA28791-nAQHv47ARr+vIlHkl8J1cg@public.gmane.org>
2014-07-05 20:59             ` Greg Kroah-Hartman
2014-07-07 10:58               ` Ezequiel Garcia
     [not found]                 ` <20140707105818.GA1101-nAQHv47ARr+vIlHkl8J1cg@public.gmane.org>
2014-07-07 18:30                   ` Greg Kroah-Hartman
2014-07-07 23:37                     ` Ezequiel Garcia
     [not found]                       ` <20140707233758.GA1456-nAQHv47ARr+vIlHkl8J1cg@public.gmane.org>
2014-07-07 23:47                         ` Greg Kroah-Hartman
2014-07-08  7:41                       ` Thomas Petazzoni
2014-07-08  9:20                         ` Paolo Pisati
2014-07-08 12:01                       ` Jason Cooper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).