linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* -next fails to boot as of today on S3C6410
@ 2011-11-22 19:27 Mark Brown
  2011-11-22 19:31 ` Russell King - ARM Linux
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-22 19:27 UTC (permalink / raw)
  To: linux-arm-kernel

Testing with my s3c6410 based platform I'm seeing no output after
"Uncompressing Linux... done, booting the kernel." when running -next.
Yesterday's -next booted fine.

I've verified that this was introduced in the Russell's for-next, and
bisection of that branch tells me that the offending commit is "ARM:
vic: device tree binding" though an attempt to revert that in -next
failed.  This is especially surprising as I have USE_OF turned off, I'm
guessing that the issue is something to do with the conversion to use an
irq_domain that happened in there.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-22 19:27 -next fails to boot as of today on S3C6410 Mark Brown
@ 2011-11-22 19:31 ` Russell King - ARM Linux
  2011-11-22 19:39   ` Mark Brown
  0 siblings, 1 reply; 30+ messages in thread
From: Russell King - ARM Linux @ 2011-11-22 19:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 22, 2011 at 07:27:41PM +0000, Mark Brown wrote:
> Testing with my s3c6410 based platform I'm seeing no output after
> "Uncompressing Linux... done, booting the kernel." when running -next.
> Yesterday's -next booted fine.
> 
> I've verified that this was introduced in the Russell's for-next, and
> bisection of that branch tells me that the offending commit is "ARM:
> vic: device tree binding" though an attempt to revert that in -next
> failed.  This is especially surprising as I have USE_OF turned off, I'm
> guessing that the issue is something to do with the conversion to use an
> irq_domain that happened in there.

Thanks for the report.  I'll hold off pushing this stuff into the
devel-stable branch while Jamie looks into this so that we can fix
it and avoid having a bisection failure in this set of commits.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-22 19:31 ` Russell King - ARM Linux
@ 2011-11-22 19:39   ` Mark Brown
  2011-11-22 22:21     ` Jamie Iles
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-22 19:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 22, 2011 at 07:31:24PM +0000, Russell King - ARM Linux wrote:
> On Tue, Nov 22, 2011 at 07:27:41PM +0000, Mark Brown wrote:

> > I've verified that this was introduced in the Russell's for-next, and
> > bisection of that branch tells me that the offending commit is "ARM:
> > vic: device tree binding" though an attempt to revert that in -next
> > failed.  This is especially surprising as I have USE_OF turned off, I'm

> Thanks for the report.  I'll hold off pushing this stuff into the
> devel-stable branch while Jamie looks into this so that we can fix
> it and avoid having a bisection failure in this set of commits.

Thanks.  Note that I'm not 100% sure I believe the bisection result as
reverting didn't fix the issue and I've stared at the code a bit without
seeing anything that set off alarm bells.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-22 19:39   ` Mark Brown
@ 2011-11-22 22:21     ` Jamie Iles
  2011-11-22 22:58       ` Mark Brown
                         ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Jamie Iles @ 2011-11-22 22:21 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

On Tue, Nov 22, 2011 at 07:39:57PM +0000, Mark Brown wrote:
> On Tue, Nov 22, 2011 at 07:31:24PM +0000, Russell King - ARM Linux wrote:
> > On Tue, Nov 22, 2011 at 07:27:41PM +0000, Mark Brown wrote:
> 
> > > I've verified that this was introduced in the Russell's for-next, and
> > > bisection of that branch tells me that the offending commit is "ARM:
> > > vic: device tree binding" though an attempt to revert that in -next
> > > failed.  This is especially surprising as I have USE_OF turned off, I'm
> 
> > Thanks for the report.  I'll hold off pushing this stuff into the
> > devel-stable branch while Jamie looks into this so that we can fix
> > it and avoid having a bisection failure in this set of commits.
> 
> Thanks.  Note that I'm not 100% sure I believe the bisection result as
> reverting didn't fix the issue and I've stared at the code a bit without
> seeing anything that set off alarm bells.

No, you're right - this is the offending commit.  It actually needs 
this[1] fix and things should be okay.

Thomas, Rob, would one of you be able to apply this please?  I'm not 
sure if this would normally go through Grant or not.

Thanks,

Jamie

1. https://lkml.org/lkml/2011/11/10/186

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-22 22:21     ` Jamie Iles
@ 2011-11-22 22:58       ` Mark Brown
  2011-11-22 23:03         ` Jamie Iles
  2011-11-22 23:34       ` Rob Herring
  2011-11-23 12:05       ` Mark Brown
  2 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-22 22:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 22, 2011 at 10:21:35PM +0000, Jamie Iles wrote:
> On Tue, Nov 22, 2011 at 07:39:57PM +0000, Mark Brown wrote:

> > Thanks.  Note that I'm not 100% sure I believe the bisection result as
> > reverting didn't fix the issue and I've stared at the code a bit without
> > seeing anything that set off alarm bells.

> No, you're right - this is the offending commit.  It actually needs 
> this[1] fix and things should be okay.

> Thomas, Rob, would one of you be able to apply this please?  I'm not 
> sure if this would normally go through Grant or not.

Gah, right.  I'll give this a test tomorrow.

You *really* should point out dependencies like this when sending
patches so we can avoid this sort of bisection breakage when posting
things so that whoever's applying the patches can avoid introducing this
sort of breakage and so that we don't have people like me sitting trying
to work out if they did something thick.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-22 22:58       ` Mark Brown
@ 2011-11-22 23:03         ` Jamie Iles
  0 siblings, 0 replies; 30+ messages in thread
From: Jamie Iles @ 2011-11-22 23:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 22, 2011 at 10:58:49PM +0000, Mark Brown wrote:
> On Tue, Nov 22, 2011 at 10:21:35PM +0000, Jamie Iles wrote:
> > On Tue, Nov 22, 2011 at 07:39:57PM +0000, Mark Brown wrote:
> 
> > > Thanks.  Note that I'm not 100% sure I believe the bisection result as
> > > reverting didn't fix the issue and I've stared at the code a bit without
> > > seeing anything that set off alarm bells.
> 
> > No, you're right - this is the offending commit.  It actually needs 
> > this[1] fix and things should be okay.
> 
> > Thomas, Rob, would one of you be able to apply this please?  I'm not 
> > sure if this would normally go through Grant or not.
> 
> Gah, right.  I'll give this a test tomorrow.
> 
> You *really* should point out dependencies like this when sending
> patches so we can avoid this sort of bisection breakage when posting
> things so that whoever's applying the patches can avoid introducing this
> sort of breakage and so that we don't have people like me sitting trying
> to work out if they did something thick.

Sorry about that.  At the time I thought the fix was small enough that 
it would get applied quickly without problems and would be in way before 
these patches, but I didn't follow it up early enough.  I'll make sure 
that it doesn't happen again though.

Thanks,

Jamie

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-22 22:21     ` Jamie Iles
  2011-11-22 22:58       ` Mark Brown
@ 2011-11-22 23:34       ` Rob Herring
  2011-11-23 12:05       ` Mark Brown
  2 siblings, 0 replies; 30+ messages in thread
From: Rob Herring @ 2011-11-22 23:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/22/2011 04:21 PM, Jamie Iles wrote:
> Hi Mark,
> 
> On Tue, Nov 22, 2011 at 07:39:57PM +0000, Mark Brown wrote:
>> On Tue, Nov 22, 2011 at 07:31:24PM +0000, Russell King - ARM Linux wrote:
>>> On Tue, Nov 22, 2011 at 07:27:41PM +0000, Mark Brown wrote:
>>
>>>> I've verified that this was introduced in the Russell's for-next, and
>>>> bisection of that branch tells me that the offending commit is "ARM:
>>>> vic: device tree binding" though an attempt to revert that in -next
>>>> failed.  This is especially surprising as I have USE_OF turned off, I'm
>>
>>> Thanks for the report.  I'll hold off pushing this stuff into the
>>> devel-stable branch while Jamie looks into this so that we can fix
>>> it and avoid having a bisection failure in this set of commits.
>>
>> Thanks.  Note that I'm not 100% sure I believe the bisection result as
>> reverting didn't fix the issue and I've stared at the code a bit without
>> seeing anything that set off alarm bells.
> 
> No, you're right - this is the offending commit.  It actually needs 
> this[1] fix and things should be okay.
> 
> Thomas, Rob, would one of you be able to apply this please?  I'm not 
> sure if this would normally go through Grant or not.
> 

I think it should go thru Thomas or have his ack if not.

Rob

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-22 22:21     ` Jamie Iles
  2011-11-22 22:58       ` Mark Brown
  2011-11-22 23:34       ` Rob Herring
@ 2011-11-23 12:05       ` Mark Brown
  2011-11-23 12:28         ` Jamie Iles
  2 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-23 12:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 22, 2011 at 10:21:35PM +0000, Jamie Iles wrote:

> No, you're right - this is the offending commit.  It actually needs 
> this[1] fix and things should be okay.

> 1. https://lkml.org/lkml/2011/11/10/186

I've tested with that patch and it doesn't seem to be the only fix
that's required - my system still exhibits the same behaviour.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-23 12:05       ` Mark Brown
@ 2011-11-23 12:28         ` Jamie Iles
  2011-11-23 13:01           ` Mark Brown
  0 siblings, 1 reply; 30+ messages in thread
From: Jamie Iles @ 2011-11-23 12:28 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

On Wed, Nov 23, 2011 at 12:05:35PM +0000, Mark Brown wrote:
> On Tue, Nov 22, 2011 at 10:21:35PM +0000, Jamie Iles wrote:
> 
> > No, you're right - this is the offending commit.  It actually needs 
> > this[1] fix and things should be okay.
> 
> > 1. https://lkml.org/lkml/2011/11/10/186
> 
> I've tested with that patch and it doesn't seem to be the only fix
> that's required - my system still exhibits the same behaviour.

Hmm, I really can't see what could cause this - we don't have a to_irq() 
ops method, hwirq_base is correctly initialised to 0 with 32 irqs per 
VIC and we aren't using dynamic irq_desc's.

Do you have any low-level serial debug?

Thanks,

Jamie

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-23 12:28         ` Jamie Iles
@ 2011-11-23 13:01           ` Mark Brown
  2011-11-23 13:32             ` Jamie Iles
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-23 13:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Nov 23, 2011 at 12:28:55PM +0000, Jamie Iles wrote:

> Hmm, I really can't see what could cause this - we don't have a to_irq() 
> ops method, hwirq_base is correctly initialised to 0 with 32 irqs per 
> VIC and we aren't using dynamic irq_desc's.

Are you sure the numbers come out the same as they would otherwise.

> Do you have any low-level serial debug?

I tried adding some yesterday around the VIC registration in the CPU
code but it didn't actually appear on the console so I'm none the wiser.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-23 13:01           ` Mark Brown
@ 2011-11-23 13:32             ` Jamie Iles
  2011-11-23 13:50               ` Mark Brown
  0 siblings, 1 reply; 30+ messages in thread
From: Jamie Iles @ 2011-11-23 13:32 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

On Wed, Nov 23, 2011 at 01:01:35PM +0000, Mark Brown wrote:
> On Wed, Nov 23, 2011 at 12:28:55PM +0000, Jamie Iles wrote:
> 
> > Hmm, I really can't see what could cause this - we don't have a to_irq() 
> > ops method, hwirq_base is correctly initialised to 0 with 32 irqs per 
> > VIC and we aren't using dynamic irq_desc's.
> 
> Are you sure the numbers come out the same as they would otherwise.

Yes, I've just tried a little bit of test code (I don't have a non-DT 
VIC platform) but for each IRQ (starting from 32 like your platform) for 
2 VIC's it just sets the hwirq (0->31 for each VIC) for each IRQ and 
that's all the IRQ domain stuff does for non-DT platforms.

> > Do you have any low-level serial debug?
> 
> I tried adding some yesterday around the VIC registration in the CPU
> code but it didn't actually appear on the console so I'm none the wiser.

OK.  You mentioned you had trouble reverting the patch yesterday - I've 
pushed a branch (git://github.com/jamieiles/linux-2.6-ji.git 
vic-dt-revert-next) which is today's next with all of the VIC patches 
reverted if you want to try that to be sure.  I've built s3c6400_defconfig 
successfully, but can't test it unless there's a qemu model somewhere?

Thanks,

Jamie

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-23 13:32             ` Jamie Iles
@ 2011-11-23 13:50               ` Mark Brown
  2011-11-23 14:33                 ` Jamie Iles
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-23 13:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Nov 23, 2011 at 01:32:37PM +0000, Jamie Iles wrote:
> On Wed, Nov 23, 2011 at 01:01:35PM +0000, Mark Brown wrote:

> > I tried adding some yesterday around the VIC registration in the CPU
> > code but it didn't actually appear on the console so I'm none the wiser.

> OK.  You mentioned you had trouble reverting the patch yesterday - I've 
> pushed a branch (git://github.com/jamieiles/linux-2.6-ji.git 
> vic-dt-revert-next) which is today's next with all of the VIC patches 
> reverted if you want to try that to be sure.  I've built s3c6400_defconfig 
> successfully, but can't test it unless there's a qemu model somewhere?

It wasn't the mechanics of reverting that caused problems, it was the
fact that when I reverted things still didn't work.  Actually, I've gone
back and retested with the for-next branch from today and it appears
that for some reason the issue has gone away there which is a bit
confusing though as I write this it occurs to me that I did turn device
tree back on (my board doesn't use it but I typically build it in) which
I had off when I tested yesterday and would explain the issue.

It looks like there's a second bug breaking the boot in there - testing
your commit the board boots but if I test the tip of Russell's for-next
then that breaks again.  I'm just starting another bisect and
considering sending a patch to add a git finger-point alias.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-23 13:50               ` Mark Brown
@ 2011-11-23 14:33                 ` Jamie Iles
  2011-11-23 14:55                   ` Mark Brown
  0 siblings, 1 reply; 30+ messages in thread
From: Jamie Iles @ 2011-11-23 14:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Nov 23, 2011 at 01:50:34PM +0000, Mark Brown wrote:
> On Wed, Nov 23, 2011 at 01:32:37PM +0000, Jamie Iles wrote:
> > On Wed, Nov 23, 2011 at 01:01:35PM +0000, Mark Brown wrote:
> 
> > > I tried adding some yesterday around the VIC registration in the CPU
> > > code but it didn't actually appear on the console so I'm none the wiser.
> 
> > OK.  You mentioned you had trouble reverting the patch yesterday - I've 
> > pushed a branch (git://github.com/jamieiles/linux-2.6-ji.git 
> > vic-dt-revert-next) which is today's next with all of the VIC patches 
> > reverted if you want to try that to be sure.  I've built s3c6400_defconfig 
> > successfully, but can't test it unless there's a qemu model somewhere?
> 
> It wasn't the mechanics of reverting that caused problems, it was the
> fact that when I reverted things still didn't work.  Actually, I've gone
> back and retested with the for-next branch from today and it appears
> that for some reason the issue has gone away there which is a bit
> confusing though as I write this it occurs to me that I did turn device
> tree back on (my board doesn't use it but I typically build it in) which
> I had off when I tested yesterday and would explain the issue.

OK, I've just tried hacking something on picoxcell that uses IRQ numbers 
starting at 32 rather than 0 and registering the vic manually rather 
than using the device tree binding and it all runs nicely.  So USE_OF is 
enabled, but it's as close as I can get to your system and it isn't 
using any of the DT stuff in the VIC driver.

> It looks like there's a second bug breaking the boot in there - testing
> your commit the board boots but if I test the tip of Russell's for-next
> then that breaks again.  I'm just starting another bisect and
> considering sending a patch to add a git finger-point alias.

Thanks for your patience Mark!  Please let me know if there's anything I 
can do to help.

Jamie

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-23 14:33                 ` Jamie Iles
@ 2011-11-23 14:55                   ` Mark Brown
  2011-11-23 17:52                     ` Rob Herring
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-23 14:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Nov 23, 2011 at 02:33:22PM +0000, Jamie Iles wrote:
> On Wed, Nov 23, 2011 at 01:50:34PM +0000, Mark Brown wrote:

> > It looks like there's a second bug breaking the boot in there - testing
> > your commit the board boots but if I test the tip of Russell's for-next
> > then that breaks again.  I'm just starting another bisect and
> > considering sending a patch to add a git finger-point alias.

> Thanks for your patience Mark!  Please let me know if there's anything I 
> can do to help.

My bisect turned up a second candidate - Nicholas' commit 549158d (ARM:
move iotable mappings within the vmalloc region).  I reverted that and
subsequent commits depending on it and my system now appears to boot
fine with -next (with OF turned on to get the ops for the VIC).  This
code is all a bit deep for me so I've no real idea what might be wrong.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-23 14:55                   ` Mark Brown
@ 2011-11-23 17:52                     ` Rob Herring
  2011-11-23 17:54                       ` Mark Brown
  0 siblings, 1 reply; 30+ messages in thread
From: Rob Herring @ 2011-11-23 17:52 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/23/2011 08:55 AM, Mark Brown wrote:
> On Wed, Nov 23, 2011 at 02:33:22PM +0000, Jamie Iles wrote:
>> On Wed, Nov 23, 2011 at 01:50:34PM +0000, Mark Brown wrote:
> 
>>> It looks like there's a second bug breaking the boot in there - testing
>>> your commit the board boots but if I test the tip of Russell's for-next
>>> then that breaks again.  I'm just starting another bisect and
>>> considering sending a patch to add a git finger-point alias.
> 
>> Thanks for your patience Mark!  Please let me know if there's anything I 
>> can do to help.
> 
> My bisect turned up a second candidate - Nicholas' commit 549158d (ARM:
> move iotable mappings within the vmalloc region).  I reverted that and
> subsequent commits depending on it and my system now appears to boot
> fine with -next (with OF turned on to get the ops for the VIC).  This
> code is all a bit deep for me so I've no real idea what might be wrong.

The first suspect should be overlapping static mappings in the iotable.
This was allowed before and now is not. Looking at some of the 6410
platforms, I don't see any though. But you didn't mention which platform
you are on. There is a 1KB mapping which is a bit unusual.

Rob

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-23 17:52                     ` Rob Herring
@ 2011-11-23 17:54                       ` Mark Brown
  2011-11-23 19:08                         ` Nicolas Pitre
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-23 17:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Nov 23, 2011 at 11:52:10AM -0600, Rob Herring wrote:
> On 11/23/2011 08:55 AM, Mark Brown wrote:

> > My bisect turned up a second candidate - Nicholas' commit 549158d (ARM:
> > move iotable mappings within the vmalloc region).  I reverted that and
> > subsequent commits depending on it and my system now appears to boot
> > fine with -next (with OF turned on to get the ops for the VIC).  This
> > code is all a bit deep for me so I've no real idea what might be wrong.

> The first suspect should be overlapping static mappings in the iotable.
> This was allowed before and now is not. Looking at some of the 6410
> platforms, I don't see any though. But you didn't mention which platform
> you are on. There is a 1KB mapping which is a bit unusual.

I'm on Cragganmore 6410 which just calls a bunch of arch-generic stuff
(though now I look I do see a comment saying that we're relying on some
setup from the bootloader which I get may have been trashed...).

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-23 17:54                       ` Mark Brown
@ 2011-11-23 19:08                         ` Nicolas Pitre
  2011-11-24  0:32                           ` Nicolas Pitre
  0 siblings, 1 reply; 30+ messages in thread
From: Nicolas Pitre @ 2011-11-23 19:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 23 Nov 2011, Mark Brown wrote:

> On Wed, Nov 23, 2011 at 11:52:10AM -0600, Rob Herring wrote:
> > On 11/23/2011 08:55 AM, Mark Brown wrote:
> 
> > > My bisect turned up a second candidate - Nicholas' commit 549158d (ARM:
> > > move iotable mappings within the vmalloc region).  I reverted that and
> > > subsequent commits depending on it and my system now appears to boot
> > > fine with -next (with OF turned on to get the ops for the VIC).  This
> > > code is all a bit deep for me so I've no real idea what might be wrong.
> 
> > The first suspect should be overlapping static mappings in the iotable.
> > This was allowed before and now is not. Looking at some of the 6410
> > platforms, I don't see any though. But you didn't mention which platform
> > you are on. There is a 1KB mapping which is a bit unusual.
> 
> I'm on Cragganmore 6410 which just calls a bunch of arch-generic stuff
> (though now I look I do see a comment saying that we're relying on some
> setup from the bootloader which I get may have been trashed...).

I'm working on a way to get sensible debug messages out.

The problem is that, as soon as we execute this loop in 
devicemaps_init():

        for (addr = VMALLOC_START; addr; addr += PMD_SIZE)
                pmd_clear(pmd_off_k(addr));

we lose all means of communication with the outside world until 
mdesc->map_io() and the following TLB/cache flushes are done.  It is 
most likely that you are having a problem in the middle of that.


Nicolas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-23 19:08                         ` Nicolas Pitre
@ 2011-11-24  0:32                           ` Nicolas Pitre
  2011-11-24 12:09                             ` Mark Brown
  2011-11-24 12:56                             ` Mark Brown
  0 siblings, 2 replies; 30+ messages in thread
From: Nicolas Pitre @ 2011-11-24  0:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 23 Nov 2011, Nicolas Pitre wrote:

> On Wed, 23 Nov 2011, Mark Brown wrote:
> 
> > On Wed, Nov 23, 2011 at 11:52:10AM -0600, Rob Herring wrote:
> > > On 11/23/2011 08:55 AM, Mark Brown wrote:
> > 
> > > > My bisect turned up a second candidate - Nicholas' commit 549158d (ARM:
> > > > move iotable mappings within the vmalloc region).  I reverted that and
> > > > subsequent commits depending on it and my system now appears to boot
> > > > fine with -next (with OF turned on to get the ops for the VIC).  This
> > > > code is all a bit deep for me so I've no real idea what might be wrong.
> > 
> > > The first suspect should be overlapping static mappings in the iotable.
> > > This was allowed before and now is not. Looking at some of the 6410
> > > platforms, I don't see any though. But you didn't mention which platform
> > > you are on. There is a 1KB mapping which is a bit unusual.
> > 
> > I'm on Cragganmore 6410 which just calls a bunch of arch-generic stuff
> > (though now I look I do see a comment saying that we're relying on some
> > setup from the bootloader which I get may have been trashed...).
> 
> I'm working on a way to get sensible debug messages out.
> 
> The problem is that, as soon as we execute this loop in 
> devicemaps_init():
> 
>         for (addr = VMALLOC_START; addr; addr += PMD_SIZE)
>                 pmd_clear(pmd_off_k(addr));
> 
> we lose all means of communication with the outside world until 
> mdesc->map_io() and the following TLB/cache flushes are done.  It is 
> most likely that you are having a problem in the middle of that.

OK, here it is.  Please try this patch:

-------- >8
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index acb2c7ec0a..51b4307591 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -939,23 +939,54 @@ void __init arm_mm_memblock_reserve(void)
 #endif
 }
 
+#ifdef CONFIG_DEBUG_LL
+
 /*
- * Set up the device mappings.  Since we clear out the page tables for all
- * mappings above VMALLOC_START, we will remove any debug device mappings.
- * This means you have to be careful how you debug this function, or any
- * called function.  This means you can't use any function or debugging
- * method which may touch any device, otherwise the kernel _will_ crash.
+ * The remainder of the page table containing debug mappings will be
+ * cleared and recreated by devicemaps_init().  To allow for debug devices
+ * to remain accessible, we create a temporary copy of the current page table
+ * here and switch to it, and switch back once the final mappings have
+ * been created.
+ */
+
+static pgd_t * __init install_temp_mm(void)
+{
+	pgd_t *temp_pgd = early_alloc(PTRS_PER_PGD * sizeof(pgd_t));
+	pgd_t *init_pgd = pgd_offset_k(0);
+	memcpy(temp_pgd + USER_PTRS_PER_PGD, init_pgd + USER_PTRS_PER_PGD,
+	       (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
+	clean_dcache_area(temp_pgd, PTRS_PER_PGD * sizeof(pgd_t));
+	cpu_switch_mm(temp_pgd, &init_mm);
+	return temp_pgd;
+}
+
+static void __init clear_temp_mm(pgd_t *temp_pgd)
+{
+	cpu_switch_mm(init_mm.pgd, &init_mm);
+	memblock_free(__pa(temp_pgd), PTRS_PER_PGD * sizeof(pgd_t));
+}
+
+#else
+#define set_temp_mm()	(NULL)
+#define clear_temp_mm()	do { } while (0)
+#endif
+
+/*
+ * Set up the device mappings.
  */
 static void __init devicemaps_init(struct machine_desc *mdesc)
 {
 	struct map_desc map;
 	unsigned long addr;
+	pgd_t *temp_pgd;
 
 	/*
 	 * Allocate the vector page early.
 	 */
 	vectors_page = early_alloc(PAGE_SIZE);
 
+	temp_pgd = install_temp_mm();
+
 	for (addr = VMALLOC_START; addr; addr += PMD_SIZE)
 		pmd_clear(pmd_off_k(addr));
 
@@ -1012,6 +1043,8 @@ static void __init devicemaps_init(struct machine_desc *mdesc)
 	if (mdesc->map_io)
 		mdesc->map_io();
 
+	clear_temp_mm(temp_pgd);
+
 	/*
 	 * Finally flush the caches and tlb to ensure that we're in a
 	 * consistent state wrt the writebuffer.  This also ensures that
-------- >8

Then, if you didn't already do so, you should enable CONFIG_DEBUG_LL 
and hack something like this into the printk code:

-------- >8
diff --git a/kernel/printk.c b/kernel/printk.c
index 1455a0d4ee..b543370cfd 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -876,6 +876,11 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	printed_len += vscnprintf(printk_buf + printed_len,
 				  sizeof(printk_buf) - printed_len, fmt, args);
 
+	{
+		extern void printascii(char *);
+		printascii(printk_buf);
+	}
+
 	p = printk_buf;
 
 	/* Read log level and handle special printk prefix */
-------- >8

That should provide you with the ability to figure out what's wrong.


Nicolas

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-24  0:32                           ` Nicolas Pitre
@ 2011-11-24 12:09                             ` Mark Brown
  2011-11-24 12:56                             ` Mark Brown
  1 sibling, 0 replies; 30+ messages in thread
From: Mark Brown @ 2011-11-24 12:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Nov 23, 2011 at 07:32:58PM -0500, Nicolas Pitre wrote:

> OK, here it is.  Please try this patch:

Ah, excellent - that does get me debug output which halts at:

| Memory policy: ECC disabled, Data cache writeback.

if that rings any bells.  I'll poke around further when I get a chance.

> Then, if you didn't already do so, you should enable CONFIG_DEBUG_LL 
> and hack something like this into the printk code:

Yeah, I do.  We should probably have this in the standard code as an
option somehow, it's fairly painless but irritating.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-24  0:32                           ` Nicolas Pitre
  2011-11-24 12:09                             ` Mark Brown
@ 2011-11-24 12:56                             ` Mark Brown
  2011-11-24 16:08                               ` Mark Brown
  1 sibling, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-24 12:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Nov 23, 2011 at 07:32:58PM -0500, Nicolas Pitre wrote:

> That should provide you with the ability to figure out what's wrong.

Found one problem which I just sent a patch for, iotable_init() was not
coping with being passed an empty table which it had previously been OK
with.  However, the boot still dies shortly afterwards when we try to
read the CPU ID in s3c64xx_init_cpu() and presumably take an exception
due to the system address either not being mapped or mapped somewhere
other than where the code expects.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-24 12:56                             ` Mark Brown
@ 2011-11-24 16:08                               ` Mark Brown
  2011-11-24 17:59                                 ` Nicolas Pitre
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-24 16:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Nov 24, 2011 at 12:56:16PM +0000, Mark Brown wrote:
> On Wed, Nov 23, 2011 at 07:32:58PM -0500, Nicolas Pitre wrote:

> > That should provide you with the ability to figure out what's wrong.

> Found one problem which I just sent a patch for, iotable_init() was not
> coping with being passed an empty table which it had previously been OK
> with.  However, the boot still dies shortly afterwards when we try to
> read the CPU ID in s3c64xx_init_cpu() and presumably take an exception
> due to the system address either not being mapped or mapped somewhere
> other than where the code expects.

I went back and had another look at this - the reason we're dying in
s3c64xx_init_cpu() is that it attempts to use the virtual address to
read the ID register but due to the debug patch Nicholas provided to
keep the serial port alive when I have DEBUG_LL turned on the mappings
set up by the CPU initialization were not actually being used by the
kernel yet.  The upshot is that the boot problems in current -next are
fixed by the combination of the irq_domain patch and the patch I posted
earlier today and DEBUG_LL will need something a bit different to keep
it running while doing the remapping.

Thanks to everyone who helped debug this.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-24 16:08                               ` Mark Brown
@ 2011-11-24 17:59                                 ` Nicolas Pitre
  2011-11-24 18:02                                   ` Mark Brown
  0 siblings, 1 reply; 30+ messages in thread
From: Nicolas Pitre @ 2011-11-24 17:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 24 Nov 2011, Mark Brown wrote:

> On Thu, Nov 24, 2011 at 12:56:16PM +0000, Mark Brown wrote:
> > On Wed, Nov 23, 2011 at 07:32:58PM -0500, Nicolas Pitre wrote:
> 
> > > That should provide you with the ability to figure out what's wrong.
> 
> > Found one problem which I just sent a patch for, iotable_init() was not
> > coping with being passed an empty table which it had previously been OK
> > with.  However, the boot still dies shortly afterwards when we try to
> > read the CPU ID in s3c64xx_init_cpu() and presumably take an exception
> > due to the system address either not being mapped or mapped somewhere
> > other than where the code expects.
> 
> I went back and had another look at this - the reason we're dying in
> s3c64xx_init_cpu() is that it attempts to use the virtual address to
> read the ID register but due to the debug patch Nicholas provided to
> keep the serial port alive when I have DEBUG_LL turned on the mappings
> set up by the CPU initialization were not actually being used by the
> kernel yet.  The upshot is that the boot problems in current -next are
> fixed by the combination of the irq_domain patch and the patch I posted
> earlier today and DEBUG_LL will need something a bit different to keep
> it running while doing the remapping.

No.  The s3c64xx code is wrong.  It relies on a freshly installed 
mapping that has not been flushed to RAM yet.  So when it works for you, 
That's due to pure luck, most probably because the page table is not 
cached when the mapping is created and the cache is not allocated on 
write.


Nicolas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-24 17:59                                 ` Nicolas Pitre
@ 2011-11-24 18:02                                   ` Mark Brown
  2011-11-24 18:22                                     ` Nicolas Pitre
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-24 18:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Nov 24, 2011 at 12:59:38PM -0500, Nicolas Pitre wrote:

> No.  The s3c64xx code is wrong.  It relies on a freshly installed 
> mapping that has not been flushed to RAM yet.  So when it works for you, 
> That's due to pure luck, most probably because the page table is not 
> cached when the mapping is created and the cache is not allocated on 
> write.

Ah, right - I was assuming that the existing code was OK (given that
it's been in use for years).  What should it be doing to trigger the
flush?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-24 18:02                                   ` Mark Brown
@ 2011-11-24 18:22                                     ` Nicolas Pitre
  2011-11-24 18:31                                       ` Mark Brown
  0 siblings, 1 reply; 30+ messages in thread
From: Nicolas Pitre @ 2011-11-24 18:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 24 Nov 2011, Mark Brown wrote:

> On Thu, Nov 24, 2011 at 12:59:38PM -0500, Nicolas Pitre wrote:
> 
> > No.  The s3c64xx code is wrong.  It relies on a freshly installed 
> > mapping that has not been flushed to RAM yet.  So when it works for you, 
> > That's due to pure luck, most probably because the page table is not 
> > cached when the mapping is created and the cache is not allocated on 
> > write.
> 
> Ah, right - I was assuming that the existing code was OK (given that
> it's been in use for years).  What should it be doing to trigger the
> flush?

Calling s3c_init_cpu(() elsewhere i.e. outside of the s3c64xx_init_io() 
call path, but after the later has been called, would be best.  Maybe 
from mdesc->init_early() ?


Nicolas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-24 18:22                                     ` Nicolas Pitre
@ 2011-11-24 18:31                                       ` Mark Brown
  2011-11-24 18:37                                         ` Nicolas Pitre
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-24 18:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Nov 24, 2011 at 01:22:35PM -0500, Nicolas Pitre wrote:
> On Thu, 24 Nov 2011, Mark Brown wrote:

> > Ah, right - I was assuming that the existing code was OK (given that
> > it's been in use for years).  What should it be doing to trigger the
> > flush?

> Calling s3c_init_cpu(() elsewhere i.e. outside of the s3c64xx_init_io() 
> call path, but after the later has been called, would be best.  Maybe 
> from mdesc->init_early() ?

It looks like that'll need more substantial refactoring than I have the
time to do right now, currently the CPU ID is immediately used to decide
how to carry on with low level setup.  The other option that occurs to
me is to move the check so we can just use the physical address, is that
crazy?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-24 18:31                                       ` Mark Brown
@ 2011-11-24 18:37                                         ` Nicolas Pitre
  2011-11-24 18:40                                           ` Nicolas Pitre
  2011-11-24 18:49                                           ` Mark Brown
  0 siblings, 2 replies; 30+ messages in thread
From: Nicolas Pitre @ 2011-11-24 18:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 24 Nov 2011, Mark Brown wrote:

> On Thu, Nov 24, 2011 at 01:22:35PM -0500, Nicolas Pitre wrote:
> > On Thu, 24 Nov 2011, Mark Brown wrote:
> 
> > > Ah, right - I was assuming that the existing code was OK (given that
> > > it's been in use for years).  What should it be doing to trigger the
> > > flush?
> 
> > Calling s3c_init_cpu(() elsewhere i.e. outside of the s3c64xx_init_io() 
> > call path, but after the later has been called, would be best.  Maybe 
> > from mdesc->init_early() ?
> 
> It looks like that'll need more substantial refactoring than I have the
> time to do right now, currently the CPU ID is immediately used to decide
> how to carry on with low level setup.

Where, and from what call path?

> The other option that occurs to
> me is to move the check so we can just use the physical address, is that
> crazy?

Yes, it is.  ;-)  Please don't do that.


Nicolas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-24 18:37                                         ` Nicolas Pitre
@ 2011-11-24 18:40                                           ` Nicolas Pitre
  2011-11-24 18:49                                           ` Mark Brown
  1 sibling, 0 replies; 30+ messages in thread
From: Nicolas Pitre @ 2011-11-24 18:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 24 Nov 2011, Nicolas Pitre wrote:

> On Thu, 24 Nov 2011, Mark Brown wrote:
> 
> > On Thu, Nov 24, 2011 at 01:22:35PM -0500, Nicolas Pitre wrote:
> > > On Thu, 24 Nov 2011, Mark Brown wrote:
> > 
> > > > Ah, right - I was assuming that the existing code was OK (given that
> > > > it's been in use for years).  What should it be doing to trigger the
> > > > flush?
> > 
> > > Calling s3c_init_cpu(() elsewhere i.e. outside of the s3c64xx_init_io() 
> > > call path, but after the later has been called, would be best.  Maybe 
> > > from mdesc->init_early() ?
> > 
> > It looks like that'll need more substantial refactoring than I have the
> > time to do right now, currently the CPU ID is immediately used to decide
> > how to carry on with low level setup.
> 
> Where, and from what call path?

Nevermind.  I see the problem now.  :-/


Nicolas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-24 18:37                                         ` Nicolas Pitre
  2011-11-24 18:40                                           ` Nicolas Pitre
@ 2011-11-24 18:49                                           ` Mark Brown
  2011-11-26 21:52                                             ` Russell King - ARM Linux
  1 sibling, 1 reply; 30+ messages in thread
From: Mark Brown @ 2011-11-24 18:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Nov 24, 2011 at 01:37:51PM -0500, Nicolas Pitre wrote:
> On Thu, 24 Nov 2011, Mark Brown wrote:

> > It looks like that'll need more substantial refactoring than I have the
> > time to do right now, currently the CPU ID is immediately used to decide
> > how to carry on with low level setup.

> Where, and from what call path?

s3c64xx_init_io() is:

	/* initialise the io descriptors we need for initialisation */
	iotable_init(s3c_iodesc, ARRAY_SIZE(s3c_iodesc));
	iotable_init(mach_desc, size);
	init_consistent_dma_size(SZ_8M);

	/* detect cpu id */
	s3c64xx_init_cpu();

	s3c_init_cpu(samsung_cpu_id, cpu_ids, ARRAY_SIZE(cpu_ids));

where s3c64xx_init_cpu() is the function relying on the effects of the
first iotable_init().  It initializes samsung_cpu_id for the call to 
s3c_init_cpu() which calls the callbacks in s3c6400.c or s3c6410.c.

Simply moving the detection to init_early() fails though I only tried
briefly.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-24 18:49                                           ` Mark Brown
@ 2011-11-26 21:52                                             ` Russell King - ARM Linux
  2011-11-27  0:28                                               ` Nicolas Pitre
  0 siblings, 1 reply; 30+ messages in thread
From: Russell King - ARM Linux @ 2011-11-26 21:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Nov 24, 2011 at 06:49:46PM +0000, Mark Brown wrote:
> s3c64xx_init_io() is:
> 
> 	/* initialise the io descriptors we need for initialisation */
> 	iotable_init(s3c_iodesc, ARRAY_SIZE(s3c_iodesc));
> 	iotable_init(mach_desc, size);
> 	init_consistent_dma_size(SZ_8M);
> 
> 	/* detect cpu id */
> 	s3c64xx_init_cpu();
> 
> 	s3c_init_cpu(samsung_cpu_id, cpu_ids, ARRAY_SIZE(cpu_ids));
> 
> where s3c64xx_init_cpu() is the function relying on the effects of the
> first iotable_init().  It initializes samsung_cpu_id for the call to 
> s3c_init_cpu() which calls the callbacks in s3c6400.c or s3c6410.c.
> 
> Simply moving the detection to init_early() fails though I only tried
> briefly.

Okay, so this looks like a regression introduced by Nicolas' patches,
so I'm save to move the devel-stable marker to the merge commit in my
devel branch between the initial restart changes and Marc's irqchip
consolidation effort.

Until this problem is resolved, I'm going to treat Nicolas' vmalloc
branch as volatile.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* -next fails to boot as of today on S3C6410
  2011-11-26 21:52                                             ` Russell King - ARM Linux
@ 2011-11-27  0:28                                               ` Nicolas Pitre
  0 siblings, 0 replies; 30+ messages in thread
From: Nicolas Pitre @ 2011-11-27  0:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 26 Nov 2011, Russell King - ARM Linux wrote:

> On Thu, Nov 24, 2011 at 06:49:46PM +0000, Mark Brown wrote:
> > s3c64xx_init_io() is:
> > 
> > 	/* initialise the io descriptors we need for initialisation */
> > 	iotable_init(s3c_iodesc, ARRAY_SIZE(s3c_iodesc));
> > 	iotable_init(mach_desc, size);
> > 	init_consistent_dma_size(SZ_8M);
> > 
> > 	/* detect cpu id */
> > 	s3c64xx_init_cpu();
> > 
> > 	s3c_init_cpu(samsung_cpu_id, cpu_ids, ARRAY_SIZE(cpu_ids));
> > 
> > where s3c64xx_init_cpu() is the function relying on the effects of the
> > first iotable_init().  It initializes samsung_cpu_id for the call to 
> > s3c_init_cpu() which calls the callbacks in s3c6400.c or s3c6410.c.
> > 
> > Simply moving the detection to init_early() fails though I only tried
> > briefly.
> 
> Okay, so this looks like a regression introduced by Nicolas' patches,
> so I'm save to move the devel-stable marker to the merge commit in my
> devel branch between the initial restart changes and Marc's irqchip
> consolidation effort.
> 
> Until this problem is resolved, I'm going to treat Nicolas' vmalloc
> branch as volatile.

I just folded Mark's fix into my branch.  So feel free to re-pull the 
whole thing:

	git://git.linaro.org/people/nico/linux vmalloc

Thanks.


Nicolas

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2011-11-27  0:28 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-22 19:27 -next fails to boot as of today on S3C6410 Mark Brown
2011-11-22 19:31 ` Russell King - ARM Linux
2011-11-22 19:39   ` Mark Brown
2011-11-22 22:21     ` Jamie Iles
2011-11-22 22:58       ` Mark Brown
2011-11-22 23:03         ` Jamie Iles
2011-11-22 23:34       ` Rob Herring
2011-11-23 12:05       ` Mark Brown
2011-11-23 12:28         ` Jamie Iles
2011-11-23 13:01           ` Mark Brown
2011-11-23 13:32             ` Jamie Iles
2011-11-23 13:50               ` Mark Brown
2011-11-23 14:33                 ` Jamie Iles
2011-11-23 14:55                   ` Mark Brown
2011-11-23 17:52                     ` Rob Herring
2011-11-23 17:54                       ` Mark Brown
2011-11-23 19:08                         ` Nicolas Pitre
2011-11-24  0:32                           ` Nicolas Pitre
2011-11-24 12:09                             ` Mark Brown
2011-11-24 12:56                             ` Mark Brown
2011-11-24 16:08                               ` Mark Brown
2011-11-24 17:59                                 ` Nicolas Pitre
2011-11-24 18:02                                   ` Mark Brown
2011-11-24 18:22                                     ` Nicolas Pitre
2011-11-24 18:31                                       ` Mark Brown
2011-11-24 18:37                                         ` Nicolas Pitre
2011-11-24 18:40                                           ` Nicolas Pitre
2011-11-24 18:49                                           ` Mark Brown
2011-11-26 21:52                                             ` Russell King - ARM Linux
2011-11-27  0:28                                               ` Nicolas Pitre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).