From: sudeep.holla@arm.com (Sudeep Holla)
To: linux-arm-kernel@lists.infradead.org
Subject: Versatile Express randomly fails to boot - Versatile Express to be removed from nightly testing
Date: Mon, 16 Mar 2015 17:47:46 +0000
Message-ID: <55071742.6000405@arm.com>
In-Reply-To: <20150316130419.GI8656@n2100.arm.linux.org.uk>

Hi Russell,

On 16/03/15 13:04, Russell King - ARM Linux wrote:
> On Mon, Mar 16, 2015 at 09:35:53AM +0000, Russell King - ARM Linux wrote:
>> On Mon, Mar 16, 2015 at 12:42:39AM +0000, Russell King - ARM Linux wrote:
>>> On Mon, Mar 16, 2015 at 12:04:38AM +0000, Russell King - ARM Linux wrote:
>>>> On Sun, Mar 15, 2015 at 09:33:30PM +0000, Russell King - ARM Linux wrote:
>>>>> I'm going to try a few other kernels to try and track down what's going
>>>>> on - whether something from arm-soc or my tree is responsible for this
>>>>> really weird behaviour.
>>>>
>>>> Okay, this is weird - it seems that it's caused by the FIQ oops
>>>> dumping code/FIQ changes which I've carried for many months
>>>> unchanged in my tree.
>>>
>>> More weirdness.  Progressing forwards through my development code
>>> showed that when I merged the patch I mentioned in the previous mail,
>>> things started to fail.
>>>
>>> As I also mentioned, I'd drop that branch (two patches, one adding
>>> the IPI backtrace stuff and the second one updating the GIC to allow
>>> it to raise FIQs on suitably equipped platforms.)  I would have
>>> expected that to have worked, but it just failed after four boot
>>> iterations.  So either it's not the FIQ, or it is the FIQ code _and_
>>> also something else.  Or it has something to do with the placement
>>> of functions in the kernel.
>>>
>>> I'll try more stuff tomorrow, working from where I presently am
>>> (which is basically last night's code minus the FIQ changes) by
>>> removing other changes to see what brings us back to a working
>>> system.
>>>
>>> As I've already said - this is really weird because all of these
>>> changes were also tested against -rc1... those which weren't are:
>>>
>>> mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
>>> mm: split ET_DYN ASLR from mmap ASLR
>>> mm: move randomize_et_dyn into ELF_ET_DYN_BASE
>>> mm: expose arch_mmap_rnd when available
>>> arm: factor out mmap ASLR into mmap_rnd
>>>
>>> and a number of clkdev rework patches (to make it use clk_hw
>>> internally.)  Neither of these should be affecting it, but that's
>>> something I will be testing tomorrow.
>>
>> Okay, reverting the ASLR changes and the clkdev changes annoyingly still
>> results in random failure.
>
> After ruling out ASLR and clkdev, I started progressively reverting other
> stuff in the build tree.  Eventually, I got down to reverting the L2C
> change I've been carrying since the L2C cleanups.
>
> With that lot reverted, which is slightly more than the previously known
> good test, it booted five times without issue.
>
> So, I thought I'd add that L2C change to the list of bad commits, and try
> omitting _just_ the L2C and FIQ changes... and it still fails - on the
> first test boot iteration.
>
> I think I'm going to declare that this is all down to some obscure
> hardware problem with Versatile Express, which is tickled by the layout
> of the kernel against the cache, and take it out of the nightly system
> (it's pointless having unstable hardware being tested; random failures
> are completely meaningless.)
>

I was able to see exactly the same behaviour on my VExpress setup with
the CA9X4 core-tile. A few observations from my side:

1. This issue can be reproduced even on v3.19
2. Since you suspected L2C, I tried disabling it, and that seems to
    solve the issue
3. Since it's very random and enabling LL_DEBUG made it difficult to
    reproduce the issue, I tried to dump the stack using the DS5 debugger
4. The stack is always exactly the same, on both v4.0-rc* and v3.19
    kernels and across multiple runs
5. Connecting the h/w debugger, then stopping and restarting the CPUs,
    resolves the hang. Somehow that helps the CPUs get out of
    __radix_tree_lookup

Stacktrace
==========
(sorry, it looks different from the standard Linux backtrace as this one
is a dump from DS5)

CPU 0
----
#0 __radix_tree_lookup( root = <Value currently has no location>, index = 16, nodep = (struct radix_tree_node**) 0x0, slotp = (void***) 0x0 ) at radix-tree.c:517
#1 generic_handle_irq( irq = 16 ) at irqdesc.c:349
#2 __handle_domain_irq( domain = (struct irq_domain*) 0xBF004400, hwirq = 16, lookup = <Value currently has no location>, regs = <Value currently has no location> ) at irqdesc.c:391
#3 __raw_readl( addr = <Value optimised away by compiler> ) at io.h:121
#4 gic_handle_irq( regs = (struct pt_regs*) 0x805F1F40 ) at irq-gic.c:277
#5 [__irq_svc+0x40]


CPU1
----
#0 __radix_tree_lookup( root = <Value currently has no location>, index = 16, nodep = (struct radix_tree_node**) 0x0, slotp = (void***) 0x0 ) at radix-tree.c:517
#1 __irq_get_desc_lock( irq = <Value currently has no location>, flags = (long unsigned int*) 0xBF08BF94, bus = false, check = 3 ) at irqdesc.c:544
#2 enable_percpu_irq( irq = 16, type = 0 ) at manage.c:1583
#3 twd_timer_cpu_notify( self = <Value not available : Undefined value in stack frame for register R0>, action = <Value currently has no location>, hcpu = <Value not available : Undefined value in stack frame for register R2> ) at smp_twd.c:322
#4 notifier_call_chain( nl = <Value currently has no location>, val = <Value not available : Undefined value in stack frame for register R1>, v = <Value not available : Undefined value in stack frame for register R2>, nr_to_call = <Value not available : Undefined value in stack frame for register R3>, nr_calls = (int*) 0x0 ) at notifier.c:95
#5 notifier_to_errno( ret = <Value currently has no location> ) at notifier.h:179
#6 cpu_notify( val = <Value currently has no location>, v = <Value currently has no location> ) at cpu.c:234
#7 secondary_start_kernel() at smp.c:367

CPU2 & CPU3
-----------
Not booted yet, still waiting in bootloader
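
Observation 4 above (same stack on every run) was checked by eye. For
anyone wanting to compare dumps mechanically, here is a rough sketch. It
assumes each DS5 frame starts with "#<n> " and that the mail client may
have wrapped a frame across several lines. The helper names frames() and
same_stack() are hypothetical, and the sample text is abridged from the
CPU 0 dump above. It is illustrative only, not something that was run as
part of this report.

```python
import re


def frames(dump):
    """Return the function name of each frame in a DS5-style dump."""
    # Undo mail wrapping: only a line starting with "#<n> " begins a new frame,
    # so every other newline is folded back into the previous frame.
    joined = re.sub(r"\n(?!#\d+ )", " ", dump.strip())
    names = []
    for line in joined.splitlines():
        # Frames look like "#0 func( ... )" or "#5 [func+0x40]".
        m = re.match(r"#\d+ \[?(\w+)", line)
        if m:
            names.append(m.group(1))
    return names


def same_stack(a, b):
    """True if two dumps list the same functions in the same order."""
    return frames(a) == frames(b)


# Abridged sample taken from the CPU 0 dump in this mail.
SAMPLE = """\
#0 __radix_tree_lookup( root = <Value currently has no location>, index
= 16, nodep = (struct radix_tree_node**) 0x0, slotp = (void***) 0x0 ) at
radix-tree.c:517
#1 generic_handle_irq( irq = 16 ) at irqdesc.c:349
#5 [__irq_svc+0x40]
"""

if __name__ == "__main__":
    print(frames(SAMPLE))
```

Comparing only function names, rather than whole lines, ignores argument
values such as register contents that the debugger could not recover.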
