From: tixy@linaro.org (Jon Medhurst (Tixy))
To: linux-arm-kernel@lists.infradead.org
Subject: Versatile Express randomly fails to boot - Versatile Express to be removed from nightly testing
Date: Tue, 14 Jun 2016 16:31:25 +0100
Message-ID: <1465918285.2840.41.camel@linaro.org>
In-Reply-To: <551D7EAB.1000200@arm.com>

Hi Sudeep

Over the past several days I think I've been unknowingly reproducing
many of the steps in this old discussion thread [1] about A9 Versatile
Express boot failures. It's only when I found myself looking at the L2
cache timings that I got a vague recollection and dug out this old
thread again. Was there any resolution to the issue? As far as I can
work out, the A9x4 CoreTile stopped working around Linux 3.18 (the
problem isn't 100% reproducible so it's difficult to tell).

Using "arm,tag-latency = <2 2 1>", which Russell seemed to indicate [2]
fixed things for him, also works for me. So should we update the mainline
device-tree with that?
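
A minimal sketch of that change, assuming the PL310 cache-controller
node in the A9 dts carries the label "L2" (an assumption here, check the
actual file before using it). Per the l2c2x0 binding, the three cells
are the read, write and setup latencies in cycles:

```dts
/* Sketch only: the &L2 label is assumed, not taken from this thread. */
&L2 {
	/* read, write, setup latency in cycles; was <1 1 1> */
	arm,tag-latency = <2 2 1>;
};
```

Only the tag RAM latency is changed; nothing in this thread suggests
touching arm,data-latency as well.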

Alternatively, we could assume nobody cares about A9 as presumably Linux
has been unbootable for a year without anyone raising the issue. (The
only reason I'm looking at it is I may be making U-Boot changes for
vexpress and I wanted to test them).

But if we are going to just ignore things, I think it would be good to
delete the A9 dts, or the L2 cache entry, so other people in the future
don't waste days trying to track down the problem.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/330860.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/342005.html

-- 
Tixy


On Thu, 2015-04-02 at 18:38 +0100, Sudeep Holla wrote:
> 
> On 02/04/15 15:13, Russell King - ARM Linux wrote:
> > On Tue, Mar 31, 2015 at 06:27:30PM +0100, Sudeep Holla wrote:
> >> Not sure on that as v3.18 with DT seems to be working fine and passed
> >> overnight reboot testing.
> >
> > Okay, that suggests there's something post v3.18 which is causing this,
> > rather than it being a DT vs non-DT thing.
> >
> 
> Correct. Just to be 100% sure I reverted that non-DT removal commit on
> both v3.19-rc1 and v4.0-rc6 and was able to reproduce issue without DT.
> 
> > An extra data point which I've just found (by enabling attempts to do
> > hibernation on various test platforms) is that the Versatile Express
> > appears to be incapable of taking a CPU offline.
> >
> > This crashes the entire system with sometimes random results.  Sometimes
> > it'll appear that a spinlock has been left owned by CPU#1 which is
> > offline.  Sometimes it'll silently hang.  Sometimes it'll start slowly
> > dumping kernel messages from the start of the kernel's ring buffer (!),
> > eg:
> >
> > PM: freeze of devices complete after 29.342 msecs
> > PM: late freeze of devices complete after 6.398 msecs
> > PM: noirq freeze of devices complete after 5.493 msecs
> > Disabling non-boot CPUs ...
> > __cpu_disable(1)
> > __cpu_die(1)
> > handle_IPI(0)
> > Booting Linux on physical CPU 0x0
> >
> > So far, it's not managed to take a CPU successfully offline and know that
> > it has.  If I disable the calls to cpu_enter_lowpower() and
> > cpu_leave_lowpower(), then it appears to work.
> >
> > This leads me to wonder whether flush_cache_louis() works... which led me
> > in turn to ARM_ERRATA_643719, which is disabled in my builds.  However,
> > the CA9 tile has a r0p1 CA9, which allegedly suffers from this errata.
> >
> 
> Yes, I observed that and tested for this issue with it enabled. It has
> no effect and I still hit the issue.
> 
> [...]
> >
> > I haven't tested going back to a tag latency of 1 1 1 yet.  Can you
> > confirm whether you have this errata enabled for your tests?
> >
> I have now gone back to <1 1 1> latency to debug the issue as it's
> easier to reproduce with those latencies.
> 
> After I failed terribly to bisect between v3.18..v3.19-rc1, as it depends
> a lot on the config you choose (a lot of changes were introduced as it's
> the merge window), I started looking at the code where we hit this issue,
> since it's always in __radix_tree_lookup in lib/radix-tree.c while
> accessing the slots, to see if it provides any more details.
> 
> Regards,
> Sudeep

