[PATCH] [PATCH] arm64: Boot failure on m400 with new cont PTEs


From: jeremy.linton@arm.com (Jeremy Linton)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] [PATCH] arm64: Boot failure on m400 with new cont PTEs
Date: Wed, 18 Nov 2015 11:14:02 -0600
Message-ID: <564CB1DA.4090304@arm.com>
In-Reply-To: <20151118162932.GA13355@leverpostej>

On 11/18/2015 10:29 AM, Mark Rutland wrote:
> On Wed, Nov 18, 2015 at 10:08:58AM -0600, Jeremy Linton wrote:
>> No, it's not defconfig; it's roughly the RHELSA config tossed into a
>> mainline 4.4 tree with all the default options selected. AFAIK RHELSA
>> is still limited access.
>
> That renders this extremely difficult for anyone else to reproduce...

Well, the kernel in question boots fine on a Juno. I haven't tried any
other APM-based machines. And given what's happening, I doubt it's
config related.

> That 48 / 0b110000 for the DFSC decodes as "TLB conflict abort" per the
> ARM ARM. Other than that, the WnR bit is set in the ISS.
>
> So this is probably a break-before-make issue.
>
> Can you figure out where 0xfffffe0000d60588 pointed to, and where in the
> kernel the access was performed? It would be nice to know if this is
> consistently happening at some edge of the kernel address space.
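
For reference, the decode described above can be sketched in C. This is
only an illustration, not kernel code: the macro and function names are
made up for this example, and the sample value in the note below is
built by hand, since the original register dump is not quoted in this
message. The field positions (EC in ESR[31:26], WnR in ISS[6], DFSC in
ISS[5:0]) are as given in the ARM ARM for data aborts.

```c
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx fields relevant to a data abort (names are illustrative). */
#define ESR_EC_SHIFT       26
#define ESR_EC_MASK        0x3fu
#define ESR_ISS_WNR        (1u << 6)   /* set: the faulting access was a write */
#define ESR_ISS_DFSC_MASK  0x3fu       /* Data Fault Status Code, ISS[5:0] */
#define DFSC_TLB_CONFLICT  0x30u       /* 48 / 0b110000 */

/* Exception class: what kind of exception was taken. */
static inline unsigned int esr_ec(uint32_t esr)
{
    return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

/* Data Fault Status Code: why the data access faulted. */
static inline unsigned int esr_dfsc(uint32_t esr)
{
    return esr & ESR_ISS_DFSC_MASK;
}

/* True if the abort was caused by a write (WnR bit set). */
static inline bool esr_is_write(uint32_t esr)
{
    return (esr & ESR_ISS_WNR) != 0;
}

/* True if the DFSC reports a TLB conflict abort. */
static inline bool esr_is_tlb_conflict(uint32_t esr)
{
    return esr_dfsc(esr) == DFSC_TLB_CONFLICT;
}
```

With a hand-built value such as 0x96000070 (EC 0x25, IL set, WnR set,
DFSC 0x30), esr_dfsc() yields 48 and both esr_is_write() and
esr_is_tlb_conflict() return true.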

I decoded everything when I initially saw it, but it didn't make a lick
of sense in relation to what I was trying to accomplish, so I didn't
keep any of it. Only later, when I found out it wasn't related to the
patches I was applying, did I start trying to track down the regression.
Even so, given some other patches that went in, it wasn't blindingly
obvious where the problem was until I was sure that it was related to
the linear mapping changes. In other words, I didn't think anyone would
be able to debug the failure with that little information; maybe I'm
wrong on that point... Anyway, the kernel that produced that failure is
long gone. I can attempt to reproduce the message in the near future.

>> Once I find/fix the console issue on that machine with 4.4-rc1 (there
>> are a small handful of issues that keep mainline from working on it,
>> including the SATA patch that was posted and rejected), I will
>> focus on hoisting the TLB flush into create_mapping_late() and
>> removing the splattering of flushes in those code paths. That is,
>> unless there is a reason to be performing them as soon as the
>> directories are split.
>
> We need to figure out exactly what maintenance we actually need.
>
> Hoisting the TLB flush isn't necessarily possible if we need to perform
> break-before-make at the PTE level, and even that may not be possible
> for the kernel page tables; we might need to do something more
> drastic like using ASIDs and double-buffering them...
>
> We also need to figure out what's happening with the code as it is.

Well, I suspect what is happening is that there are conflicting TLB
entries hanging around: one for a cont range that overlaps a stale
non-cont one. This sort of implies that this has been happening all
along, i.e. RO regions were being "lazily" activated, if you will. It's
only on a core that aborts when it detects this (which I assume requires
differing-size entries on this core) that it causes problems. The
break-before-make issue seems like it won't cause a big problem here as
long as there is some way to assure valid TLBs before the update, and
then assure they are cleared following it. Hence the overly aggressive
change works because it flushes following every cont block update. That
would bother me more if the code were run more than once per boot (or,
in the future, per module load/unload if someone gets around to updating
the no-execute attributes reliably).
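
To make the suspected failure mode concrete, here is a toy model in C.
It is purely a sketch under stated assumptions: the "TLB" is a small
array, every name in it is hypothetical, and CONT_PAGES is set to 16
(the contiguous span for a 4K granule) only for illustration. It does
not model a real break-before-make sequence, only the difference
between leaving a stale non-cont entry in place and flushing after the
cont update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TLB_SLOTS  8
#define CONT_PAGES 16UL   /* assumed contiguous span, in pages */

/* One cached translation covering npages starting at first_page. */
struct tlb_entry {
    bool valid;
    unsigned long first_page;
    unsigned long npages;
};

struct toy_tlb {
    struct tlb_entry slot[TLB_SLOTS];
};

/* Drop every cached translation. */
static void tlb_flush_all(struct toy_tlb *tlb)
{
    for (size_t i = 0; i < TLB_SLOTS; i++)
        tlb->slot[i].valid = false;
}

/* Cache a translation in the first free slot. */
static void tlb_fill(struct toy_tlb *tlb, unsigned long first_page,
                     unsigned long npages)
{
    for (size_t i = 0; i < TLB_SLOTS; i++) {
        if (!tlb->slot[i].valid) {
            tlb->slot[i] = (struct tlb_entry){ true, first_page, npages };
            return;
        }
    }
    assert(!"toy TLB is full");
}

/* Number of cached entries that translate the given page. */
static int tlb_hits(const struct toy_tlb *tlb, unsigned long page)
{
    int hits = 0;

    for (size_t i = 0; i < TLB_SLOTS; i++) {
        const struct tlb_entry *e = &tlb->slot[i];

        if (e->valid && page >= e->first_page &&
            page - e->first_page < e->npages)
            hits++;
    }
    return hits;
}

/*
 * Pretend pages [base, base + CONT_PAGES) were rewritten as one cont
 * range and the next walk cached the cont entry. Returns true if any
 * page in the range is now covered by more than one cached entry,
 * which is the condition that would raise a TLB conflict abort.
 */
static bool remap_as_cont(struct toy_tlb *tlb, unsigned long base,
                          bool flush_after_update)
{
    if (flush_after_update)
        tlb_flush_all(tlb);
    tlb_fill(tlb, base, CONT_PAGES);

    for (unsigned long p = base; p < base + CONT_PAGES; p++)
        if (tlb_hits(tlb, p) > 1)
            return true;
    return false;
}
```

In this model, a TLB that still holds a single-page entry for page 3
reports a conflict after remap_as_cont(&tlb, 0, false), and none after
remap_as_cont(&tlb, 0, true).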

