linux-arm-kernel.lists.infradead.org archive mirror
From: Catalin Marinas <catalin.marinas@arm.com>
To: "Zhang, Lei" <zhang.lei@jp.fujitsu.com>
Cc: 'Mark Rutland' <mark.rutland@arm.com>,
	"'james.morse@arm.com'" <james.morse@arm.com>,
	"'will.deacon@arm.com'" <will.deacon@arm.com>,
	"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
	"'linux-arm-kernel@lists.infradead.org'"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v3 0/1] arm64: Add workaround for Fujitsu A64FX erratum 010001
Date: Tue, 29 Jan 2019 18:10:32 +0000
Message-ID: <20190129181032.GC224095@arrakis.emea.arm.com>
In-Reply-To: <8898674D84E3B24BA3A2D289B872026A6A2C04E6@G01JPEXMBKW03>

Hi,

Could you please copy the whole description from the cover letter to the
actual patch and only send one email (full description as in here
together with the patch)? If we commit this to the kernel, it would be
useful to have the information in the log for reference later on.

More comments below:

On Tue, Jan 29, 2019 at 12:29:58PM +0000, Zhang, Lei wrote:
> On some variants of the Fujitsu-A64FX cores (versions 1.0 and 1.1),
> memory accesses may cause an undefined fault (Data abort, DFSC=0b111111).
> This problem will be fixed in the next version of Fujitsu-A64FX.
> 
> This fault occurs under a specific hardware condition
> when a load/store instruction performs an address translation using:
>   case-1  TTBR0_EL1 with TCR_EL1.NFD0 == 1.
>   case-2  TTBR0_EL2 with TCR_EL2.NFD0 == 1.
>   case-3  TTBR1_EL1 with TCR_EL1.NFD1 == 1.
>   case-4  TTBR1_EL2 with TCR_EL2.NFD1 == 1.
> This fault is completely spurious.

So this looks like new information on the hardware behaviour since the
v2 of the patch. Can this fault occur for any type of instruction
accessing the memory or only for SVE instructions?

> Since the kernel sets TCR_ELx.NFD1 to '1' in versions
> past 4.17, case-3 or case-4 may happen.
> 
> This fault can be taken only at stage-1,
> so it is taken from EL0 to EL1/EL2, from EL1 to EL1,
> or from EL2 to EL2.
> 
> I would like to post a workaround to avoid this problem on
> existing Fujitsu-A64FX versions.

How likely is it to trigger this erratum? In other words, aren't we
better off with a spurious fault that we ignore rather than toggling the
TCR_ELx.NFD1 bit?
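
To illustrate the "ignore the spurious fault" option, here is a minimal
sketch of recognising the fault signature quoted above (Data abort,
DFSC=0b111111) from an ESR_ELx value. The helper name is hypothetical and
is not taken from the patch. The field layout (EC in bits [31:26], DFSC in
bits [5:0]) and the data abort EC values 0x24/0x25 follow the ARMv8-A
ESR_ELx definition. A real handler would presumably also need to confirm
that it is running on an affected A64FX core, which this sketch does not
attempt.

```c
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx field layout per the ARMv8-A architecture. */
#define ESR_EC_SHIFT      26
#define ESR_EC_MASK       UINT64_C(0x3f)
#define ESR_EC_DABT_LOW   UINT64_C(0x24) /* data abort from a lower EL */
#define ESR_EC_DABT_CUR   UINT64_C(0x25) /* data abort from the current EL */
#define ESR_DFSC_MASK     UINT64_C(0x3f)
#define ESR_DFSC_ERRATUM  UINT64_C(0x3f) /* DFSC == 0b111111, as reported */

/*
 * Hypothetical helper: returns true if esr describes a data abort whose
 * DFSC field matches the value reported for erratum 010001.
 * It does not check the CPU model or revision.
 */
static bool esr_matches_a64fx_010001(uint64_t esr)
{
	uint64_t ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;

	if (ec != ESR_EC_DABT_LOW && ec != ESR_EC_DABT_CUR)
		return false;

	return (esr & ESR_DFSC_MASK) == ESR_DFSC_ERRATUM;
}
```

A fault handler could call such a check first and simply return to retry
the access when it matches, assuming the fault really is spurious as
described.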

> There are 2 points in this workaround.
> Point1: trap from EL1 to EL1, EL2 to EL2
> Set TCR_ELx.NFD1 to '0' at kernel entry,
> and set it to '1' at kernel exit.
> 
> From the viewpoint of the ARM specification, there is no problem with
> resetting TCR_ELx.{NFD0,NFD1} while in EL1/EL2, because
> TCR_ELx.{NFD0,NFD1} controls whether to perform a translation
> table walk in response to an access from EL0.

The problem is that this bit may be cached in the TLB (I haven't checked
the ARM ARM but that's usually the case with the TCR_ELx bits). If
that's the case, you can't guarantee a change unless you also perform a
TLBI VMALL. Arguably, if Fujitsu's microarchitecture doesn't cache the
NFD bits in the TLB, we could apply the workaround but I'd rather have
the spurious trap if it's not too often.
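
For reference, a minimal sketch of the bit manipulation that toggling NFD1
amounts to. The bit positions (NFD0 = bit 53, NFD1 = bit 54) are those of
TCR_EL1 in the ARMv8-A architecture; the helper names are hypothetical.
The sketch only computes the new register value. The actual system
register write, the ISB after it, and any TLB invalidation needed if the
bit is cached in the TLB are left out and only noted in the comments.

```c
#include <stdint.h>

/* TCR_EL1 bit positions per the ARMv8-A architecture. */
#define TCR_NFD0 (UINT64_C(1) << 53)
#define TCR_NFD1 (UINT64_C(1) << 54)

/*
 * Hypothetical helpers: compute the TCR value with NFD1 cleared or set,
 * leaving every other bit untouched.
 *
 * In the kernel the result would have to be written back to the system
 * register and followed by an ISB. If the NFD bits can be cached in the
 * TLB, a TLB invalidate would also be required before the change is
 * guaranteed to take effect.
 */
static uint64_t tcr_clear_nfd1(uint64_t tcr)
{
	return tcr & ~TCR_NFD1;
}

static uint64_t tcr_set_nfd1(uint64_t tcr)
{
	return tcr | TCR_NFD1;
}
```

Under the proposed workaround, tcr_clear_nfd1() would correspond to the
kernel-entry step and tcr_set_nfd1() to the kernel-exit step.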

> I confirmed that:
> ・There is no load/store instruction between 
>   tramp_ventry and setting TCR_ELx.NFD1 to '0'.
> ・There is no load/store instruction between 
>   setting TCR_ELx.NFD1 to '1' and tramp_exit.

Could speculative loads also trigger this? Another option would be to
toggle it during kernel_neon_begin/end (with the caveat of TLBI as
mentioned above).

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
