From: siarhei.siamashka@gmail.com (Siarhei Siamashka)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 0/4] arm64: advertise availability of CRC and crypto instructions
Date: Fri, 20 Dec 2013 03:35:51 +0200
Message-ID: <20131220033551.43783672@i7>
In-Reply-To: <CAKv+Gu_gv1YAM3Fv1StAWvd4Ecvi+kqpcRiMxURsRsBjhR-Mhg@mail.gmail.com>

On Thu, 19 Dec 2013 18:33:45 +0100
Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

> On 19 December 2013 07:48, Siarhei Siamashka
> <siarhei.siamashka@gmail.com> wrote:
> > On Wed, 18 Dec 2013 22:57:33 +0100
> > Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >
> >> On 18 December 2013 22:18, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> >> > On Wed, 18 Dec 2013, Ard Biesheuvel wrote:
> >> >> The nice thing about hwcaps is that it is already integrated into the
> >> >> ifunc resolution done by the loader, which makes it very easy and
> >> >> straightforward to offer alternative implementations of library
> >> >> functions based on CPU capabilities.
> >> >
> >> > The library may as well implement its own ifunc that tests the
> >> > instruction while trapping SIGILL.  On those systems with the supported
> >> > instruction there will be no trap.  On those that trap, the
> >> > alternative implementation is going to be much slower anyway.
> >> >
> >>
> >> True. And the trap still only occurs at load time. But I think we
> >> agree it is essentially a poor man's hwcaps.
> >
> > And the hwcaps is essentially a poor man's replacement for a userspace
> > accessible CPUID instruction enjoyed by x86.
> >
> > It's sad to see that the runtime CPU features detection still remains
> > a PITA with AArch64. Basically, it's not enough to know if the
> > instruction is supported or not. Different microarchitectures may
> > have various performance quirks for certain instructions. For example,
> > VFPLite in Cortex-A8 is non-pipelined and slow. Cortex-A15 can
> > dual-issue NEON instructions (nice for the code which can enjoy
> > high ILP), but Cortex-A15 NEON instructions have relatively high
> > latency (bad for the code, which is essentially a long dependency
> > chain). The fastest way to read uncached memory for most ARM
> > processors is to use the VFP load multiple instruction with as
> > many registers as possible, but this is slow on Marvell PJ4. And
> > so on.
> >
> 
> You are comparing apples and oranges.
> 
> It is fairly well known that you are better off using the NEON for
> floating point on a Cortex-A8, if you can afford the reduced
> precision. But if you /can't/ afford the reduced precision, you are
> still better off using VFP-lite than using software emulation.

If the reduced precision of 32-bit floats can't be afforded, it is still
sometimes possible to use more accurate fixed point calculations
instead, and to do them faster than with VFP-lite. The generic and slow
software emulation of 64-bit doubles is not even an option.
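
As a rough illustration of the fixed point alternative, here is a
minimal sketch of a Q31 multiply. It assumes all values stay in the
range [-1, 1), which gives 31 fractional bits compared to the 24-bit
significand of a 32-bit float. The names q31_t, q31_from_double and
q31_mul are made up for this example, and a real implementation would
also need saturation:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q31 type: a signed 32-bit integer holding value * 2^31. */
typedef int32_t q31_t;

/* Convert a double in [-1, 1) to Q31 (truncating). */
static q31_t q31_from_double(double x)
{
    assert(x >= -1.0 && x < 1.0);
    return (q31_t)(x * 2147483648.0);
}

/*
 * Multiply two Q31 values with round-to-nearest.
 * The 64-bit product has 62 fractional bits, so it is shifted right
 * by 31 after adding half of the final LSB. This relies on arithmetic
 * right shift of negative values (true for GCC and Clang). The
 * (-1) * (-1) case overflows and is not handled here.
 */
static q31_t q31_mul(q31_t a, q31_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    wide += (int64_t)1 << 30;
    return (q31_t)(wide >> 31);
}
```

Only integer multiply, add and shift are involved, so nothing here
touches the VFP unit at all.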

That's exactly the point. If we know more about the CPU capabilities,
we can select a more suitable implementation at runtime. That may even
be an implementation which uses a somewhat different algorithm for
doing the same job.
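
A minimal sketch of this kind of runtime selection, assuming a
capability bitmask is already available (on Linux it could come from
getauxval(AT_HWCAP)). The EX_CAP_* bits are placeholders and not the
constants from this patch series, and crc32_hw_placeholder only stands
in for a version built on the CRC32 instructions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder capability bits, for illustration only. */
#define EX_CAP_AES   (1UL << 0)
#define EX_CAP_CRC32 (1UL << 1)

typedef uint32_t (*crc32_fn)(uint32_t crc, const uint8_t *buf, size_t len);

/* Portable bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_generic(uint32_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/*
 * Stand-in for an instruction-accelerated variant. It simply calls the
 * generic code so that the sketch builds and runs on any machine.
 */
static uint32_t crc32_hw_placeholder(uint32_t crc, const uint8_t *buf,
                                     size_t len)
{
    return crc32_generic(crc, buf, len);
}

/* Pick an implementation once, based on the capability mask. */
static crc32_fn select_crc32(unsigned long caps)
{
    return (caps & EX_CAP_CRC32) ? crc32_hw_placeholder : crc32_generic;
}
```

The caller resolves the function pointer once at startup (or from an
ifunc resolver) and then uses it for every call. The same pattern
extends to selecting on the CPU model rather than on a feature bit.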

> The same applies to the Crypto Extensions: it is highly unlikely that
> you will care about the particular implementation of the AES
> instructions if you are faced with the choice of using those
> instructions or using a software implementation. So using hwcaps bits
> for these kinds of features makes perfect sense. (And so does enabling
> the 'has-vfp' bit for VFP-lite)

I'm not opposing the addition of Crypto Extensions support to hwcaps.
Still, this covers only the basic use cases (which is great!) and
is not enough to make everyone happy.

> I do agree with you that the heterogeneity between various ARM
> implementors is a PITA at times, and knowing which CPU exactly you are
> running on is a valid question in those cases

Again, this was exactly the point of my e-mail. And it appears that we
agree with each other.
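
To show what "knowing which CPU exactly" could look like in code, here
is a small sketch that decodes a Main ID Register (MIDR) value into its
fields. It assumes the raw value has already been obtained somehow (for
example, reassembled from the "CPU implementer" and "CPU part" fields
of /proc/cpuinfo on 32-bit ARM). The struct and function names are
hypothetical, and the table only lists two of the cores mentioned in
this thread:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Extract bits [hi:lo] from a 32-bit register value. */
static uint32_t midr_field(uint32_t midr, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (midr >> lo) & mask;
}

/* Hypothetical container for the decoded fields. */
struct cpu_id {
    uint8_t  implementer; /* bits [31:24], 0x41 is ARM Ltd. */
    uint8_t  variant;     /* bits [23:20] */
    uint16_t part;        /* bits [15:4]  */
    uint8_t  revision;    /* bits [3:0]   */
};

static struct cpu_id decode_midr(uint32_t midr)
{
    struct cpu_id id;
    id.implementer = (uint8_t)midr_field(midr, 31, 24);
    id.variant     = (uint8_t)midr_field(midr, 23, 20);
    id.part        = (uint16_t)midr_field(midr, 15, 4);
    id.revision    = (uint8_t)midr_field(midr, 3, 0);
    return id;
}

/* Deliberately incomplete lookup, just enough for the example. */
static const char *core_name(struct cpu_id id)
{
    if (id.implementer == 0x41) {
        switch (id.part) {
        case 0xC08: return "Cortex-A8";
        case 0xC0F: return "Cortex-A15";
        }
    }
    return "unknown";
}
```

With such a lookup in place, a library could choose between, say, a
low-latency and a high-throughput NEON code path per core type.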

> (btw this applies to SSE on Atom as well).

I'm well aware of the Atom SSSE3 performance issues (the microcoded
PSHUFB instruction in particular). The key difference is that the x86
architecture makes it easy to identify the CPU cores.

> But please don't confuse it with the simple presence or absence of
> some CPU extension.

...

-- 
Best regards,
Siarhei Siamashka
