linux-arch.vger.kernel.org archive mirror
From: David Laight <David.Laight@ACULAB.COM>
To: 'Robin Murphy' <robin.murphy@arm.com>,
	Will Deacon <will@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"kernel-team@android.com" <kernel-team@android.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Luc Van Oostenryck <luc.vanoostenryck@gmail.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Peter Oberparleiter <oberpar@linux.ibm.com>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>
Subject: RE: [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum()
Date: Fri, 24 Apr 2020 09:41:30 +0000
Message-ID: <db86e9fa88754d59ac5f8d3f4fe0f9a3@AcuMS.aculab.com>
In-Reply-To: <6efa0cc1-bd3e-b9b6-4e69-7ac05e6efe35@arm.com>

From: Robin Murphy
> Sent: 22 April 2020 12:02
..
> Sure - I have a nagging feeling that it could still do better WRT
> pipelining the loads anyway, so I'm happy to come back and reconsider
> the local codegen later. It certainly doesn't deserve to stand in the
> way of cross-arch rework.

How fast does that loop actually run?
To my mind it seems to do a lot of operations on each 64-bit value.
I'd have thought that a loop based on:
	sum64 += *ptr;
	sum64_high += *ptr++ >> 32;
and then fixing up the result would be faster.
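
A minimal sketch of what such a loop and its fix-up could look like, reading
the two lines above as accumulations into two running sums. The names
fold64(), csum_split() and sample64 are made up for illustration and are not
taken from the arm64 do_csum(); alignment, tail bytes and byte order are
ignored, and nothing here has been measured:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 64-bit ones' complement accumulator down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 33 bits */
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 32 bits */
	sum = (sum & 0xffff) + (sum >> 16);		/* at most 17 bits */
	sum = (sum & 0xffff) + (sum >> 16);		/* at most 16 bits */
	return (uint16_t)sum;
}

/* Hypothetical helper: checksum n 64-bit words with two plain adds each. */
static uint16_t csum_split(const uint64_t *ptr, size_t n)
{
	uint64_t sum64 = 0, sum64_high = 0, sum_low;

	/* Keeps both partial sums below 2^64. */
	assert(n <= UINT32_MAX);

	while (n--) {
		sum64 += *ptr;			/* wraps mod 2^64, carries lost */
		sum64_high += *ptr++ >> 32;	/* exact sum of the high halves */
	}

	/*
	 * Fix-up: sum64 == (low_sum + (high_sum << 32)) mod 2^64, so the
	 * exact sum of the low halves is recovered by subtracting the
	 * shifted high sum (also mod 2^64).
	 */
	sum_low = sum64 - (sum64_high << 32);

	/* 2^32 == 1 (mod 0xffff), so the two halves can simply be added. */
	return fold64((uint64_t)fold64(sum_low) + fold64(sum64_high));
}

/* Sample input for illustration only. */
static const uint64_t sample64[4] = {
	0xffffffffffffffffULL, 0x0123456789abcdefULL,
	0xfffffffe00000001ULL, 0,
};
```

For sample64 this returns 0x9e25, the same value as adding the sixteen
16-bit pieces of the four words with end-around carry.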

The x86-64 code is also bad!
On Intel CPUs prior to Haswell a simple:
	sum_64 += *ptr32++;
is faster than the current code.
(Although you can do a lot better even on Ivy Bridge.)
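
For reference, a sketch of the simple loop meant here: 32-bit loads
zero-extended into a 64-bit accumulator, so no carry handling is needed
inside the loop. The names csum_add32() and sample32 and the final fold are
illustrative assumptions, not the x86-64 kernel code, and no timing claim
is attached to this version:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: checksum n 32-bit words with one add per word. */
static uint16_t csum_add32(const uint32_t *ptr32, size_t n)
{
	uint64_t sum_64 = 0;

	/* Fewer than 2^32 words cannot overflow the 64-bit accumulator. */
	assert(n <= UINT32_MAX);

	while (n--)
		sum_64 += *ptr32++;	/* zero-extended add, no carry chain */

	/* Fold 64 -> 32 -> 16 bits with end-around carry. */
	sum_64 = (sum_64 & 0xffffffffu) + (sum_64 >> 32);
	sum_64 = (sum_64 & 0xffffffffu) + (sum_64 >> 32);
	sum_64 = (sum_64 & 0xffff) + (sum_64 >> 16);
	sum_64 = (sum_64 & 0xffff) + (sum_64 >> 16);
	return (uint16_t)sum_64;
}

/* Sample input for illustration only. */
static const uint32_t sample32[5] = {
	0xffffffffu, 0x01234567u, 0x89abcdefu, 0xfffffffeu, 0x00000001u,
};
```

For sample32 this returns 0x9e25.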

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Thread overview: 46+ messages
2020-04-21 15:15 [PATCH v4 00/11] Rework READ_ONCE() to improve codegen Will Deacon
2020-04-21 15:15 ` [PATCH v4 01/11] compiler/gcc: Raise minimum GCC version for kernel builds to 4.8 Will Deacon
2020-04-21 17:15   ` Masahiro Yamada
2020-04-21 15:15 ` [PATCH v4 02/11] netfilter: Avoid assigning 'const' pointer to non-const pointer Will Deacon
2020-04-21 15:15 ` [PATCH v4 03/11] net: tls: " Will Deacon
2020-04-21 15:15 ` [PATCH v4 04/11] fault_inject: Don't rely on "return value" from WRITE_ONCE() Will Deacon
2020-04-21 15:15 ` [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum() Will Deacon
2020-04-22  9:49   ` Mark Rutland
2020-04-22 10:41     ` Will Deacon
2020-04-22 11:01       ` Robin Murphy
2020-04-24  9:41         ` David Laight [this message]
2020-04-24 11:00           ` Robin Murphy
2020-04-24 13:04             ` David Laight
2020-04-21 15:15 ` [PATCH v4 06/11] READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE() Will Deacon
2020-04-22  9:51   ` Mark Rutland
2020-04-21 15:15 ` [PATCH v4 07/11] READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses Will Deacon
2020-04-24 16:31   ` Jann Horn
2020-04-24 17:11     ` Will Deacon
2020-04-24 17:43       ` Peter Zijlstra
2020-04-21 15:15 ` [PATCH v4 08/11] READ_ONCE: Drop pointer qualifiers when reading from scalar types Will Deacon
2020-04-22 10:25   ` Rasmus Villemoes
2020-04-22 11:48     ` Segher Boessenkool
2020-04-22 13:11       ` Will Deacon
2020-04-22 14:54   ` Will Deacon
2020-04-21 15:15 ` [PATCH v4 09/11] locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros Will Deacon
2020-04-21 15:15 ` [PATCH v4 10/11] arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros Will Deacon
2020-04-21 15:15 ` [PATCH v4 11/11] gcov: Remove old GCC 3.4 support Will Deacon
2020-04-21 17:19   ` Masahiro Yamada
2020-04-21 18:42 ` [PATCH v4 00/11] Rework READ_ONCE() to improve codegen Linus Torvalds
2020-04-22  8:18   ` Will Deacon
2020-04-22 11:37     ` Peter Zijlstra
2020-04-22 12:26       ` Will Deacon
2020-04-24 13:42         ` Will Deacon
2020-04-24 15:54           ` Marco Elver
2020-04-24 16:52             ` Will Deacon
