From: david laight <david.laight@runbox.com>
To: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ard Biesheuvel <ardb@kernel.org>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Thorsten Blum <thorsten.blum@linux.dev>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
	Bill Wendling <morbo@google.com>,
	Justin Stitt <justinstitt@google.com>,
	David Sterba <dsterba@suse.com>,
	llvm@lists.linux.dev, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] lib/crypto: blake2b: Roll up BLAKE2b round loop on 32-bit
Date: Fri, 26 Dec 2025 20:24:33 +0000
Message-ID: <20251226202433.107af09a@pumpkin>
In-Reply-To: <20251205201411.GA1954@quark>

On Fri, 5 Dec 2025 12:14:11 -0800
Eric Biggers <ebiggers@kernel.org> wrote:

> On Fri, Dec 05, 2025 at 02:16:44PM +0000, david laight wrote:
> > Note that executing two G() in parallel probably requires the source
> > interleave the instructions for the two G() rather than relying on the
> > cpu's 'out of order execution' to do all the work
> > (Intel cpu might manage it...).  
> 
> I actually tried that earlier, and it didn't help.  Either the compiler
> interleaved the calculations already, or the CPU did, or both.
> 
> It definitely could use some more investigation to better understand
> exactly what is going on, though.
> 
> You're welcome to take a closer look, if you're interested.

I had a quick look at the objdump output for the 'not unrolled loop'
of blake2s on x86-64 compiled with gcc 12.2.
The generated code seemed reasonable.
A single register tracked the array of offsets for the data buffer.
So on x86 there was a read of the offset then nn(%rsp,%reg,4) to
get the value (%reg,8 for blake2b).
There weren't many spills to stack; I suspect that 14 of the v[]
were assigned to registers, but I didn't analyse the entire loop.
The fully unrolled loop is harder to read, but one of the v[] still
needs spilling to stack.
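
For reference, the shape of the rolled-up loop is roughly as below.
This is only a sketch, not the kernel source: the function names are
made up for illustration and the sigma[] permutation table is passed
in by the caller. The point is that each m[s[i]] costs two loads, the
offset byte and then the indexed data word.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (0 < n < 32). */
static inline uint32_t ror32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

/* One full BLAKE2s G: two half-steps, each consuming one message word. */
static inline void blake2s_g(uint32_t *a, uint32_t *b, uint32_t *c,
			     uint32_t *d, uint32_t x, uint32_t y)
{
	*a += *b + x; *d = ror32(*d ^ *a, 16);
	*c += *d;     *b = ror32(*b ^ *c, 12);
	*a += *b + y; *d = ror32(*d ^ *a, 8);
	*c += *d;     *b = ror32(*b ^ *c, 7);
}

/*
 * Hypothetical rolled-up round loop: one loop body, with the message
 * words selected through the per-round permutation row s[].
 */
static void blake2s_rounds_rolled(uint32_t v[16], const uint32_t m[16],
				  const uint8_t sigma[][16],
				  unsigned int nrounds)
{
	for (unsigned int r = 0; r < nrounds; r++) {
		const uint8_t *s = sigma[r];

		/* Column step: four independent G. */
		blake2s_g(&v[0], &v[4], &v[8],  &v[12], m[s[0]],  m[s[1]]);
		blake2s_g(&v[1], &v[5], &v[9],  &v[13], m[s[2]],  m[s[3]]);
		blake2s_g(&v[2], &v[6], &v[10], &v[14], m[s[4]],  m[s[5]]);
		blake2s_g(&v[3], &v[7], &v[11], &v[15], m[s[6]],  m[s[7]]);
		/* Diagonal step: four more independent G. */
		blake2s_g(&v[0], &v[5], &v[10], &v[15], m[s[8]],  m[s[9]]);
		blake2s_g(&v[1], &v[6], &v[11], &v[12], m[s[10]], m[s[11]]);
		blake2s_g(&v[2], &v[7], &v[8],  &v[13], m[s[12]], m[s[13]]);
		blake2s_g(&v[3], &v[4], &v[9],  &v[14], m[s[14]], m[s[15]]);
	}
}
```

In the fully unrolled version every s[i] is a compile-time constant,
so the offset load disappears and only the data load remains.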

Each 1/2G has at least one memory read and seven ALU operations.
The Intel cpu (Haswell onwards) can execute 4 ALU instructions
every clock - so however well the multiple G get scheduled each
1/2G will be (pretty much) two clocks.
That really means it should be possible to include the second
memory read (for the not-unrolled loop) without slowing things down.
Even if the nn(%rsp,%reg,8) needs an extra ALU operation the change
shouldn't be massive.
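
To make the count concrete, here is the first half of a BLAKE2b G
written out one ALU operation per line. It is a sketch for counting
purposes only; blake2b_half_g() and min_clocks() are names invented
here. Note that ceil(7/4) is purely a throughput bound: the seven
operations inside one 1/2G form a dependency chain, so the bound is
only approached when independent G are overlapped.

```c
#include <stdint.h>

/* Rotate a 64-bit word right by n bits (0 < n < 64). */
static inline uint64_t ror64(uint64_t x, unsigned int n)
{
	return (x >> n) | (x << (64 - n));
}

/*
 * First half of a BLAKE2b G.  'mx' is the message word, i.e. the
 * memory read (preceded by an offset read in the rolled-up loop).
 */
static inline void blake2b_half_g(uint64_t *a, uint64_t *b, uint64_t *c,
				  uint64_t *d, uint64_t mx)
{
	*a += *b;		/* 1: add */
	*a += mx;		/* 2: add */
	*d ^= *a;		/* 3: xor */
	*d = ror64(*d, 32);	/* 4: rotate */
	*c += *d;		/* 5: add */
	*b ^= *c;		/* 6: xor */
	*b = ror64(*b, 24);	/* 7: rotate */
}

/* Lower bound on clocks for alu_ops operations at 'width' per clock. */
static inline unsigned int min_clocks(unsigned int alu_ops,
				      unsigned int width)
{
	return (alu_ops + width - 1) / width;
}
```

With the figures above, min_clocks(7, 4) gives 2, and an eighth ALU
operation for address generation would still fit in those two clocks.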

Which makes me wonder whether the slowdown for rolling-up the loop
is due to data cache effects rather than actual ALU instructions.

Of course this is x86, where the nn(%rsp,%reg,8) addressing mode helps.
On other architectures you'd want to multiply the offsets by 8 and,
ideally, add in the stack offset of the data[] array, allowing the
simpler (%sp,%reg) addressing mode.
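
In C the pre-scaling part might look something like the sketch below.
The helper names are hypothetical, and folding in the stack offset of
data[] is not shown since only the compiler (or hand-written asm) can
do that. For blake2b the largest byte offset is 15 * 8 = 120, so the
table entries still fit in a byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert one row of word indices to byte offsets. */
static void scale_offsets(uint8_t out[16], const uint8_t sigma_row[16])
{
	for (int i = 0; i < 16; i++)
		out[i] = (uint8_t)(sigma_row[i] * sizeof(uint64_t));
}

/* Load a message word from base + byte offset; no index scaling needed. */
static inline uint64_t load_word(const uint64_t m[16], uint8_t byte_off)
{
	uint64_t w;

	memcpy(&w, (const unsigned char *)m + byte_off, sizeof(w));
	return w;
}
```

The table would be scaled once (or stored pre-scaled), so the per-load
cost is a plain base plus register address.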

I've still not done any timings; I'm on holiday with the wrong computers.

	David


> 
> - Eric
> 

