From: Eric Biggers <ebiggers3@gmail.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Subject: Re: [RFC PATCH] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic
Date: Wed, 1 Feb 2017 22:47:16 -0800
Message-ID: <20170202064716.GB582@zzz>
In-Reply-To: <1485785489-5116-1-git-send-email-ard.biesheuvel@linaro.org>

On Mon, Jan 30, 2017 at 02:11:29PM +0000, Ard Biesheuvel wrote:
> Instead of unconditionally forcing 4 byte alignment for all generic
> chaining modes that rely on crypto_xor() or crypto_inc() (which may
> result in unnecessary copying of data when the underlying hardware
> can perform unaligned accesses efficiently), make those functions
> deal with unaligned input explicitly, but only if the Kconfig symbol
> HAVE_EFFICIENT_UNALIGNED_ACCESS is set. This will allow us to drop
> the alignmasks from the CBC, CMAC, CTR, CTS, PCBC and SEQIV drivers.
> 
> For crypto_inc(), this simply involves making the 4-byte stride
> conditional on HAVE_EFFICIENT_UNALIGNED_ACCESS being set, given that
> it typically operates on 16 byte buffers.
> 
> For crypto_xor(), an algorithm is implemented that simply runs through
> the input using the largest strides possible if unaligned accesses are
> allowed. If they are not, an optimal sequence of memory accesses is
> emitted that takes the relative alignment of the input buffers into
> account, e.g., if the relative misalignment of dst and src is 4 bytes,
> the entire xor operation will be completed using 4 byte loads and stores
> (modulo unaligned bits at the start and end). Note that all expressions
> involving startalign and misalign are simply eliminated by the compiler
> if HAVE_EFFICIENT_UNALIGNED_ACCESS is defined.
> 

Hi Ard,

This is a good idea. Requiring 4-byte alignment unconditionally was
error-prone, and inefficient on many architectures besides.
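
The crypto_inc() side reads straightforwardly to me: as I understand it,
the 32-bit stride just becomes conditional on word accesses being safe,
along these lines (a sketch of the shape only, not the exact diff;
crypto_inc_byte() here is the obvious byte-wise fallback):

/* Byte-wise fallback: propagate the carry in from the last byte */
static void crypto_inc_byte(u8 *a, unsigned int size)
{
	u8 *b = a + size;
	u8 c;

	for (; size; size--) {
		c = *--b + 1;
		*b = c;
		if (c)
			break;
	}
}

void crypto_inc(u8 *a, unsigned int size)
{
	__be32 *b = (__be32 *)(a + size);
	u32 c;

	/* Use 32-bit strides when word accesses are safe: either the
	 * architecture handles unaligned accesses efficiently, or the
	 * buffer happens to be suitably aligned anyway. */
	if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
	    IS_ALIGNED((unsigned long)b, __alignof__(*b)))
		for (; size >= 4; size -= 4) {
			c = be32_to_cpu(*--b) + 1;
			*b = cpu_to_be32(c);
			if (c)
				return;	/* carry stopped propagating */
		}

	/* remaining high-order bytes (or everything, if unaligned) */
	crypto_inc_byte(a, size);
}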

So the new crypto_inc() looks fine, but the new crypto_xor() is quite
complicated.  I'm wondering whether it has to be that way, especially since it
seems to most commonly be used on very small input buffers, e.g. 8- or 16-byte
blocks.  There are a couple of trivial ways it could be simplified: using
'dst' and 'src' directly instead of 'a' and 'b' (which also seems to improve
code generation by getting rid of the '+= len & ~mask' parts), or using
sizeof(long) directly instead of 'size' and 'mask'.
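
To illustrate those simplifications, the shape I have in mind is roughly the
following (just a sketch, not the patch itself; I'm papering over the patch's
alignment handling with a byte-wise fallback):

/* Sketch: word loop using sizeof(unsigned long) and 'dst'/'src'
 * directly; strict-alignment machines take the fast path only when
 * both pointers are already word-aligned. */
void crypto_xor(u8 *dst, const u8 *src, unsigned int len)
{
	if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
	    IS_ALIGNED((unsigned long)dst | (unsigned long)src,
		       sizeof(unsigned long))) {
		while (len >= sizeof(unsigned long)) {
			*(unsigned long *)dst ^= *(const unsigned long *)src;
			dst += sizeof(unsigned long);
			src += sizeof(unsigned long);
			len -= sizeof(unsigned long);
		}
	}

	while (len--)
		*dst++ ^= *src++;
}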

Also, when I tried testing the proposed crypto_xor() on MIPS, it didn't work
correctly on a misaligned buffer.  With startalign=1, it did one iteration of
the following loop, then exited with startalign=0 and entered the "unsigned
long at a time" loop, which is incorrect since at that point the buffers were
not yet fully aligned:

>		do {
>			if (len < sizeof(u8))
>				break;
>
>			if (len >= size && !(startalign & 1) && !(misalign & 1))
>				break;
>
>			*dst++ ^= *src++;
>			len -= sizeof(u8);
>			startalign &= ~sizeof(u8);
>		} while (misalign & 1);

I think it would need to do instead:

		startalign += sizeof(u8);
		startalign %= sizeof(unsigned long);

so that startalign wraps to 0 only once 'dst' actually reaches an unsigned
long boundary; starting from startalign=1 with 4-byte longs, that takes three
byte iterations, not one.

But I am wondering whether you considered something simpler, using the
get_unaligned/put_unaligned helpers, maybe even using a switch statement for the
last (sizeof(long) - 1) bytes so it can be compiled as a jump table.  Something
like this:

#include <asm/unaligned.h>	/* get_unaligned()/put_unaligned() */

/* XOR one word-or-smaller quantity at arbitrary alignment */
#define xor_unaligned(dst, src) \
        put_unaligned(get_unaligned(dst) ^ get_unaligned(src), (dst))

void crypto_xor(u8 *dst, const u8 *src, unsigned int len)
{
	/* bulk of the buffer: one unsigned long at a time */
	while (len >= sizeof(unsigned long)) {
		xor_unaligned((unsigned long *)dst, (unsigned long *)src);
		dst += sizeof(unsigned long);
		src += sizeof(unsigned long);
		len -= sizeof(unsigned long);
	}

	/* tail of at most sizeof(unsigned long) - 1 bytes, compiled as
	 * a jump table */
	switch (len) {
#ifdef CONFIG_64BIT
	case 7:
		dst[6] ^= src[6];
		/* fall through */
	case 6:
		xor_unaligned((u16 *)&dst[4], (u16 *)&src[4]);
		goto len_4;
	case 5:
		dst[4] ^= src[4];
		/* fall through */
	case 4:
	len_4:
		xor_unaligned((u32 *)dst, (u32 *)src);
		break;
#endif
	case 3:
		dst[2] ^= src[2];
		/* fall through */
	case 2:
		xor_unaligned((u16 *)dst, (u16 *)src);
		break;
	case 1:
		dst[0] ^= src[0];
		break;
	}
}

That seems like a better choice for small buffers, which appear to be the more
common case.  It should generate slightly faster code on architectures with
fast unaligned access like x86_64, while still being sufficient on
architectures without it, and perhaps even faster there, since it wouldn't
have as many branches.
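
In case it's useful, this is the sort of quick userspace harness I'd check it
with, exercising all small lengths and misalignments against a byte-wise
reference.  The proposed crypto_xor() gets pasted in where marked;
get_unaligned()/put_unaligned() are stubbed with memcpy(), which is
essentially what the kernel helpers do on strict-alignment targets:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char u8;
typedef unsigned short u16;
typedef unsigned int u32;

#if ULONG_MAX > 0xFFFFFFFFUL
#define CONFIG_64BIT		/* enables the 4..7 byte tail cases */
#endif

/* memcpy-based stand-ins for the kernel's unaligned access helpers */
#define get_unaligned(p) \
	({ __typeof__(*(p)) _v; memcpy(&_v, (p), sizeof(_v)); _v; })
#define put_unaligned(val, p) \
	do { __typeof__(*(p)) _pv = (val); memcpy((p), &_pv, sizeof(_pv)); } while (0)

/* ... proposed crypto_xor() from above goes here ... */

int main(void)
{
	u8 a[32], ref[32], b[32];
	unsigned int da, db, len, i;

	srand(1);
	for (da = 0; da < 8; da++)
	for (db = 0; db < 8; db++)
	for (len = 0; len <= 16; len++) {
		for (i = 0; i < 32; i++) {
			a[i] = rand();
			b[i] = rand();
		}
		memcpy(ref, a, 32);
		for (i = 0; i < len; i++)	/* byte-wise reference */
			ref[da + i] ^= b[db + i];
		crypto_xor(&a[da], &b[db], len);
		if (memcmp(a, ref, 32) != 0) {
			printf("FAIL: da=%u db=%u len=%u\n", da, db, len);
			return 1;
		}
	}
	printf("all OK\n");
	return 0;
}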

Eric
