Linux cryptographic layer development

* Re: [PATCH 0/3] crypto: picoxcell - Cleanups removing non-DT code
From: Herbert Xu @ 2017-01-12 16:39 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: linux-kernel, Arnd Bergmann, Jamie Iles, David S. Miller,
	linux-crypto, linux-arm-kernel
In-Reply-To: <1483376819-26726-1-git-send-email-javier@osg.samsung.com>

On Mon, Jan 02, 2017 at 02:06:56PM -0300, Javier Martinez Canillas wrote:
> Hello,
> 
> This small series contains a couple of cleanups that remove driver code
> that isn't needed because the driver is for a DT-only platform.
> 
> The changes were suggested by Arnd Bergmann as a response to a previous patch:
> https://lkml.org/lkml/2017/1/2/342
> 
> Patch #1 allows the driver to be built when the COMPILE_TEST option is enabled.
> Patch #2 removes the platform ID table, since it isn't needed for DT-only drivers.
> Patch #3 removes a wrapper function that's also not needed if the driver is DT-only.

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] crypto: mediatek: don't return garbage err on successful return
From: Herbert Xu @ 2017-01-12 16:39 UTC (permalink / raw)
  To: Colin King
  Cc: David S . Miller, Matthias Brugger, Ryder Lee, linux-crypto,
	linux-arm-kernel, linux-mediatek, linux-kernel
In-Reply-To: <20170103132122.26900-1-colin.king@canonical.com>

On Tue, Jan 03, 2017 at 01:21:22PM +0000, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> In the case where keylen <= bs mtk_sha_setkey returns an uninitialized
> return value in err.  Fix this by returning 0 instead of err.
> 
> Issue detected by static analysis with cppcheck.
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply
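
The bug class fixed by this patch can be shown in a few lines of user-space C. This is a minimal sketch, not the mtk-sha code: toy_setkey(), toy_digest_key() and BLOCK_SIZE are hypothetical names, and the only thing taken from the patch description is the shape of the control flow, where short keys skip the one branch that would have assigned err.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 64	/* hypothetical block size for this sketch */

/* Hypothetical stand-in for hashing an over-long key down to one block. */
static int toy_digest_key(const unsigned char *key, size_t keylen,
			  unsigned char *out)
{
	size_t i;

	memset(out, 0, BLOCK_SIZE);
	for (i = 0; i < keylen; i++)
		out[i % BLOCK_SIZE] ^= key[i];
	return 0;
}

/*
 * Only the long-key branch can produce an error code, so it returns
 * that code directly.  The short-key path returns a literal 0 and no
 * local "err" variable is left to be read while uninitialized.
 */
int toy_setkey(unsigned char *pad, const unsigned char *key, size_t keylen)
{
	if (keylen > BLOCK_SIZE)
		return toy_digest_key(key, keylen, pad);

	memset(pad, 0, BLOCK_SIZE);
	memcpy(pad, key, keylen);
	return 0;
}
```

Returning a constant on the path that cannot fail keeps the success value independent of whatever happens to be on the stack.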

* Re: [PATCH v1 3/8] crypto:chcr- Fix key length for RFC4106
From: Herbert Xu @ 2017-01-12 16:42 UTC (permalink / raw)
  To: Harsh Jain; +Cc: hariprasad, netdev, linux-crypto
In-Reply-To: <648404c5-4234-7895-d478-858433b0d093@chelsio.com>

On Thu, Jan 12, 2017 at 10:08:46PM +0530, Harsh Jain wrote:
>
> That case is already handled in the next if condition. It will error out with -EINVAL there.
> 
> if (keylen == AES_KEYSIZE_128) {

Good point.  Please split the patches according to whether they
should go into 4.10/4.11 and then resubmit.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v2 8/8] crypto/testmgr: Allocate only the required output size for hash tests
From: Herbert Xu @ 2017-01-12 16:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Borkmann, Netdev, LKML, Linux Crypto Mailing List,
	Jason A. Donenfeld, Hannes Frederic Sowa, Alexei Starovoitov,
	Eric Dumazet, Eric Biggers, Tom Herbert, David S. Miller,
	Ard Biesheuvel
In-Reply-To: <890f4bdb28a1cf72f6b802b220b35ebaf0f76bb9.1484090585.git.luto@kernel.org>

On Tue, Jan 10, 2017 at 03:24:46PM -0800, Andy Lutomirski wrote:
> There are some hashes (e.g. sha224) that have some internal trickery
> to make sure that only the correct number of output bytes are
> generated.  If something goes wrong, they could potentially overrun
> the output buffer.
> 
> Make the test more robust by allocating only enough space for the
> correct output size so that memory debugging will catch the error if
> the output is overrun.
> 
> Tested by intentionally breaking sha224 to output all 256
> internally-generated bits while running on KASAN.
> 
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 2/2] crypto: mediatek - fix format string for 64-bit builds
From: Herbert Xu @ 2017-01-12 16:44 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: David S. Miller, Matthias Brugger, Ryder Lee, linux-crypto,
	linux-arm-kernel, linux-mediatek, linux-kernel
In-Reply-To: <20170111135601.4047225-1-arnd@arndb.de>

On Wed, Jan 11, 2017 at 02:55:20PM +0100, Arnd Bergmann wrote:
> After I enabled COMPILE_TEST for non-ARM targets, I ran into these
> warnings:
> 
> crypto/mediatek/mtk-aes.c: In function 'mtk_aes_info_map':
> crypto/mediatek/mtk-aes.c:224:28: error: format '%d' expects argument of type 'int', but argument 3 has type 'long unsigned int' [-Werror=format=]
>    dev_err(cryp->dev, "dma %d bytes error\n", sizeof(*info));
> crypto/mediatek/mtk-sha.c:344:28: error: format '%d' expects argument of type 'int', but argument 3 has type 'long unsigned int' [-Werror=format=]
> crypto/mediatek/mtk-sha.c:550:21: error: format '%u' expects argument of type 'unsigned int', but argument 4 has type 'size_t {aka long unsigned int}' [-Werror=format=]
> 
> The correct format for size_t is %zu, so use that in all three
> cases.
> 
> Fixes: 785e5c616c84 ("crypto: mediatek - Add crypto driver support for some MediaTek chips")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply
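
For reference, the portable way to print a sizeof result looks like the following user-space sketch. struct toy_info and format_dma_error() are hypothetical; only the message text and the use of %zu come from the patch description above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical descriptor; a plain 64-byte array keeps sizeof predictable. */
struct toy_info {
	unsigned char bytes[64];
};

/*
 * sizeof yields a size_t, which is "unsigned int" on most 32-bit
 * targets and "unsigned long" on most 64-bit ones.  %zu matches
 * size_t on both, so the same format string builds without warnings
 * everywhere.
 */
int format_dma_error(char *buf, size_t buflen)
{
	return snprintf(buf, buflen, "dma %zu bytes error\n",
			sizeof(struct toy_info));
}
```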

* Re: [PATCH v2 0/7] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11
From: Herbert Xu @ 2017-01-12 16:45 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-crypto, linux-arm-kernel
In-Reply-To: <1484152915-26517-1-git-send-email-ard.biesheuvel@linaro.org>

On Wed, Jan 11, 2017 at 04:41:48PM +0000, Ard Biesheuvel wrote:
> This adds ARM and arm64 implementations of ChaCha20, scalar AES and SIMD
> AES (using bit slicing). The SIMD algorithms in this series take advantage
> of the new skcipher walksize attribute to iterate over the input in the most
> efficient manner possible.
> 
> Patch #1 adds a NEON implementation of ChaCha20 for ARM.
> 
> Patch #2 adds a NEON implementation of ChaCha20 for arm64.
> 
> Patch #3 modifies the existing NEON and ARMv8 Crypto Extensions implementations
> of AES-CTR to be available as a synchronous skcipher as well. This is intended
> for the mac80211 code, which uses synchronous encapsulations of ctr(aes)
> [ccm, gcm] in softirq context, during which arm64 supports use of SIMD code.
> 
> Patch #4 adds a scalar implementation of AES for arm64, using the key schedule
> generation routines and lookup tables of the generic code in crypto/aes_generic.
> 
> Patch #5 does the same for ARM, replacing existing scalar code that originated
> in the OpenSSL project, and contains redundant key schedule generation routines
> and lookup tables (and is slightly slower on modern cores)
> 
> Patch #6 replaces the ARM bit sliced NEON code with a new implementation that
> has a number of advantages over the original code (which also originated in the
> OpenSSL project.) The performance should be identical.
> 
> Patch #7 adds a port of the ARM bit-sliced AES code to arm64, in ECB, CBC, CTR
> and XTS modes.
> 
> Due to the size of patch #7, it may be difficult to apply these patches from
> patchwork, so I pushed them here as well:

It seems to have made it.

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 1/2] crypto: mediatek - remove ARM dependencies
From: Herbert Xu @ 2017-01-12 16:44 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: David S. Miller, Matthias Brugger, Ryder Lee, linux-crypto,
	linux-kernel, linux-arm-kernel, linux-mediatek
In-Reply-To: <20170111135104.3961730-1-arnd@arndb.de>

On Wed, Jan 11, 2017 at 02:50:19PM +0100, Arnd Bergmann wrote:
> Building the mediatek driver on an older ARM architecture results in a
> harmless warning:
> 
> warning: (ARCH_OMAP2PLUS_TYPICAL && CRYPTO_DEV_MEDIATEK) selects NEON which has unmet direct dependencies (VFPv3 && CPU_V7)
> 
> We could add an explicit dependency on CPU_V7, but it seems nicer to
> open up the build to additional configurations. This replaces the ARM
> optimized algorithm selection with the normal one that all other drivers
> use, and that in turn lets us relax the dependency on ARM and drop
> a number of the unrelated 'select' statements.
> 
> Obviously a real user would still select those other optimized drivers
> as a fallback, but as there is no strict dependency, we can leave that
> up to the user.
> 
> Fixes: 785e5c616c84 ("crypto: mediatek - Add crypto driver support for some MediaTek chips")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v2 0/7] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11
From: Ard Biesheuvel @ 2017-01-12 16:48 UTC (permalink / raw)
  To: Herbert Xu
  Cc: linux-crypto@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20170112164504.GD20313@gondor.apana.org.au>

On 12 January 2017 at 16:45, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, Jan 11, 2017 at 04:41:48PM +0000, Ard Biesheuvel wrote:
>> This adds ARM and arm64 implementations of ChaCha20, scalar AES and SIMD
>> AES (using bit slicing). The SIMD algorithms in this series take advantage
>> of the new skcipher walksize attribute to iterate over the input in the most
>> efficient manner possible.
>>
>> Patch #1 adds a NEON implementation of ChaCha20 for ARM.
>>
>> Patch #2 adds a NEON implementation of ChaCha20 for arm64.
>>
>> Patch #3 modifies the existing NEON and ARMv8 Crypto Extensions implementations
>> of AES-CTR to be available as a synchronous skcipher as well. This is intended
>> for the mac80211 code, which uses synchronous encapsulations of ctr(aes)
>> [ccm, gcm] in softirq context, during which arm64 supports use of SIMD code.
>>
>> Patch #4 adds a scalar implementation of AES for arm64, using the key schedule
>> generation routines and lookup tables of the generic code in crypto/aes_generic.
>>
>> Patch #5 does the same for ARM, replacing existing scalar code that originated
>> in the OpenSSL project, and contains redundant key schedule generation routines
>> and lookup tables (and is slightly slower on modern cores)
>>
>> Patch #6 replaces the ARM bit sliced NEON code with a new implementation that
>> has a number of advantages over the original code (which also originated in the
>> OpenSSL project.) The performance should be identical.
>>
>> Patch #7 adds a port of the ARM bit-sliced AES code to arm64, in ECB, CBC, CTR
>> and XTS modes.
>>
>> Due to the size of patch #7, it may be difficult to apply these patches from
>> patchwork, so I pushed them here as well:
>
> It seems to have made it.
>
> All applied.  Thanks.

Actually, patch #6 was the huge one, not #7, and I don't see it in your tree yet.

https://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git/commit/?h=crypto-arm-v4.11&id=cbf03b255f7c

The order does not matter, though, so could you please put it on top? Thanks.

-- 
Ard.

^ permalink raw reply

* [PATCH 0/4] n2rng: add support for m5/m7 rng register layout
From: Shannon Nelson @ 2017-01-12 18:52 UTC (permalink / raw)
  To: linux-crypto; +Cc: sparclinux, herbert, linux-kernel, Shannon Nelson

Commit c1e9b3b0eea1 ("hwrng: n2 - Attach on T5/M5, T7/M7 SPARC CPUs")
added config strings to enable the random number generator in the sparc
m5 and m7 platforms.  This worked fine for client LDoms, but not for the
primary LDom, or running on bare metal, because the actual rng hardware
layout changed and self-test would now fail, continually spewing error
messages on the console.

This patch series adds correct support for the new rng register layout,
and adds a limiter to the spewing of error messages.

Orabug: 25127795

Shannon Nelson (4):
  n2rng: limit error spewage when self-test fails
  n2rng: add device data descriptions
  n2rng: support new hardware register layout
  n2rng: update version info

 drivers/char/hw_random/n2-drv.c |  204 +++++++++++++++++++++++++++++----------
 drivers/char/hw_random/n2rng.h  |   51 ++++++++--
 2 files changed, 196 insertions(+), 59 deletions(-)

^ permalink raw reply

* [PATCH 1/4] n2rng: limit error spewage when self-test fails
From: Shannon Nelson @ 2017-01-12 18:52 UTC (permalink / raw)
  To: linux-crypto; +Cc: sparclinux, herbert, linux-kernel, Shannon Nelson
In-Reply-To: <1484247169-245086-1-git-send-email-shannon.nelson@oracle.com>

If the self-test fails, it is unlikely to start working on a later
retry.  Currently, this causes an endless spew of error messages on
the console and in the logs, so this patch adds a limit on the
number of retries.

Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 drivers/char/hw_random/n2-drv.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/char/hw_random/n2-drv.c b/drivers/char/hw_random/n2-drv.c
index 3b06c1d..102560f 100644
--- a/drivers/char/hw_random/n2-drv.c
+++ b/drivers/char/hw_random/n2-drv.c
@@ -589,6 +589,7 @@ static void n2rng_work(struct work_struct *work)
 {
 	struct n2rng *np = container_of(work, struct n2rng, work.work);
 	int err = 0;
+	static int retries = 4;
 
 	if (!(np->flags & N2RNG_FLAG_CONTROL)) {
 		err = n2rng_guest_check(np);
@@ -606,7 +607,9 @@ static void n2rng_work(struct work_struct *work)
 		dev_info(&np->op->dev, "RNG ready\n");
 	}
 
-	if (err && !(np->flags & N2RNG_FLAG_SHUTDOWN))
+	if (--retries == 0)
+		dev_err(&np->op->dev, "Self-test retries failed, RNG not ready\n");
+	else if (err && !(np->flags & N2RNG_FLAG_SHUTDOWN))
 		schedule_delayed_work(&np->work, HZ * 2);
 }
 
-- 
1.7.1

^ permalink raw reply related
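
A bounded-retry policy of this kind can be sketched in user-space C. All names below are hypothetical. Unlike the hunk above, the sketch keeps the retry budget in the per-device structure and only spends it when a check actually fails; that is an assumption made for the illustration, not a description of what the patch does.

```c
#define TOY_FLAG_SHUTDOWN 0x10u	/* hypothetical "driver unregistering" flag */

struct toy_rng {
	unsigned int flags;
	int retries_left;	/* failed attempts we are still willing to retry */
};

enum toy_action {
	TOY_DONE,		/* nothing more to do */
	TOY_RESCHEDULE,		/* queue the work item again */
	TOY_GIVE_UP,		/* log once and stop retrying */
};

/* Decide what to do after one self-test attempt returned err. */
enum toy_action toy_after_selftest(struct toy_rng *rng, int err)
{
	if (!err)
		return TOY_DONE;
	if (rng->flags & TOY_FLAG_SHUTDOWN)
		return TOY_DONE;
	if (rng->retries_left <= 0)
		return TOY_GIVE_UP;

	rng->retries_left--;
	return TOY_RESCHEDULE;
}
```

A caller would reschedule its delayed work on TOY_RESCHEDULE and print a single error on TOY_GIVE_UP, so the log receives one message rather than one every two seconds.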

* [PATCH 2/4] n2rng: add device data descriptions
From: Shannon Nelson @ 2017-01-12 18:52 UTC (permalink / raw)
  To: linux-crypto; +Cc: sparclinux, herbert, linux-kernel, Shannon Nelson
In-Reply-To: <1484247169-245086-1-git-send-email-shannon.nelson@oracle.com>

Since we're going to need to keep track of more than one attribute
of the hardware, change the data field of the match struct from a
single flag to a struct pointer.  This patch adds the struct template
and the initial per-device descriptions.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 drivers/char/hw_random/n2-drv.c |   47 ++++++++++++++++++++++++++++++++------
 drivers/char/hw_random/n2rng.h  |   15 ++++++++++++
 2 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/drivers/char/hw_random/n2-drv.c b/drivers/char/hw_random/n2-drv.c
index 102560f..74c26c7 100644
--- a/drivers/char/hw_random/n2-drv.c
+++ b/drivers/char/hw_random/n2-drv.c
@@ -625,24 +625,23 @@ static void n2rng_driver_version(void)
 static int n2rng_probe(struct platform_device *op)
 {
 	const struct of_device_id *match;
-	int multi_capable;
 	int err = -ENOMEM;
 	struct n2rng *np;
 
 	match = of_match_device(n2rng_match, &op->dev);
 	if (!match)
 		return -EINVAL;
-	multi_capable = (match->data != NULL);
 
 	n2rng_driver_version();
 	np = devm_kzalloc(&op->dev, sizeof(*np), GFP_KERNEL);
 	if (!np)
 		goto out;
 	np->op = op;
+	np->data = (struct n2rng_template *)match->data;
 
 	INIT_DELAYED_WORK(&np->work, n2rng_work);
 
-	if (multi_capable)
+	if (np->data->multi_capable)
 		np->flags |= N2RNG_FLAG_MULTI;
 
 	err = -ENODEV;
@@ -673,8 +672,9 @@ static int n2rng_probe(struct platform_device *op)
 			dev_err(&op->dev, "VF RNG lacks rng-#units property\n");
 			goto out_hvapi_unregister;
 		}
-	} else
+	} else {
 		np->num_units = 1;
+	}
 
 	dev_info(&op->dev, "Registered RNG HVAPI major %lu minor %lu\n",
 		 np->hvapi_major, np->hvapi_minor);
@@ -731,30 +731,61 @@ static int n2rng_remove(struct platform_device *op)
 	return 0;
 }
 
+static struct n2rng_template n2_template = {
+	.id = N2_n2_rng,
+	.multi_capable = 0,
+	.chip_version = 1,
+};
+
+static struct n2rng_template vf_template = {
+	.id = N2_vf_rng,
+	.multi_capable = 1,
+	.chip_version = 1,
+};
+
+static struct n2rng_template kt_template = {
+	.id = N2_kt_rng,
+	.multi_capable = 1,
+	.chip_version = 1,
+};
+
+static struct n2rng_template m4_template = {
+	.id = N2_m4_rng,
+	.multi_capable = 1,
+	.chip_version = 2,
+};
+
+static struct n2rng_template m7_template = {
+	.id = N2_m7_rng,
+	.multi_capable = 1,
+	.chip_version = 2,
+};
+
 static const struct of_device_id n2rng_match[] = {
 	{
 		.name		= "random-number-generator",
 		.compatible	= "SUNW,n2-rng",
+		.data		= &n2_template,
 	},
 	{
 		.name		= "random-number-generator",
 		.compatible	= "SUNW,vf-rng",
-		.data		= (void *) 1,
+		.data		= &vf_template,
 	},
 	{
 		.name		= "random-number-generator",
 		.compatible	= "SUNW,kt-rng",
-		.data		= (void *) 1,
+		.data		= &kt_template,
 	},
 	{
 		.name		= "random-number-generator",
 		.compatible	= "ORCL,m4-rng",
-		.data		= (void *) 1,
+		.data		= &m4_template,
 	},
 	{
 		.name		= "random-number-generator",
 		.compatible	= "ORCL,m7-rng",
-		.data		= (void *) 1,
+		.data		= &m7_template,
 	},
 	{},
 };
diff --git a/drivers/char/hw_random/n2rng.h b/drivers/char/hw_random/n2rng.h
index f244ac8..e41e55a 100644
--- a/drivers/char/hw_random/n2rng.h
+++ b/drivers/char/hw_random/n2rng.h
@@ -60,6 +60,20 @@ extern unsigned long sun4v_rng_data_read_diag_v2(unsigned long data_ra,
 extern unsigned long sun4v_rng_data_read(unsigned long data_ra,
 					 unsigned long *tick_delta);
 
+enum n2rng_compat_id {
+	N2_n2_rng,
+	N2_vf_rng,
+	N2_kt_rng,
+	N2_m4_rng,
+	N2_m7_rng,
+};
+
+struct n2rng_template {
+	enum n2rng_compat_id id;
+	int multi_capable;
+	int chip_version;
+};
+
 struct n2rng_unit {
 	u64			control[HV_RNG_NUM_CONTROL];
 };
@@ -74,6 +88,7 @@ struct n2rng {
 #define N2RNG_FLAG_SHUTDOWN	0x00000010 /* Driver unregistering        */
 #define N2RNG_FLAG_BUFFER_VALID	0x00000020 /* u32 buffer holds valid data */
 
+	struct n2rng_template	*data;
 	int			num_units;
 	struct n2rng_unit	*units;
 
-- 
1.7.1

^ permalink raw reply related
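
The idea of hanging a per-device description off the match table can be tried outside the kernel with a plain lookup table. In this sketch the compatible strings and the multi_capable/chip_version values are copied from the patch, while struct toy_match, struct toy_template and toy_lookup() are hypothetical user-space stand-ins for of_device_id, n2rng_template and of_match_device().

```c
#include <stddef.h>
#include <string.h>

struct toy_template {
	int multi_capable;
	int chip_version;
};

struct toy_match {
	const char *compatible;
	const struct toy_template *data;
};

static const struct toy_template n2_tmpl = { .multi_capable = 0, .chip_version = 1 };
static const struct toy_template vf_tmpl = { .multi_capable = 1, .chip_version = 1 };
static const struct toy_template kt_tmpl = { .multi_capable = 1, .chip_version = 1 };
static const struct toy_template m4_tmpl = { .multi_capable = 1, .chip_version = 2 };
static const struct toy_template m7_tmpl = { .multi_capable = 1, .chip_version = 2 };

static const struct toy_match toy_matches[] = {
	{ "SUNW,n2-rng", &n2_tmpl },
	{ "SUNW,vf-rng", &vf_tmpl },
	{ "SUNW,kt-rng", &kt_tmpl },
	{ "ORCL,m4-rng", &m4_tmpl },
	{ "ORCL,m7-rng", &m7_tmpl },
	{ NULL, NULL },		/* sentinel, like the empty entry in the real table */
};

/* Return the description for a compatible string, or NULL if unknown. */
const struct toy_template *toy_lookup(const char *compatible)
{
	const struct toy_match *m;

	for (m = toy_matches; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m->data;
	return NULL;
}
```

Because every entry now carries a description, the probe path can read attributes from the returned pointer instead of inferring them from whether the pointer is NULL. Keeping the pointer const-qualified throughout also means no cast is needed when storing it.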

* [PATCH 3/4] n2rng: support new hardware register layout
From: Shannon Nelson @ 2017-01-12 18:52 UTC (permalink / raw)
  To: linux-crypto; +Cc: sparclinux, herbert, linux-kernel, Shannon Nelson
In-Reply-To: <1484247169-245086-1-git-send-email-shannon.nelson@oracle.com>

Add the new register layout constants and the requisite logic
for using them.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 drivers/char/hw_random/n2-drv.c |  144 +++++++++++++++++++++++++++++----------
 drivers/char/hw_random/n2rng.h  |   36 +++++++---
 2 files changed, 134 insertions(+), 46 deletions(-)

diff --git a/drivers/char/hw_random/n2-drv.c b/drivers/char/hw_random/n2-drv.c
index 74c26c7..f0bd5ee 100644
--- a/drivers/char/hw_random/n2-drv.c
+++ b/drivers/char/hw_random/n2-drv.c
@@ -302,26 +302,57 @@ static int n2rng_try_read_ctl(struct n2rng *np)
 	return n2rng_hv_err_trans(hv_err);
 }
 
-#define CONTROL_DEFAULT_BASE		\
-	((2 << RNG_CTL_ASEL_SHIFT) |	\
-	 (N2RNG_ACCUM_CYCLES_DEFAULT << RNG_CTL_WAIT_SHIFT) |	\
-	 RNG_CTL_LFSR)
-
-#define CONTROL_DEFAULT_0		\
-	(CONTROL_DEFAULT_BASE |		\
-	 (1 << RNG_CTL_VCO_SHIFT) |	\
-	 RNG_CTL_ES1)
-#define CONTROL_DEFAULT_1		\
-	(CONTROL_DEFAULT_BASE |		\
-	 (2 << RNG_CTL_VCO_SHIFT) |	\
-	 RNG_CTL_ES2)
-#define CONTROL_DEFAULT_2		\
-	(CONTROL_DEFAULT_BASE |		\
-	 (3 << RNG_CTL_VCO_SHIFT) |	\
-	 RNG_CTL_ES3)
-#define CONTROL_DEFAULT_3		\
-	(CONTROL_DEFAULT_BASE |		\
-	 RNG_CTL_ES1 | RNG_CTL_ES2 | RNG_CTL_ES3)
+static u64 n2rng_control_default(struct n2rng *np, int ctl)
+{
+	u64 val = 0;
+
+	if (np->data->chip_version == 1) {
+		val = ((2 << RNG_v1_CTL_ASEL_SHIFT) |
+			(N2RNG_ACCUM_CYCLES_DEFAULT << RNG_v1_CTL_WAIT_SHIFT) |
+			 RNG_CTL_LFSR);
+
+		switch (ctl) {
+		case 0:
+			val |= (1 << RNG_v1_CTL_VCO_SHIFT) | RNG_CTL_ES1;
+			break;
+		case 1:
+			val |= (2 << RNG_v1_CTL_VCO_SHIFT) | RNG_CTL_ES2;
+			break;
+		case 2:
+			val |= (3 << RNG_v1_CTL_VCO_SHIFT) | RNG_CTL_ES3;
+			break;
+		case 3:
+			val |= RNG_CTL_ES1 | RNG_CTL_ES2 | RNG_CTL_ES3;
+			break;
+		default:
+			break;
+		}
+
+	} else {
+		val = ((2 << RNG_v2_CTL_ASEL_SHIFT) |
+			(N2RNG_ACCUM_CYCLES_DEFAULT << RNG_v2_CTL_WAIT_SHIFT) |
+			 RNG_CTL_LFSR);
+
+		switch (ctl) {
+		case 0:
+			val |= (1 << RNG_v2_CTL_VCO_SHIFT) | RNG_CTL_ES1;
+			break;
+		case 1:
+			val |= (2 << RNG_v2_CTL_VCO_SHIFT) | RNG_CTL_ES2;
+			break;
+		case 2:
+			val |= (3 << RNG_v2_CTL_VCO_SHIFT) | RNG_CTL_ES3;
+			break;
+		case 3:
+			val |= RNG_CTL_ES1 | RNG_CTL_ES2 | RNG_CTL_ES3;
+			break;
+		default:
+			break;
+		}
+	}
+
+	return val;
+}
 
 static void n2rng_control_swstate_init(struct n2rng *np)
 {
@@ -336,10 +367,10 @@ static void n2rng_control_swstate_init(struct n2rng *np)
 	for (i = 0; i < np->num_units; i++) {
 		struct n2rng_unit *up = &np->units[i];
 
-		up->control[0] = CONTROL_DEFAULT_0;
-		up->control[1] = CONTROL_DEFAULT_1;
-		up->control[2] = CONTROL_DEFAULT_2;
-		up->control[3] = CONTROL_DEFAULT_3;
+		up->control[0] = n2rng_control_default(np, 0);
+		up->control[1] = n2rng_control_default(np, 1);
+		up->control[2] = n2rng_control_default(np, 2);
+		up->control[3] = n2rng_control_default(np, 3);
 	}
 
 	np->hv_state = HV_RNG_STATE_UNCONFIGURED;
@@ -399,6 +430,7 @@ static int n2rng_data_read(struct hwrng *rng, u32 *data)
 	} else {
 		int err = n2rng_generic_read_data(ra);
 		if (!err) {
+			np->flags |= N2RNG_FLAG_BUFFER_VALID;
 			np->buffer = np->test_data >> 32;
 			*data = np->test_data & 0xffffffff;
 			len = 4;
@@ -487,9 +519,21 @@ static void n2rng_dump_test_buffer(struct n2rng *np)
 
 static int n2rng_check_selftest_buffer(struct n2rng *np, unsigned long unit)
 {
-	u64 val = SELFTEST_VAL;
+	u64 val;
 	int err, matches, limit;
 
+	switch (np->data->id) {
+	case N2_n2_rng:
+	case N2_vf_rng:
+	case N2_kt_rng:
+	case N2_m4_rng:  /* yes, m4 uses the old value */
+		val = RNG_v1_SELFTEST_VAL;
+		break;
+	default:
+		val = RNG_v2_SELFTEST_VAL;
+		break;
+	}
+
 	matches = 0;
 	for (limit = 0; limit < SELFTEST_LOOPS_MAX; limit++) {
 		matches += n2rng_test_buffer_find(np, val);
@@ -512,14 +556,32 @@ static int n2rng_check_selftest_buffer(struct n2rng *np, unsigned long unit)
 static int n2rng_control_selftest(struct n2rng *np, unsigned long unit)
 {
 	int err;
+	u64 base, base3;
+
+	switch (np->data->id) {
+	case N2_n2_rng:
+	case N2_vf_rng:
+	case N2_kt_rng:
+		base = RNG_v1_CTL_ASEL_NOOUT << RNG_v1_CTL_ASEL_SHIFT;
+		base3 = base | RNG_CTL_LFSR |
+			((RNG_v1_SELFTEST_TICKS - 2) << RNG_v1_CTL_WAIT_SHIFT);
+		break;
+	case N2_m4_rng:
+		base = RNG_v2_CTL_ASEL_NOOUT << RNG_v2_CTL_ASEL_SHIFT;
+		base3 = base | RNG_CTL_LFSR |
+			((RNG_v1_SELFTEST_TICKS - 2) << RNG_v2_CTL_WAIT_SHIFT);
+		break;
+	default:
+		base = RNG_v2_CTL_ASEL_NOOUT << RNG_v2_CTL_ASEL_SHIFT;
+		base3 = base | RNG_CTL_LFSR |
+			(RNG_v2_SELFTEST_TICKS << RNG_v2_CTL_WAIT_SHIFT);
+		break;
+	}
 
-	np->test_control[0] = (0x2 << RNG_CTL_ASEL_SHIFT);
-	np->test_control[1] = (0x2 << RNG_CTL_ASEL_SHIFT);
-	np->test_control[2] = (0x2 << RNG_CTL_ASEL_SHIFT);
-	np->test_control[3] = ((0x2 << RNG_CTL_ASEL_SHIFT) |
-			       RNG_CTL_LFSR |
-			       ((SELFTEST_TICKS - 2) << RNG_CTL_WAIT_SHIFT));
-
+	np->test_control[0] = base;
+	np->test_control[1] = base;
+	np->test_control[2] = base;
+	np->test_control[3] = base3;
 
 	err = n2rng_entropy_diag_read(np, unit, np->test_control,
 				      HV_RNG_STATE_HEALTHCHECK,
@@ -557,11 +619,19 @@ static int n2rng_control_configure_units(struct n2rng *np)
 		struct n2rng_unit *up = &np->units[unit];
 		unsigned long ctl_ra = __pa(&up->control[0]);
 		int esrc;
-		u64 base;
+		u64 base, shift;
 
-		base = ((np->accum_cycles << RNG_CTL_WAIT_SHIFT) |
-			(2 << RNG_CTL_ASEL_SHIFT) |
-			RNG_CTL_LFSR);
+		if (np->data->chip_version == 1) {
+			base = ((np->accum_cycles << RNG_v1_CTL_WAIT_SHIFT) |
+			      (RNG_v1_CTL_ASEL_NOOUT << RNG_v1_CTL_ASEL_SHIFT) |
+			      RNG_CTL_LFSR);
+			shift = RNG_v1_CTL_VCO_SHIFT;
+		} else {
+			base = ((np->accum_cycles << RNG_v2_CTL_WAIT_SHIFT) |
+			      (RNG_v2_CTL_ASEL_NOOUT << RNG_v2_CTL_ASEL_SHIFT) |
+			      RNG_CTL_LFSR);
+			shift = RNG_v2_CTL_VCO_SHIFT;
+		}
 
 		/* XXX This isn't the best.  We should fetch a bunch
 		 * XXX of words using each entropy source combined XXX
@@ -570,7 +640,7 @@ static int n2rng_control_configure_units(struct n2rng *np)
 		 */
 		for (esrc = 0; esrc < 3; esrc++)
 			up->control[esrc] = base |
-				(esrc << RNG_CTL_VCO_SHIFT) |
+				(esrc << shift) |
 				(RNG_CTL_ES1 << esrc);
 
 		up->control[3] = base |
diff --git a/drivers/char/hw_random/n2rng.h b/drivers/char/hw_random/n2rng.h
index e41e55a..6bad6cc 100644
--- a/drivers/char/hw_random/n2rng.h
+++ b/drivers/char/hw_random/n2rng.h
@@ -6,18 +6,34 @@
 #ifndef _N2RNG_H
 #define _N2RNG_H
 
-#define RNG_CTL_WAIT       0x0000000001fffe00ULL /* Minimum wait time       */
-#define RNG_CTL_WAIT_SHIFT 9
-#define RNG_CTL_BYPASS     0x0000000000000100ULL /* VCO voltage source      */
-#define RNG_CTL_VCO        0x00000000000000c0ULL /* VCO rate control        */
-#define RNG_CTL_VCO_SHIFT  6
-#define RNG_CTL_ASEL       0x0000000000000030ULL /* Analog MUX select       */
-#define RNG_CTL_ASEL_SHIFT 4
+/* ver1 devices - n2-rng, vf-rng, kt-rng */
+#define RNG_v1_CTL_WAIT       0x0000000001fffe00ULL /* Minimum wait time    */
+#define RNG_v1_CTL_WAIT_SHIFT 9
+#define RNG_v1_CTL_BYPASS     0x0000000000000100ULL /* VCO voltage source   */
+#define RNG_v1_CTL_VCO        0x00000000000000c0ULL /* VCO rate control     */
+#define RNG_v1_CTL_VCO_SHIFT  6
+#define RNG_v1_CTL_ASEL       0x0000000000000030ULL /* Analog MUX select    */
+#define RNG_v1_CTL_ASEL_SHIFT 4
+#define RNG_v1_CTL_ASEL_NOOUT 2
+
+/* these are the same in v2 as in v1 */
 #define RNG_CTL_LFSR       0x0000000000000008ULL /* Use LFSR or plain shift */
 #define RNG_CTL_ES3        0x0000000000000004ULL /* Enable entropy source 3 */
 #define RNG_CTL_ES2        0x0000000000000002ULL /* Enable entropy source 2 */
 #define RNG_CTL_ES1        0x0000000000000001ULL /* Enable entropy source 1 */
 
+/* ver2 devices - m4-rng, m7-rng */
+#define RNG_v2_CTL_WAIT       0x0000000007fff800ULL /* Minimum wait time    */
+#define RNG_v2_CTL_WAIT_SHIFT 12
+#define RNG_v2_CTL_BYPASS     0x0000000000000400ULL /* VCO voltage source   */
+#define RNG_v2_CTL_VCO        0x0000000000000300ULL /* VCO rate control     */
+#define RNG_v2_CTL_VCO_SHIFT  9
+#define RNG_v2_CTL_PERF       0x0000000000000180ULL /* Perf */
+#define RNG_v2_CTL_ASEL       0x0000000000000070ULL /* Analog MUX select    */
+#define RNG_v2_CTL_ASEL_SHIFT 4
+#define RNG_v2_CTL_ASEL_NOOUT 7
+
+
 #define HV_FAST_RNG_GET_DIAG_CTL	0x130
 #define HV_FAST_RNG_CTL_READ		0x131
 #define HV_FAST_RNG_CTL_WRITE		0x132
@@ -112,8 +128,10 @@ struct n2rng {
 
 	u64			scratch_control[HV_RNG_NUM_CONTROL];
 
-#define SELFTEST_TICKS		38859
-#define SELFTEST_VAL		((u64)0xB8820C7BD387E32C)
+#define RNG_v1_SELFTEST_TICKS	38859
+#define RNG_v1_SELFTEST_VAL	((u64)0xB8820C7BD387E32C)
+#define RNG_v2_SELFTEST_TICKS	64
+#define RNG_v2_SELFTEST_VAL	((u64)0xffffffffffffffff)
 #define SELFTEST_POLY		((u64)0x231DCEE91262B8A3)
 #define SELFTEST_MATCH_GOAL	6
 #define SELFTEST_LOOPS_MAX	40000
-- 
1.7.1

^ permalink raw reply related

* [PATCH 4/4] n2rng: update version info
From: Shannon Nelson @ 2017-01-12 18:52 UTC (permalink / raw)
  To: linux-crypto; +Cc: sparclinux, herbert, linux-kernel, Shannon Nelson
In-Reply-To: <1484247169-245086-1-git-send-email-shannon.nelson@oracle.com>

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 drivers/char/hw_random/n2-drv.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/char/hw_random/n2-drv.c b/drivers/char/hw_random/n2-drv.c
index f0bd5ee..31cbdbb 100644
--- a/drivers/char/hw_random/n2-drv.c
+++ b/drivers/char/hw_random/n2-drv.c
@@ -21,11 +21,11 @@
 
 #define DRV_MODULE_NAME		"n2rng"
 #define PFX DRV_MODULE_NAME	": "
-#define DRV_MODULE_VERSION	"0.2"
-#define DRV_MODULE_RELDATE	"July 27, 2011"
+#define DRV_MODULE_VERSION	"0.3"
+#define DRV_MODULE_RELDATE	"Jan 7, 2017"
 
 static char version[] =
-	DRV_MODULE_NAME ".c:v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
+	DRV_MODULE_NAME " v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
 
 MODULE_AUTHOR("David S. Miller (davem@davemloft.net)");
 MODULE_DESCRIPTION("Niagara2 RNG driver");
@@ -765,7 +765,7 @@ static int n2rng_probe(struct platform_device *op)
 		  "multi-unit-capable" : "single-unit"),
 		 np->num_units);
 
-	np->hwrng.name = "n2rng";
+	np->hwrng.name = DRV_MODULE_NAME;
 	np->hwrng.data_read = n2rng_data_read;
 	np->hwrng.priv = (unsigned long) np;
 
-- 
1.7.1

^ permalink raw reply related

* [cryptodev:master 43/44] arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not support `tt .req ip' in ARM mode
From: kbuild test robot @ 2017-01-12 19:04 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: kbuild-all, linux-crypto, Herbert Xu

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
head:   1abee99eafab67fb1c98f9ecfc43cd5735384a86
commit: 81edb42629758bacdf813dd5e4542ae26e3ad73a [43/44] crypto: arm/aes - replace scalar AES cipher
config: arm-multi_v7_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 81edb42629758bacdf813dd5e4542ae26e3ad73a
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   arch/arm/crypto/aes-cipher-core.S: Assembler messages:
>> arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not support `tt .req ip' in ARM mode
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `movw tt,#:lower16:crypto_ft_tab'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `movt tt,#:upper16:crypto_ft_tab'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r8,[tt,r8,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r9,[tt,r9,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r10,[tt,r10,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r11,[tt,r11,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r10,[tt,r10,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r11,[tt,r11,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r5,[tt,r5,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r6,[tt,r6,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r4,[tt,r4,lsl#2]'

vim +21 arch/arm/crypto/aes-cipher-core.S

    15		.align		5
    16	
    17		rk		.req	r0
    18		rounds		.req	r1
    19		in		.req	r2
    20		out		.req	r3
  > 21		tt		.req	ip
    22	
    23		t0		.req	lr
    24		t1		.req	r2
    25		t2		.req	r3
    26	
    27		.macro		__select, out, in, idx
    28		.if		__LINUX_ARM_ARCH__ < 7
    29		and		\out, \in, #0xff << (8 * \idx)
    30		.else
    31		ubfx		\out, \in, #(8 * \idx), #8
    32		.endif
    33		.endm
    34	
    35		.macro		__load, out, in, idx
    36		.if		__LINUX_ARM_ARCH__ < 7 && \idx > 0
    37		ldr		\out, [tt, \in, lsr #(8 * \idx) - 2]
    38		.else
    39		ldr		\out, [tt, \in, lsl #2]
    40		.endif
    41		.endm
    42	
    43		.macro		__hround, out0, out1, in0, in1, in2, in3, t3, t4, enc
    44		__select	\out0, \in0, 0
    45		__select	t0, \in1, 1
    46		__load		\out0, \out0, 0
    47		__load		t0, t0, 1
    48	
    49		.if		\enc
    50		__select	\out1, \in1, 0
    51		__select	t1, \in2, 1
    52		.else
    53		__select	\out1, \in3, 0
    54		__select	t1, \in0, 1
    55		.endif
    56		__load		\out1, \out1, 0
    57		__select	t2, \in2, 2
    58		__load		t1, t1, 1
    59		__load		t2, t2, 2
    60	
    61		eor		\out0, \out0, t0, ror #24
    62	
    63		__select	t0, \in3, 3
    64		.if		\enc
    65		__select	\t3, \in3, 2
    66		__select	\t4, \in0, 3
    67		.else
    68		__select	\t3, \in1, 2
    69		__select	\t4, \in2, 3
    70		.endif
    71		__load		\t3, \t3, 2
    72		__load		t0, t0, 3
    73		__load		\t4, \t4, 3
    74	
    75		eor		\out1, \out1, t1, ror #24
    76		eor		\out0, \out0, t2, ror #16
    77		ldm		rk!, {t1, t2}
    78		eor		\out1, \out1, \t3, ror #16
    79		eor		\out0, \out0, t0, ror #8
    80		eor		\out1, \out1, \t4, ror #8
    81		eor		\out0, \out0, t1
    82		eor		\out1, \out1, t2
    83		.endm
    84	
    85		.macro		fround, out0, out1, out2, out3, in0, in1, in2, in3
    86		__hround	\out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1
    87		__hround	\out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1
    88		.endm
    89	
    90		.macro		iround, out0, out1, out2, out3, in0, in1, in2, in3
    91		__hround	\out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0
    92		__hround	\out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0
    93		.endm
    94	
    95		.macro		__rev, out, in
    96		.if		__LINUX_ARM_ARCH__ < 6
    97		lsl		t0, \in, #24
    98		and		t1, \in, #0xff00
    99		and		t2, \in, #0xff0000
   100		orr		\out, t0, \in, lsr #24
   101		orr		\out, \out, t1, lsl #8
   102		orr		\out, \out, t2, lsr #8
   103		.else
   104		rev		\out, \in
   105		.endif
   106		.endm
   107	
   108		.macro		__adrl, out, sym, c
   109		.if		__LINUX_ARM_ARCH__ < 7
   110		ldr\c		\out, =\sym
   111		.else
   112		movw\c		\out, #:lower16:\sym
   113		movt\c		\out, #:upper16:\sym
   114		.endif
   115		.endm
   116	
   117		.macro		do_crypt, round, ttab, ltab
   118		push		{r3-r11, lr}
   119	
   120		ldr		r4, [in]
   121		ldr		r5, [in, #4]
   122		ldr		r6, [in, #8]
   123		ldr		r7, [in, #12]
   124	
   125		ldm		rk!, {r8-r11}
   126	
   127	#ifdef CONFIG_CPU_BIG_ENDIAN
   128		__rev		r4, r4
   129		__rev		r5, r5
   130		__rev		r6, r6
   131		__rev		r7, r7
   132	#endif
   133	
   134		eor		r4, r4, r8
   135		eor		r5, r5, r9
   136		eor		r6, r6, r10
   137		eor		r7, r7, r11
   138	
   139		__adrl		tt, \ttab
   140	
   141		tst		rounds, #2
   142		bne		1f
   143	
   144	0:	\round		r8, r9, r10, r11, r4, r5, r6, r7
   145		\round		r4, r5, r6, r7, r8, r9, r10, r11
   146	
   147	1:	subs		rounds, rounds, #4
   148		\round		r8, r9, r10, r11, r4, r5, r6, r7
   149		__adrl		tt, \ltab, ls
   150		\round		r4, r5, r6, r7, r8, r9, r10, r11
   151		bhi		0b
   152	
   153	#ifdef CONFIG_CPU_BIG_ENDIAN
   154		__rev		r4, r4
   155		__rev		r5, r5
   156		__rev		r6, r6
   157		__rev		r7, r7
   158	#endif
   159	
   160		ldr		out, [sp]
   161	
   162		str		r4, [out]
   163		str		r5, [out, #4]
   164		str		r6, [out, #8]
   165		str		r7, [out, #12]
   166	
   167		pop		{r3-r11, pc}
   168	
   169		.align		3
   170		.ltorg
   171		.endm
   172	
   173	ENTRY(__aes_arm_encrypt)
 > 174		do_crypt	fround, crypto_ft_tab, crypto_fl_tab
   175	ENDPROC(__aes_arm_encrypt)
   176	
   177	ENTRY(__aes_arm_decrypt)
 > 178		do_crypt	iround, crypto_it_tab, crypto_il_tab
   179	ENDPROC(__aes_arm_decrypt)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 39910 bytes --]

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Linus Torvalds @ 2017-01-12 19:51 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andy Lutomirski, Herbert Xu, Linux Kernel Mailing List,
	Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <20170112140215.rh247gwk55fjzmg7@treble>

[-- Attachment #1: Type: text/plain, Size: 3363 bytes --]

On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> Just to clarify, I think you're asking if, for versions of gcc which
> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
> functions to ensure their stacks are 16-byte aligned.
>
> It's certainly possible, but I don't see how that solves the problem.
> The stack will still be misaligned by entry code.  Or am I missing
> something?

I think the argument is that we *could* try to align things, if we
just had some tool that actually then verified that we aren't missing
anything.

I'm not entirely happy with checking the generated code, though,
because as Ingo says, you have a 50:50 chance of just getting it right
by mistake. So I'd much rather have some static tool that checks
things at a code level (ie coccinelle or sparse).

Almost totally untested "sparse" patch appended. The problem with
sparse, obviously, is that few enough people run it, and it gives a
lot of other warnings. But maybe Herbert can test whether this would
actually have caught his situation, doing something like an
allmodconfig build with "C=2" to force a sparse run on everything, and
redirecting the warnings to stderr.

But this patch does seem to give a warning for the patch that Herbert
had, and that caused problems.

And in fact it seems to find a few other possible problems (most, but
not all, in crypto). This run was with the broken chacha20 patch
applied, to verify that I get a warning for that case:

   arch/x86/crypto/chacha20_glue.c:70:13: warning: symbol 'state' has excessive alignment (16)
   arch/x86/crypto/aesni-intel_glue.c:724:12: warning: symbol 'iv' has excessive alignment (16)
   arch/x86/crypto/aesni-intel_glue.c:803:12: warning: symbol 'iv' has excessive alignment (16)
   crypto/shash.c:82:12: warning: symbol 'ubuf' has excessive alignment (16)
   crypto/shash.c:118:12: warning: symbol 'ubuf' has excessive alignment (16)
   drivers/char/hw_random/via-rng.c:89:14: warning: symbol 'buf' has excessive alignment (16)
   net/bridge/netfilter/ebtables.c:1809:31: warning: symbol 'tinfo' has excessive alignment (64)
   drivers/crypto/padlock-sha.c:85:14: warning: symbol 'buf' has excessive alignment (16)
   drivers/crypto/padlock-sha.c:147:14: warning: symbol 'buf' has excessive alignment (16)
   drivers/crypto/padlock-sha.c:304:12: warning: symbol 'buf' has excessive alignment (16)
   drivers/crypto/padlock-sha.c:388:12: warning: symbol 'buf' has excessive alignment (16)
   net/openvswitch/actions.c:797:33: warning: symbol 'ovs_rt' has excessive alignment (64)
   drivers/net/ethernet/neterion/vxge/vxge-config.c:1006:38: warning: symbol 'vpath' has excessive alignment (64)

although I think at least some of these happen to be ok.

There are a few places that clearly don't care about exact alignment,
and use "__attribute__((aligned))" without any specific alignment
value.

It's just sparse that thinks that implies 16-byte alignment (it
doesn't, really - it's unspecified, and is telling gcc to use "maximum
useful alignment", so who knows _what_ gcc will assume).

But some of them may well be real issues - if the alignment is about
correctness rather than anything else.

Anyway, the advantage of this kind of source-level check is that it
should really catch things regardless of "luck" wrt alignment.

                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 887 bytes --]

 flow.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/flow.c b/flow.c
index 7db9548..c876869 100644
--- a/flow.c
+++ b/flow.c
@@ -601,6 +601,20 @@ static void simplify_one_symbol(struct entrypoint *ep, struct symbol *sym)
 	unsigned long mod;
 	int all, stores, complex;

+	/*
+	 * Warn about excessive local variable alignment.
+	 *
+	 * This needs to be linked up with some flag to enable
+	 * it, and specify the alignment. The 'max_int_alignment'
+	 * just happens to be what we want for the kernel for x86-64.
+	 */
+	mod = sym->ctype.modifiers;
+	if (!(mod & (MOD_NONLOCAL | MOD_STATIC))) {
+		unsigned int alignment = sym->ctype.alignment;
+		if (alignment > max_int_alignment)
+			warning(sym->pos, "symbol '%s' has excessive alignment (%u)", show_ident(sym->ident), alignment);
+	}
+
 	/* Never used as a symbol? */
 	pseudo = sym->pseudo;
 	if (!pseudo)

^ permalink raw reply related

* Re: x86-64: Maintain 16-byte stack alignment
From: Andy Lutomirski @ 2017-01-12 20:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Herbert Xu, Linux Kernel Mailing List,
	Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <CA+55aFxP+V6Wbq5Xw_NOksiWouEMg4gjBJgeGa-qFyxDMnTmcA@mail.gmail.com>

On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>
>> Just to clarify, I think you're asking if, for versions of gcc which
>> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
>> functions to ensure their stacks are 16-byte aligned.
>>
>> It's certainly possible, but I don't see how that solves the problem.
>> The stack will still be misaligned by entry code.  Or am I missing
>> something?
>
> I think the argument is that we *could* try to align things, if we
> just had some tool that actually then verified that we aren't missing
> anything.
>
> I'm not entirely happy with checking the generated code, though,
> because as Ingo says, you have a 50:50 chance of just getting it right
> by mistake. So I'd much rather have some static tool that checks
> things at a code level (ie coccinelle or sparse).

What I meant was checking the entry code to see if it aligns stack
frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
alignment for real may actually be entirely a lost cause.  After all,
I think we have some inline functions that do asm volatile ("call
..."), and I don't see any credible way of forcing alignment short of
generating an entirely new stack frame and aligning that.  Ick.  This
whole situation stinks, and I wish that the gcc developers had been
less daft here in the first place or that we'd noticed and gotten it
fixed much longer ago.

Can we come up with a macro like STACK_ALIGN_16 that turns into
__aligned__(32) on bad gcc versions and combine that with your sparse
patch?

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Josh Poimboeuf @ 2017-01-12 20:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Herbert Xu, Linux Kernel Mailing List,
	Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <CALCETrVz-wEFVUwrpS8-Ln9SWnsF5KxkqJC-Br6wJ+e0LGM9UA@mail.gmail.com>

On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >>
> >> Just to clarify, I think you're asking if, for versions of gcc which
> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
> >> functions to ensure their stacks are 16-byte aligned.
> >>
> >> It's certainly possible, but I don't see how that solves the problem.
> >> The stack will still be misaligned by entry code.  Or am I missing
> >> something?
> >
> > I think the argument is that we *could* try to align things, if we
> > just had some tool that actually then verified that we aren't missing
> > anything.
> >
> > I'm not entirely happy with checking the generated code, though,
> > because as Ingo says, you have a 50:50 chance of just getting it right
> > by mistake. So I'd much rather have some static tool that checks
> > things at a code level (ie coccinelle or sparse).
> 
> What I meant was checking the entry code to see if it aligns stack
> frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
> alignment for real may actually be entirely a lost cause.  After all,
> I think we have some inline functions that do asm volatile ("call
> ..."), and I don't see any credible way of forcing alignment short of
> generating an entirely new stack frame and aligning that.

Actually we already found all such cases and fixed them by forcing a new
stack frame, thanks to objtool.  For example, see 55a76b59b5fe.

> Ick.  This
> whole situation stinks, and I wish that the gcc developers had been
> less daft here in the first place or that we'd noticed and gotten it
> fixed much longer ago.
> 
> Can we come up with a macro like STACK_ALIGN_16 that turns into
> __aligned__(32) on bad gcc versions and combine that with your sparse
> patch?

-- 
Josh

^ permalink raw reply

* Re: [cryptodev:master 43/44] arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not support `tt .req ip' in ARM mode
From: Ard Biesheuvel @ 2017-01-12 20:44 UTC (permalink / raw)
  To: kbuild test robot; +Cc: kbuild-all, linux-crypto@vger.kernel.org, Herbert Xu
In-Reply-To: <201701130326.19FpToYy%fengguang.wu@intel.com>

Hi Arnd,

On 12 January 2017 at 19:04, kbuild test robot <fengguang.wu@intel.com> wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
> head:   1abee99eafab67fb1c98f9ecfc43cd5735384a86
> commit: 81edb42629758bacdf813dd5e4542ae26e3ad73a [43/44] crypto: arm/aes - replace scalar AES cipher
> config: arm-multi_v7_defconfig (attached as .config)
> compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         git checkout 81edb42629758bacdf813dd5e4542ae26e3ad73a
>         # save the attached .config to linux build tree
>         make.cross ARCH=arm
>
> All errors (new ones prefixed by >>):
>
>    arch/arm/crypto/aes-cipher-core.S: Assembler messages:
>>> arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not support `tt .req ip' in ARM mode

Did you ever see this error? This is very odd: .req simply declares an
alias for a register name, and this works fine locally.

>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `movw tt,#:lower16:crypto_ft_tab'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `movt tt,#:upper16:crypto_ft_tab'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r8,[tt,r8,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r9,[tt,r9,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r10,[tt,r10,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r11,[tt,r11,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r10,[tt,r10,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r11,[tt,r11,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r5,[tt,r5,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r6,[tt,r6,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r4,[tt,r4,lsl#2]'
>
> vim +21 arch/arm/crypto/aes-cipher-core.S
>
>     15          .align          5
>     16
>     17          rk              .req    r0
>     18          rounds          .req    r1
>     19          in              .req    r2
>     20          out             .req    r3
>   > 21          tt              .req    ip
>     22
>     23          t0              .req    lr
>     24          t1              .req    r2
>     25          t2              .req    r3
>     26
>     27          .macro          __select, out, in, idx
>     28          .if             __LINUX_ARM_ARCH__ < 7
>     29          and             \out, \in, #0xff << (8 * \idx)
>     30          .else
>     31          ubfx            \out, \in, #(8 * \idx), #8
>     32          .endif
>     33          .endm
>     34
>     35          .macro          __load, out, in, idx
>     36          .if             __LINUX_ARM_ARCH__ < 7 && \idx > 0
>     37          ldr             \out, [tt, \in, lsr #(8 * \idx) - 2]
>     38          .else
>     39          ldr             \out, [tt, \in, lsl #2]
>     40          .endif
>     41          .endm
>     42
>     43          .macro          __hround, out0, out1, in0, in1, in2, in3, t3, t4, enc
>     44          __select        \out0, \in0, 0
>     45          __select        t0, \in1, 1
>     46          __load          \out0, \out0, 0
>     47          __load          t0, t0, 1
>     48
>     49          .if             \enc
>     50          __select        \out1, \in1, 0
>     51          __select        t1, \in2, 1
>     52          .else
>     53          __select        \out1, \in3, 0
>     54          __select        t1, \in0, 1
>     55          .endif
>     56          __load          \out1, \out1, 0
>     57          __select        t2, \in2, 2
>     58          __load          t1, t1, 1
>     59          __load          t2, t2, 2
>     60
>     61          eor             \out0, \out0, t0, ror #24
>     62
>     63          __select        t0, \in3, 3
>     64          .if             \enc
>     65          __select        \t3, \in3, 2
>     66          __select        \t4, \in0, 3
>     67          .else
>     68          __select        \t3, \in1, 2
>     69          __select        \t4, \in2, 3
>     70          .endif
>     71          __load          \t3, \t3, 2
>     72          __load          t0, t0, 3
>     73          __load          \t4, \t4, 3
>     74
>     75          eor             \out1, \out1, t1, ror #24
>     76          eor             \out0, \out0, t2, ror #16
>     77          ldm             rk!, {t1, t2}
>     78          eor             \out1, \out1, \t3, ror #16
>     79          eor             \out0, \out0, t0, ror #8
>     80          eor             \out1, \out1, \t4, ror #8
>     81          eor             \out0, \out0, t1
>     82          eor             \out1, \out1, t2
>     83          .endm
>     84
>     85          .macro          fround, out0, out1, out2, out3, in0, in1, in2, in3
>     86          __hround        \out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1
>     87          __hround        \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1
>     88          .endm
>     89
>     90          .macro          iround, out0, out1, out2, out3, in0, in1, in2, in3
>     91          __hround        \out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0
>     92          __hround        \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0
>     93          .endm
>     94
>     95          .macro          __rev, out, in
>     96          .if             __LINUX_ARM_ARCH__ < 6
>     97          lsl             t0, \in, #24
>     98          and             t1, \in, #0xff00
>     99          and             t2, \in, #0xff0000
>    100          orr             \out, t0, \in, lsr #24
>    101          orr             \out, \out, t1, lsl #8
>    102          orr             \out, \out, t2, lsr #8
>    103          .else
>    104          rev             \out, \in
>    105          .endif
>    106          .endm
>    107
>    108          .macro          __adrl, out, sym, c
>    109          .if             __LINUX_ARM_ARCH__ < 7
>    110          ldr\c           \out, =\sym
>    111          .else
>    112          movw\c          \out, #:lower16:\sym
>    113          movt\c          \out, #:upper16:\sym
>    114          .endif
>    115          .endm
>    116
>    117          .macro          do_crypt, round, ttab, ltab
>    118          push            {r3-r11, lr}
>    119
>    120          ldr             r4, [in]
>    121          ldr             r5, [in, #4]
>    122          ldr             r6, [in, #8]
>    123          ldr             r7, [in, #12]
>    124
>    125          ldm             rk!, {r8-r11}
>    126
>    127  #ifdef CONFIG_CPU_BIG_ENDIAN
>    128          __rev           r4, r4
>    129          __rev           r5, r5
>    130          __rev           r6, r6
>    131          __rev           r7, r7
>    132  #endif
>    133
>    134          eor             r4, r4, r8
>    135          eor             r5, r5, r9
>    136          eor             r6, r6, r10
>    137          eor             r7, r7, r11
>    138
>    139          __adrl          tt, \ttab
>    140
>    141          tst             rounds, #2
>    142          bne             1f
>    143
>    144  0:      \round          r8, r9, r10, r11, r4, r5, r6, r7
>    145          \round          r4, r5, r6, r7, r8, r9, r10, r11
>    146
>    147  1:      subs            rounds, rounds, #4
>    148          \round          r8, r9, r10, r11, r4, r5, r6, r7
>    149          __adrl          tt, \ltab, ls
>    150          \round          r4, r5, r6, r7, r8, r9, r10, r11
>    151          bhi             0b
>    152
>    153  #ifdef CONFIG_CPU_BIG_ENDIAN
>    154          __rev           r4, r4
>    155          __rev           r5, r5
>    156          __rev           r6, r6
>    157          __rev           r7, r7
>    158  #endif
>    159
>    160          ldr             out, [sp]
>    161
>    162          str             r4, [out]
>    163          str             r5, [out, #4]
>    164          str             r6, [out, #8]
>    165          str             r7, [out, #12]
>    166
>    167          pop             {r3-r11, pc}
>    168
>    169          .align          3
>    170          .ltorg
>    171          .endm
>    172
>    173  ENTRY(__aes_arm_encrypt)
>  > 174          do_crypt        fround, crypto_ft_tab, crypto_fl_tab
>    175  ENDPROC(__aes_arm_encrypt)
>    176
>    177  ENTRY(__aes_arm_decrypt)
>  > 178          do_crypt        iround, crypto_it_tab, crypto_il_tab
>    179  ENDPROC(__aes_arm_decrypt)
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Josh Poimboeuf @ 2017-01-12 20:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Herbert Xu, Linux Kernel Mailing List,
	Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <20170112201511.yj5ekqmj76r2yv6t@treble>

On Thu, Jan 12, 2017 at 02:15:11PM -0600, Josh Poimboeuf wrote:
> On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
> > On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >>
> > >> Just to clarify, I think you're asking if, for versions of gcc which
> > >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
> > >> functions to ensure their stacks are 16-byte aligned.
> > >>
> > >> It's certainly possible, but I don't see how that solves the problem.
> > >> The stack will still be misaligned by entry code.  Or am I missing
> > >> something?
> > >
> > > I think the argument is that we *could* try to align things, if we
> > > just had some tool that actually then verified that we aren't missing
> > > anything.
> > >
> > > I'm not entirely happy with checking the generated code, though,
> > > because as Ingo says, you have a 50:50 chance of just getting it right
> > > by mistake. So I'd much rather have some static tool that checks
> > > things at a code level (ie coccinelle or sparse).
> > 
> > What I meant was checking the entry code to see if it aligns stack
> > frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
> > alignment for real may actually be entirely a lost cause.  After all,
> > I think we have some inline functions that do asm volatile ("call
> > ..."), and I don't see any credible way of forcing alignment short of
> > generating an entirely new stack frame and aligning that.
> 
> Actually we already found all such cases and fixed them by forcing a new
> stack frame, thanks to objtool.  For example, see 55a76b59b5fe.
> 
> > Ick.  This
> > whole situation stinks, and I wish that the gcc developers had been
> > less daft here in the first place or that we'd noticed and gotten it
> > fixed much longer ago.
> > 
> > Can we come up with a macro like STACK_ALIGN_16 that turns into
> > __aligned__(32) on bad gcc versions and combine that with your sparse
> > patch?

This could work.  Only concerns I'd have are:

- Are there (or will there be in the future) any asm functions which
  assume a 16-byte aligned stack?  (Seems unlikely.  Stack alignment is
  common in the crypto code but they do the alignment manually.)

- Who's going to run sparse all the time to catch unauthorized users of
  __aligned__(16)?

-- 
Josh

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Linus Torvalds @ 2017-01-12 21:40 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andy Lutomirski, Herbert Xu, Linux Kernel Mailing List,
	Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <20170112205504.gb6z2w52mektyc73@treble>

On Thu, Jan 12, 2017 at 12:55 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> - Who's going to run sparse all the time to catch unauthorized users of
>   __aligned__(16)?

Well, considering that we apparently only have a small handful of
existing users without anybody having ever run any tool at all, I
don't think this is necessarily a huge problem.

One of the build servers could easily add the "make C=2" case to a
build test, and just grep the error reports for the 'excessive
alignment' string. The zero-day build bot already does much fancier
things.

So I don't think it would necessarily be all that hard to get a clean
build, and just say "if you need aligned stack space, you have to do
it yourself by hand".

That said, if we now always enable frame pointers on x86 (and it has
gotten more and more difficult to avoid it), then the 16-byte
alignment would be fairly natural.

The 8-byte alignment mainly makes sense when the basic call sequence
just adds 8 bytes, and you have functions without frames (that still
call other functions).

                       Linus

^ permalink raw reply

* [cryptodev:master 43/44] arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr tt,=crypto_ft_tab'
From: kbuild test robot @ 2017-01-12 23:28 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: kbuild-all, linux-crypto, Herbert Xu

[-- Attachment #1: Type: text/plain, Size: 8989 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
head:   1abee99eafab67fb1c98f9ecfc43cd5735384a86
commit: 81edb42629758bacdf813dd5e4542ae26e3ad73a [43/44] crypto: arm/aes - replace scalar AES cipher
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 81edb42629758bacdf813dd5e4542ae26e3ad73a
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   arch/arm/crypto/aes-cipher-core.S: Assembler messages:
   arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not support `tt .req ip' in ARM mode
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr tt,=crypto_ft_tab'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r8,[tt,r8,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*1)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r9,[tt,r9,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsr#(8*1)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsr#(8*2)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r10,[tt,r10,lsr#(8*2)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*3)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r11,[tt,r11,lsr#(8*3)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r10,[tt,r10,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*1)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r11,[tt,r11,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsr#(8*1)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsr#(8*2)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r5,[tt,r5,lsr#(8*2)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*3)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r6,[tt,r6,lsr#(8*3)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r4,[tt,r4,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*1)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r5,[tt,r5,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsr#(8*1)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsr#(8*2)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r6,[tt,r6,lsr#(8*2)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*3)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r7,[tt,r7,lsr#(8*3)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r6,[tt,r6,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*1)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r7,[tt,r7,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsr#(8*1)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsr#(8*2)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r9,[tt,r9,lsr#(8*2)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*3)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r10,[tt,r10,lsr#(8*3)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r8,[tt,r8,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*1)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r9,[tt,r9,lsl#2]'

vim +174 arch/arm/crypto/aes-cipher-core.S

    15		.align		5
    16	
    17		rk		.req	r0
    18		rounds		.req	r1
    19		in		.req	r2
    20		out		.req	r3
  > 21		tt		.req	ip
    22	
    23		t0		.req	lr
    24		t1		.req	r2
    25		t2		.req	r3
    26	
    27		.macro		__select, out, in, idx
    28		.if		__LINUX_ARM_ARCH__ < 7
    29		and		\out, \in, #0xff << (8 * \idx)
    30		.else
    31		ubfx		\out, \in, #(8 * \idx), #8
    32		.endif
    33		.endm
    34	
    35		.macro		__load, out, in, idx
    36		.if		__LINUX_ARM_ARCH__ < 7 && \idx > 0
    37		ldr		\out, [tt, \in, lsr #(8 * \idx) - 2]
    38		.else
    39		ldr		\out, [tt, \in, lsl #2]
    40		.endif
    41		.endm
    42	
    43		.macro		__hround, out0, out1, in0, in1, in2, in3, t3, t4, enc
    44		__select	\out0, \in0, 0
    45		__select	t0, \in1, 1
    46		__load		\out0, \out0, 0
    47		__load		t0, t0, 1
    48	
    49		.if		\enc
    50		__select	\out1, \in1, 0
    51		__select	t1, \in2, 1
    52		.else
    53		__select	\out1, \in3, 0
    54		__select	t1, \in0, 1
    55		.endif
    56		__load		\out1, \out1, 0
    57		__select	t2, \in2, 2
    58		__load		t1, t1, 1
    59		__load		t2, t2, 2
    60	
    61		eor		\out0, \out0, t0, ror #24
    62	
    63		__select	t0, \in3, 3
    64		.if		\enc
    65		__select	\t3, \in3, 2
    66		__select	\t4, \in0, 3
    67		.else
    68		__select	\t3, \in1, 2
    69		__select	\t4, \in2, 3
    70		.endif
    71		__load		\t3, \t3, 2
    72		__load		t0, t0, 3
    73		__load		\t4, \t4, 3
    74	
    75		eor		\out1, \out1, t1, ror #24
    76		eor		\out0, \out0, t2, ror #16
    77		ldm		rk!, {t1, t2}
    78		eor		\out1, \out1, \t3, ror #16
    79		eor		\out0, \out0, t0, ror #8
    80		eor		\out1, \out1, \t4, ror #8
    81		eor		\out0, \out0, t1
    82		eor		\out1, \out1, t2
    83		.endm
    84	
    85		.macro		fround, out0, out1, out2, out3, in0, in1, in2, in3
    86		__hround	\out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1
    87		__hround	\out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1
    88		.endm
    89	
    90		.macro		iround, out0, out1, out2, out3, in0, in1, in2, in3
    91		__hround	\out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0
    92		__hround	\out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0
    93		.endm
    94	
    95		.macro		__rev, out, in
    96		.if		__LINUX_ARM_ARCH__ < 6
    97		lsl		t0, \in, #24
    98		and		t1, \in, #0xff00
    99		and		t2, \in, #0xff0000
   100		orr		\out, t0, \in, lsr #24
   101		orr		\out, \out, t1, lsl #8
   102		orr		\out, \out, t2, lsr #8
   103		.else
   104		rev		\out, \in
   105		.endif
   106		.endm
   107	
   108		.macro		__adrl, out, sym, c
   109		.if		__LINUX_ARM_ARCH__ < 7
   110		ldr\c		\out, =\sym
   111		.else
   112		movw\c		\out, #:lower16:\sym
   113		movt\c		\out, #:upper16:\sym
   114		.endif
   115		.endm
   116	
   117		.macro		do_crypt, round, ttab, ltab
   118		push		{r3-r11, lr}
   119	
   120		ldr		r4, [in]
   121		ldr		r5, [in, #4]
   122		ldr		r6, [in, #8]
   123		ldr		r7, [in, #12]
   124	
   125		ldm		rk!, {r8-r11}
   126	
   127	#ifdef CONFIG_CPU_BIG_ENDIAN
   128		__rev		r4, r4
   129		__rev		r5, r5
   130		__rev		r6, r6
   131		__rev		r7, r7
   132	#endif
   133	
   134		eor		r4, r4, r8
   135		eor		r5, r5, r9
   136		eor		r6, r6, r10
   137		eor		r7, r7, r11
   138	
   139		__adrl		tt, \ttab
   140	
   141		tst		rounds, #2
   142		bne		1f
   143	
   144	0:	\round		r8, r9, r10, r11, r4, r5, r6, r7
   145		\round		r4, r5, r6, r7, r8, r9, r10, r11
   146	
   147	1:	subs		rounds, rounds, #4
   148		\round		r8, r9, r10, r11, r4, r5, r6, r7
   149		__adrl		tt, \ltab, ls
   150		\round		r4, r5, r6, r7, r8, r9, r10, r11
   151		bhi		0b
   152	
   153	#ifdef CONFIG_CPU_BIG_ENDIAN
   154		__rev		r4, r4
   155		__rev		r5, r5
   156		__rev		r6, r6
   157		__rev		r7, r7
   158	#endif
   159	
   160		ldr		out, [sp]
   161	
   162		str		r4, [out]
   163		str		r5, [out, #4]
   164		str		r6, [out, #8]
   165		str		r7, [out, #12]
   166	
   167		pop		{r3-r11, pc}
   168	
   169		.align		3
   170		.ltorg
   171		.endm
   172	
   173	ENTRY(__aes_arm_encrypt)
 > 174		do_crypt	fround, crypto_ft_tab, crypto_fl_tab
   175	ENDPROC(__aes_arm_encrypt)
   176	
   177	ENTRY(__aes_arm_decrypt)
 > 178		do_crypt	iround, crypto_it_tab, crypto_il_tab
   179	ENDPROC(__aes_arm_decrypt)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 60405 bytes --]

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Andy Lutomirski @ 2017-01-13  1:46 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Herbert Xu, Linux Kernel Mailing List,
	Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <20170112201511.yj5ekqmj76r2yv6t@treble>

On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
>> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> >>
>> >> Just to clarify, I think you're asking if, for versions of gcc which
>> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
>> >> functions to ensure their stacks are 16-byte aligned.
>> >>
>> >> It's certainly possible, but I don't see how that solves the problem.
>> >> The stack will still be misaligned by entry code.  Or am I missing
>> >> something?
>> >
>> > I think the argument is that we *could* try to align things, if we
>> > just had some tool that actually then verified that we aren't missing
>> > anything.
>> >
>> > I'm not entirely happy with checking the generated code, though,
>> > because as Ingo says, you have a 50:50 chance of just getting it right
>> > by mistake. So I'd much rather have some static tool that checks
>> > things at a code level (ie coccinelle or sparse).
>>
>> What I meant was checking the entry code to see if it aligns stack
>> frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
>> alignment for real may actually be entirely a lost cause.  After all,
>> I think we have some inline functions that do asm volatile ("call
>> ..."), and I don't see any credible way of forcing alignment short of
>> generating an entirely new stack frame and aligning that.
>
> Actually we already found all such cases and fixed them by forcing a new
> stack frame, thanks to objtool.  For example, see 55a76b59b5fe.

What I mean is: what guarantees that the stack is properly aligned for
the subroutine call?  gcc promises to set up a stack frame, but does
it promise that rsp will be properly aligned to call a C function?
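
The question above can be illustrated with a little arithmetic. This is only a sketch, not kernel code and not a statement about what gcc emits: it assumes a "call" pushes an 8-byte return address and a forced frame pushes an 8-byte saved frame pointer, and every helper name in it is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the modelled stack pointer is a multiple of 16. */
static bool sp_is_16_aligned(uint64_t sp)
{
	return (sp & 15) == 0;
}

/* Model of "call": the return address takes 8 bytes. */
static uint64_t model_call(uint64_t sp)
{
	return sp - 8;
}

/* Model of a forced frame ("push %rbp"): another 8 bytes. */
static uint64_t model_push_frame(uint64_t sp)
{
	return sp - 8;
}

/*
 * Stack pointer at a nested call site inside a callee that sets up a
 * frame but does no explicit realignment (assumption of this sketch).
 */
static uint64_t sp_at_inner_call(uint64_t caller_sp)
{
	return model_push_frame(model_call(caller_sp));
}
```

In this model the frame moves the stack pointer by a fixed 16 bytes, so it preserves whatever alignment the caller had: a caller at 0x1000 reaches the inner call site aligned, a caller at 0x1008 does not. Setting up a frame is therefore not the same as guaranteeing alignment.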

^ permalink raw reply

* RE: [PATCH v8 1/1] crypto: add virtio-crypto driver
From: Gonglei (Arei) @ 2017-01-13  1:56 UTC (permalink / raw)
  To: Michael S. Tsirkin, Christian Borntraeger
  Cc: linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
	virtio-dev@lists.oasis-open.org,
	virtualization@lists.linux-foundation.org,
	linux-crypto@vger.kernel.org, davem@davemloft.net,
	herbert@gondor.apana.org.au, Huangweidong (C), Claudio Fontana,
	Luonengjun, Hanweidong (Randy), Xuquan (Quan Xu),
	Wanzongshun (Vincent), stefanha@redhat.com,
	"Zhoujian (jay, Euler)" <
In-Reply-To: <20170112161729-mutt-send-email-mst@kernel.org>

> 
> On Thu, Jan 12, 2017 at 03:10:25PM +0100, Christian Borntraeger wrote:
> > On 01/10/2017 01:56 PM, Christian Borntraeger wrote:
> > > On 01/10/2017 01:36 PM, Gonglei (Arei) wrote:
> > >> Hi,
> > >>
> > >>>
> > >>> On 12/15/2016 03:03 AM, Gonglei wrote:
> > >>> [...]
> > >>>> +
> > >>>> +static struct crypto_alg virtio_crypto_algs[] = { {
> > >>>> +	.cra_name = "cbc(aes)",
> > >>>> +	.cra_driver_name = "virtio_crypto_aes_cbc",
> > >>>> +	.cra_priority = 501,
> > >>>
> > >>>
> > >>> This is still higher than the hardware accelerators (like intel aesni,
> > >>> the s390 cpacf functions, or the arm hw). aesni and s390/cpacf are
> > >>> supported by the hardware virtualization and available to the guests.
> > >>> I do not see a way how virtio crypto can be faster than that (in the
> > >>> end it might be cpacf/aesni + overhead); instead it will very likely
> > >>> be slower. So we should use a number that is higher than software
> > >>> implementations but lower than the hw ones.
> > >>>
> > >>> Just grepping around, the software ones seem to be around 100 and the
> > >>> hardware ones around 200-400. So why was 150 not enough?
> > >>>
> > >> I didn't find any documentation about how the priority is used, and I
> > >> assumed people who use virtio-crypto will configure hardware accelerators
> > >> in the host. So I chose a number bigger than aesni's priority.
> > >
> > > Yes, but the aesni driver will only bind if there is HW support in the guest.
> > > And if aesni is available in the guest (or the s390 aes function from cpacf)
> > > it will always be faster than the same in the host via virtio. So your priority
> > > should be smaller.
> >
> >
> > any opinion on this?
> 
> Going forward, we might add an emulated aesni device and that might
> become slower than virtio. OTOH if or when this happens, we can solve it
> by adding a priority or a feature flag to virtio to raise its priority.
> 
> So I think I agree with Christian here, let's lower the priority.
> Gonglei, could you send a patch like this?
> 
OK, will do.

Thanks,
-Gonglei
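
The effect of the priority value discussed in this thread can be sketched in user-space C. This is a simplified model under stated assumptions, not the kernel's lookup code: `struct alg_model` and `pick_alg()` are hypothetical names, and the model only captures the idea that, among implementations registered under the same algorithm name, the one with the highest priority is preferred.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the fields of struct crypto_alg that matter here. */
struct alg_model {
	const char *name;        /* algorithm name, e.g. "cbc(aes)" */
	const char *driver_name; /* implementation name */
	int priority;            /* higher wins among equal names */
};

/* Return the highest-priority entry matching name, or NULL if none matches. */
static const struct alg_model *pick_alg(const struct alg_model *algs,
					size_t n, const char *name)
{
	const struct alg_model *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(algs[i].name, name) != 0)
			continue;
		if (best == NULL || algs[i].priority > best->priority)
			best = &algs[i];
	}
	return best;
}
```

With illustrative entries at 100 (software), 400 (hardware) and 501 (virtio-crypto, as in the patch), the model picks virtio-crypto; lowering that entry to 150, as suggested above, makes the hardware entry win while virtio-crypto still beats the software one.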

^ permalink raw reply

* Re: x86-64: Maintain 16-byte stack alignment
From: Josh Poimboeuf @ 2017-01-13  3:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Herbert Xu, Linux Kernel Mailing List,
	Linux Crypto Mailing List, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Ard Biesheuvel
In-Reply-To: <CALCETrXom8aY2XhpAyOtAwQQYF7wftBHJE_px1xr0iRmcYEJoA@mail.gmail.com>

On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
> >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
> >> <torvalds@linux-foundation.org> wrote:
> >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> >>
> >> >> Just to clarify, I think you're asking if, for versions of gcc which
> >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
> >> >> functions to ensure their stacks are 16-byte aligned.
> >> >>
> >> >> It's certainly possible, but I don't see how that solves the problem.
> >> >> The stack will still be misaligned by entry code.  Or am I missing
> >> >> something?
> >> >
> >> > I think the argument is that we *could* try to align things, if we
> >> > just had some tool that actually then verified that we aren't missing
> >> > anything.
> >> >
> >> > I'm not entirely happy with checking the generated code, though,
> >> > because as Ingo says, you have a 50:50 chance of just getting it right
> >> > by mistake. So I'd much rather have some static tool that checks
> >> > things at a code level (ie coccinelle or sparse).
> >>
> >> What I meant was checking the entry code to see if it aligns stack
> >> frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
> >> alignment for real may actually be entirely a lost cause.  After all,
> >> I think we have some inline functions that do asm volatile ("call
> >> ..."), and I don't see any credible way of forcing alignment short of
> >> generating an entirely new stack frame and aligning that.
> >
> > Actually we already found all such cases and fixed them by forcing a new
> > stack frame, thanks to objtool.  For example, see 55a76b59b5fe.
> 
> What I mean is: what guarantees that the stack is properly aligned for
> the subroutine call?  gcc promises to set up a stack frame, but does
> it promise that rsp will be properly aligned to call a C function?

Yes, I did an experiment and you're right.  I had naively assumed that
all stack frames would be aligned.

-- 
Josh

^ permalink raw reply

* Re: [RFC PATCH 5/6] crypto: aesni-intel - Add bulk request support
From: Eric Biggers @ 2017-01-13  3:19 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Herbert Xu, linux-crypto, dm-devel, Mike Snitzer, Milan Broz,
	Mikulas Patocka, Binoy Jayan
In-Reply-To: <c32a28630157c619ac2a7c851be586e72f193c68.1484215956.git.omosnacek@gmail.com>

On Thu, Jan 12, 2017 at 01:59:57PM +0100, Ondrej Mosnacek wrote:
> This patch implements bulk request handling in the AES-NI crypto drivers.
> The major advantage of this is that with bulk requests, the kernel_fpu_*
> functions (which are usually quite slow) are now called only once for the whole
> request.
> 

Hi Ondrej,

To what extent does the performance benefit of this patchset result from just
the reduced number of calls to kernel_fpu_begin() and kernel_fpu_end()?

If it's most of the benefit, would it make any sense to optimize
kernel_fpu_begin() and kernel_fpu_end() instead?

And if there are other examples besides kernel_fpu_begin/kernel_fpu_end where
the bulk API would provide a significant performance boost, can you mention
them?

Interestingly, the arm64 equivalent to kernel_fpu_begin()
(kernel_neon_begin_partial() in arch/arm64/kernel/fpsimd.c) appears to have an
optimization where the SIMD registers aren't saved if they were already saved.
I wonder why something similar isn't done on x86.

Eric
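
The saving being asked about can be modelled with a simple counter. This is a toy sketch under loud assumptions: `model_fpu_begin()` and `model_fpu_end()` are hypothetical stand-ins that only count sections, not the real kernel_fpu_begin()/kernel_fpu_end(), and the per-message work is left out entirely.

```c
#include <stddef.h>

/* Counts how many FPU sections were opened in the model. */
struct fpu_counter {
	unsigned long sections;
};

/* Hypothetical stand-in for kernel_fpu_begin(): just count the call. */
static void model_fpu_begin(struct fpu_counter *c)
{
	c->sections++;
}

/* Hypothetical stand-in for kernel_fpu_end(): nothing to count. */
static void model_fpu_end(struct fpu_counter *c)
{
	(void)c;
}

/* One request per message: one begin/end pair per message. */
static void encrypt_each(struct fpu_counter *c, size_t nmsgs)
{
	for (size_t i = 0; i < nmsgs; i++) {
		model_fpu_begin(c);
		/* per-message cipher work would go here */
		model_fpu_end(c);
	}
}

/* One bulk request: a single begin/end pair around all messages. */
static void encrypt_bulk(struct fpu_counter *c, size_t nmsgs)
{
	model_fpu_begin(c);
	for (size_t i = 0; i < nmsgs; i++) {
		/* per-message cipher work would go here */
	}
	model_fpu_end(c);
}
```

For eight messages the model opens eight sections in the per-request case and one in the bulk case. How much of the measured speedup that accounts for is exactly the open question in this message.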

^ permalink raw reply
