Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* RE: [PATCH v3 2/3] PCI: Allow ATS to be always on for pre-CXL devices
From: Tian, Kevin @ 2026-03-31  8:24 UTC (permalink / raw)
  To: Nicolin Chen, jgg@nvidia.com, will@kernel.org,
	robin.murphy@arm.com, bhelgaas@google.com
  Cc: joro@8bytes.org, praan@google.com, baolu.lu@linux.intel.com,
	miko.lenczewski@arm.com, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org, Williams, Dan J,
	jonathan.cameron@huawei.com, Vikram Sethi,
	linux-cxl@vger.kernel.org
In-Reply-To: <c715b10b49d50eea5429454108d4221c1a78efaf.1772833963.git.nicolinc@nvidia.com>

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, March 7, 2026 7:41 AM
> 
> Some NVIDIA GPU/NIC devices, although don't implement the CXL config
> space,
> they have many CXL-like properties. Call this kind "pre-CXL".
> 
> Similar to CXL.cache capaiblity, these pre-CXL devices also require the ATS

s/capaiblity/capability/

> function even when their RIDs are IOMMU bypassed, i.e. keep ATS "always
> on"
> v.s. "on demand" when a non-zero PASID line gets enabled in SVA use cases.
> 
> Introduce pci_dev_specific_ats_always_on() quirk function to scan a list of
> IDs for these device. Then, include it pci_ats_always_on().

"include it *in* pci_ats_always_on()"

> +
> +/* Some pre-CXL devices require ATS on the RID when it is IOMMU-
> bypassed */
> +bool pci_dev_specific_ats_always_on(struct pci_dev *pdev)

clearer to remove "on the RID ...". 

"always on" implies no condition required. and adding IOMMU bypass
info there is confusing.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


^ permalink raw reply

* [PATCH v2 3/3] crypto: atmel-sha204a - fix non-blocking read logic
From: Lothar Rubusch @ 2026-03-31  8:21 UTC (permalink / raw)
  To: herbert, davem, nicolas.ferre, alexandre.belloni, claudiu.beznea,
	ardb, linusw
  Cc: linux-crypto, linux-arm-kernel, linux-kernel, l.rubusch
In-Reply-To: <20260331082105.697468-1-l.rubusch@gmail.com>

The non-blocking path was (also) failing to provide valid entropy
due to improper buffer management and a lack of hardware execution
time.

Ensure cmd.msecs (30ms) and cmd.rxsize (35ms) are initialized before
enqueuing the background work. Fix the data offset to skip the
1-byte hardware count header when copying bits to the caller. Correctly
return 0 (busy) to the hwrng core while hardware execution is in
progress, preventing zero-filled buffers, which was the situation
before.

With this fix applied, tests will look similar to this:
$ socat -u OPEN:/dev/hwrng,nonblock - | head -c 32 | hexdump -C
00000000  23 cc 42 3c 90 b1 38 fc  54 37 35 4b 09 c5 e1 0d  |#.B<..8.T75K....|
2026/03/23 14:30:18 socat[858] E read(5, 0x55be363000, 8192): Resource temporarily unavailable
00000010  73 3b af d9 02 70 76 bd  2d 59 4b 12 01 ac ae 2b  |s;...pv.-YK....+|
00000020

Fixes: da001fb651b0 ("crypto: atmel-i2c - add support for SHA204A random number generator")
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
---
 drivers/crypto/atmel-sha204a.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/atmel-sha204a.c b/drivers/crypto/atmel-sha204a.c
index 350ba8618c69..c0a1d34bbd9e 100644
--- a/drivers/crypto/atmel-sha204a.c
+++ b/drivers/crypto/atmel-sha204a.c
@@ -32,7 +32,6 @@ static void atmel_sha204a_rng_done(struct atmel_i2c_work_data *work_data,
 				     "i2c transaction failed (%d)\n",
 				     status);
 		kfree(work_data);
-		rng->priv = 0;
 		atomic_dec(&i2c_priv->tfm_count);
 		return;
 	}
@@ -49,20 +48,19 @@ static int atmel_sha204a_rng_read_nonblocking(struct hwrng *rng, void *data,
 
 	i2c_priv = container_of(rng, struct atmel_i2c_client_priv, hwrng);
 
-	/* Verify if data available from last run */
 	if (rng->priv) {
 		work_data = (struct atmel_i2c_work_data *)rng->priv;
-		max = min(sizeof(work_data->cmd.data), max);
-		memcpy(data, &work_data->cmd.data, max);
+		max = min_t(size_t, ATMEL_RNG_BLOCK_SIZE, max);
+		memcpy(data, &work_data->cmd.data[1], max);
 
-		/* Now, free memory */
+		/* Free memory and clear the in-flight flag */
 		kfree(work_data);
 		rng->priv = 0;
 		atomic_dec(&i2c_priv->tfm_count);
 		return max;
 	}
 
-	/* When a request is still in-flight but not processed */
+	/* If a request is still in-flight, return 0 (busy) */
 	if (atomic_read(&i2c_priv->tfm_count) > 0)
 		return 0;
 
@@ -76,8 +74,14 @@ static int atmel_sha204a_rng_read_nonblocking(struct hwrng *rng, void *data,
 	work_data->client = i2c_priv->client;
 
 	atmel_i2c_init_random_cmd(&work_data->cmd);
+
+	/* Set the execution time for the RNG command (from datasheet) */
+	work_data->cmd.msecs = ATMEL_RNG_EXEC_TIME;
+	work_data->cmd.rxsize = RANDOM_RSP_SIZE;
+
 	atmel_i2c_enqueue(work_data, atmel_sha204a_rng_done, rng);
 
+	/* Return 0 to indicate 'busy', data will be ready on next call */
 	return 0;
 }
 
-- 
2.39.5



^ permalink raw reply related

* [PATCH v2 2/3] crypto: atmel-sha204a - fix truncated 32-byte blocking read
From: Lothar Rubusch @ 2026-03-31  8:21 UTC (permalink / raw)
  To: herbert, davem, nicolas.ferre, alexandre.belloni, claudiu.beznea,
	ardb, linusw
  Cc: linux-crypto, linux-arm-kernel, linux-kernel, l.rubusch
In-Reply-To: <20260331082105.697468-1-l.rubusch@gmail.com>

The ATSHA204A returns a 35-byte packet consisting of a 1-byte count,
32 bytes of entropy, and a 2-byte CRC. The current blocking read
implementation was incorrectly copying data starting from the
count byte, leading to offset data and truncated entropy.

Additionally, the chip requires significant execution time to
generate random numbers, going by the datasheet. Reading the I2C bus
too early results in the chip NACK-ing or returning a partial buffer
followed by zeros.

Verification:
Tests before showed repeadetly reading only 8 bytes of entropy:
$ head -c 32 /dev/hwrng | hexdump -C
00000000  02 28 85 b3 47 40 f2 ee  00 00 00 00 00 00 00 00  |.(..G@..........|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020

After this patch applied, the result will be as follows:
$ head -c 32 /dev/hwrng | hexdump -C
00000000  5a fc 3f 13 14 68 fe 06  68 0a bd 04 83 6e 09 69  |Z.?..h..h....n.i|
00000010  75 ff cf 87 10 84 3b c9  c1 df ae eb 45 53 4c c3  |u.....;.....ESL.|
00000020

Fix these issues by:
Increase cmd.msecs to 30ms to provide sufficient execution time. Then
set cmd.rxsize to RANDOM_RSP_SIZE (35 bytes) to capture the entire
hardware response. Eventually, correct the memcpy() offset to index 1 of
the data buffer to skip the count byte and retrieve exactly 32 bytes of
entropy.

Fixes: da001fb651b0 ("crypto: atmel-i2c - add support for SHA204A random number generator")
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
---
 drivers/crypto/atmel-sha204a.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/atmel-sha204a.c b/drivers/crypto/atmel-sha204a.c
index 1baf4750d311..350ba8618c69 100644
--- a/drivers/crypto/atmel-sha204a.c
+++ b/drivers/crypto/atmel-sha204a.c
@@ -18,6 +18,9 @@
 #include <linux/workqueue.h>
 #include "atmel-i2c.h"
 
+#define ATMEL_RNG_BLOCK_SIZE 32
+#define ATMEL_RNG_EXEC_TIME 30
+
 static void atmel_sha204a_rng_done(struct atmel_i2c_work_data *work_data,
 				   void *areq, int status)
 {
@@ -91,13 +94,15 @@ static int atmel_sha204a_rng_read(struct hwrng *rng, void *data, size_t max,
 	i2c_priv = container_of(rng, struct atmel_i2c_client_priv, hwrng);
 
 	atmel_i2c_init_random_cmd(&cmd);
+	cmd.msecs = ATMEL_RNG_EXEC_TIME;
+	cmd.rxsize = RANDOM_RSP_SIZE;
 
 	ret = atmel_i2c_send_receive(i2c_priv->client, &cmd);
 	if (ret)
 		return ret;
 
-	max = min(sizeof(cmd.data), max);
-	memcpy(data, cmd.data, max);
+	max = min_t(size_t, ATMEL_RNG_BLOCK_SIZE, max);
+	memcpy(data, &cmd.data[1], max);
 
 	return max;
 }
-- 
2.39.5



^ permalink raw reply related

* [PATCH v2 1/3] crypto: atmel-sha204a - fix memory leak at non-blocking RNG work_data
From: Lothar Rubusch @ 2026-03-31  8:21 UTC (permalink / raw)
  To: herbert, davem, nicolas.ferre, alexandre.belloni, claudiu.beznea,
	ardb, linusw
  Cc: linux-crypto, linux-arm-kernel, linux-kernel, l.rubusch
In-Reply-To: <20260331082105.697468-1-l.rubusch@gmail.com>

The driver allocated memory for work_data in the non-blocking read
path but never free'd it again. After first read-out the memory pointer
seemed to be recycled and never was allocated again, due to some errors
in the logic, so that the leak was not growing.

Add kfree(work_data) in the completion callback on error. then add
kfree(work_data) after the data is consumed in the subsequent read
call. Finally ensure atomic_dec() is called only after the data has
been consumed or an error occurred to prevent race conditions.

Fixes: da001fb651b0 ("crypto: atmel-i2c - add support for SHA204A random number generator")
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
---
 drivers/crypto/atmel-sha204a.c | 44 +++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/drivers/crypto/atmel-sha204a.c b/drivers/crypto/atmel-sha204a.c
index 98d1023007e3..1baf4750d311 100644
--- a/drivers/crypto/atmel-sha204a.c
+++ b/drivers/crypto/atmel-sha204a.c
@@ -24,15 +24,20 @@ static void atmel_sha204a_rng_done(struct atmel_i2c_work_data *work_data,
 	struct atmel_i2c_client_priv *i2c_priv = work_data->ctx;
 	struct hwrng *rng = areq;
 
-	if (status)
+	if (status) {
 		dev_warn_ratelimited(&i2c_priv->client->dev,
 				     "i2c transaction failed (%d)\n",
 				     status);
+		kfree(work_data);
+		rng->priv = 0;
+		atomic_dec(&i2c_priv->tfm_count);
+		return;
+	}
 
 	rng->priv = (unsigned long)work_data;
-	atomic_dec(&i2c_priv->tfm_count);
 }
 
+
 static int atmel_sha204a_rng_read_nonblocking(struct hwrng *rng, void *data,
 					      size_t max)
 {
@@ -41,31 +46,36 @@ static int atmel_sha204a_rng_read_nonblocking(struct hwrng *rng, void *data,
 
 	i2c_priv = container_of(rng, struct atmel_i2c_client_priv, hwrng);
 
-	/* keep maximum 1 asynchronous read in flight at any time */
-	if (!atomic_add_unless(&i2c_priv->tfm_count, 1, 1))
-		return 0;
-
+	/* Verify if data available from last run */
 	if (rng->priv) {
 		work_data = (struct atmel_i2c_work_data *)rng->priv;
 		max = min(sizeof(work_data->cmd.data), max);
 		memcpy(data, &work_data->cmd.data, max);
-		rng->priv = 0;
-	} else {
-		work_data = kmalloc_obj(*work_data, GFP_ATOMIC);
-		if (!work_data) {
-			atomic_dec(&i2c_priv->tfm_count);
-			return -ENOMEM;
-		}
-		work_data->ctx = i2c_priv;
-		work_data->client = i2c_priv->client;
 
-		max = 0;
+		/* Now, free memory */
+		kfree(work_data);
+		rng->priv = 0;
+		atomic_dec(&i2c_priv->tfm_count);
+		return max;
 	}
 
+	/* When a request is still in-flight but not processed */
+	if (atomic_read(&i2c_priv->tfm_count) > 0)
+		return 0;
+
+	/* Start a new request */
+	work_data = kmalloc_obj(*work_data, GFP_ATOMIC);
+	if (!work_data)
+		return -ENOMEM;
+
+	atomic_inc(&i2c_priv->tfm_count);
+	work_data->ctx = i2c_priv;
+	work_data->client = i2c_priv->client;
+
 	atmel_i2c_init_random_cmd(&work_data->cmd);
 	atmel_i2c_enqueue(work_data, atmel_sha204a_rng_done, rng);
 
-	return max;
+	return 0;
 }
 
 static int atmel_sha204a_rng_read(struct hwrng *rng, void *data, size_t max,
-- 
2.39.5



^ permalink raw reply related

* [PATCH v2 0/3] crypto: atmel-sha204a - multiple RNG fixes
From: Lothar Rubusch @ 2026-03-31  8:21 UTC (permalink / raw)
  To: herbert, davem, nicolas.ferre, alexandre.belloni, claudiu.beznea,
	ardb, linusw
  Cc: linux-crypto, linux-arm-kernel, linux-kernel, l.rubusch

When testing the RNG functionality on the Atmel SHA204a hardware, I
found the following issues: rngtest reported failures and hexdump
reveiled only the first 8 bytes out of 32 provided actually entropy.

Having a closer look into it, I found a (small) memory leak, missing
to free work_data, miss-reading of the count field into the entropy
fields and parts of the 32 random bytes staying 0 due to reading the
slow i2c device.

The series proposes fixes and how fixed functionality can be/was
verified. Executing rngtest afterward showed a decent result, due
to the i2c bus a bit slow.

All setups require selecting the Atmel-sha204a as active RNG.
$ cat /sys/class/misc/hw_random/rng_available
    3f104000.rng 1-0064 none

$ echo 1-0064 > /sys/class/misc/hw_random/rng_current

$ cat /sys/class/misc/hw_random/rng_current
    1-0064

Testing RNG properties currently shows problematic results:
$ rngtest < /dev/hwrng
    rngtest 2.6
    Copyright (c) 2004 by Henrique de Moraes Holschuh
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    rngtest: starting FIPS tests...
    rngtest: bits received from input: 1040032
    rngtest: FIPS 140-2 successes: 0
    rngtest: FIPS 140-2 failures: 52
    rngtest: FIPS 140-2(2001-10-10) Monobit: 52
    rngtest: FIPS 140-2(2001-10-10) Poker: 52
    rngtest: FIPS 140-2(2001-10-10) Runs: 52
    rngtest: FIPS 140-2(2001-10-10) Long run: 52
    rngtest: FIPS 140-2(2001-10-10) Continuous run: 52
    rngtest: input channel speed: (min=7.631; avg=7.804; max=7.827)Kibits/s
    rngtest: FIPS tests speed: (min=32.273; avg=32.701; max=33.056)Mibits/s
    rngtest: Program run time: 130177956 microseconds

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
---
v1 -> v2: Removal of C++ style comment (I saw it too late, sry for that)
---
Lothar Rubusch (3):
  crypto: atmel-sha204a - fix memory leak at non-blocking RNG work_data
  crypto: atmel-sha204a - fix truncated 32-byte blocking read
  crypto: atmel-sha204a - fix non-blocking read logic

 drivers/crypto/atmel-sha204a.c | 61 ++++++++++++++++++++++------------
 1 file changed, 40 insertions(+), 21 deletions(-)


base-commit: 5c52607c43c397b79a9852ce33fc61de58c3645c
-- 
2.39.5



^ permalink raw reply

* Re: [PATCH 5/5] lib/crc: arm: Enable arm64's NEON intrinsics implementation of crc64
From: Ard Biesheuvel @ 2026-03-31  8:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-crypto, linux-arm-kernel, Demian Shulhan, Eric Biggers
In-Reply-To: <actuDkpYbzLj0sI8@infradead.org>


On Tue, 31 Mar 2026, at 08:47, Christoph Hellwig wrote:
>>  	depends on CRC64 && CRC_OPTIMIZATIONS
>> +	default y if ARM && KERNEL_MODE_NEON && !(CPU_BIG_ENDIAN && CC_IS_CLANG)
>
> It would be useful to throw in a comment here why it is disabled for
> big-endian on clang.
>

Ack.

>> +#define crc64_be_arch crc64_be_generic
>> +
>> +static inline u64 crc64_nvme_arch(u64 crc, const u8 *p, size_t len)
>> +{
>> +	if (len >= 128 && static_branch_likely(&have_pmull) &&
>> +	    likely(may_use_simd())) {
>> +		do {
>> +			size_t chunk = min_t(size_t, len & ~15, SZ_4K);
>> +
>> +			scoped_ksimd()
>> +				crc = crc64_nvme_arm64_c(crc, p, chunk);
>> +
>> +			p += chunk;
>> +			len -= chunk;
>> +		} while (len >= 128);
>> +	}
>
> From reading the earlier patches, I'll assume arm SIMD code is
> non-preemptable and thus you want the chunking here?  Maybe add
> a little comment explaining that?

Indeed.


^ permalink raw reply

* RE: [PATCH v3 1/3] PCI: Allow ATS to be always on for CXL.cache capable devices
From: Tian, Kevin @ 2026-03-31  8:19 UTC (permalink / raw)
  To: Nicolin Chen, jgg@nvidia.com, will@kernel.org,
	robin.murphy@arm.com, bhelgaas@google.com
  Cc: joro@8bytes.org, praan@google.com, baolu.lu@linux.intel.com,
	miko.lenczewski@arm.com, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org, Williams, Dan J,
	jonathan.cameron@huawei.com, Vikram Sethi,
	linux-cxl@vger.kernel.org
In-Reply-To: <a0dd3e4cc5260f55bbec5b3ed6791def33028735.1772833963.git.nicolinc@nvidia.com>

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, March 7, 2026 7:41 AM
> 
> Controlled by the IOMMU driver, ATS is usually enabled "on demand" when
> a
> device requests a translation service from its associated IOMMU HW running
> on the channel of a given PASID. This is working even when a device has no
> translation on its RID (i.e., the RID is IOMMU bypassed).

ATS is usually enabled "on demand" when a given PASID on the device
is attached to an I/O page table. Above sounds like there will be a software
action to enable ATS upon a device translation request.

> 
> However, certain PCIe devices require non-PASID ATS on their RID even
> when
> the RID is IOMMU bypassed. Call this "always on".
> 
> For instance, the CXL spec notes in "3.2.5.13 Memory Type on CXL.cache":
> "To source requests on CXL.cache, devices need to get the Host Physical
> Address (HPA) from the Host by means of an ATS request on CXL.io."
> 
> In other words, the CXL.cache capability requires ATS; otherwise, it can't
> access host physical memory.
> 
> Introduce a new pci_ats_always_on() helper for the IOMMU driver to scan a
> PCI device and shift ATS policies between "on demand" and "always on".
> 
> Add the support for CXL.cache devices first. Pre-CXL devices will be added
> in quirks.c file.
> 
> Note that pci_ats_always_on() validates against pci_ats_supported(), so we
> ensure that untrusted devices (e.g. external ports) will not be always on.
> This maintains the existing ATS security policy regarding potential side-
> channel attacks via ATS.
> 
> Cc: linux-cxl@vger.kernel.org
> Suggested-by: Vikram Sethi <vsethi@nvidia.com>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

with a nit:

> +/*
> + * CXL r4.0, sec 3.2.5.13 Memory Type on CXL.cache notes: to source
> requests on
> + * CXL.cache, devices need to get the Host Physical Address (HPA) from the
> Host
> + * by means of an ATS request on CXL.io.
> + *
> + * In other world, CXL.cache devices cannot access host physical memory
> without
> + * ATS.
> + */

s/world/words/


^ permalink raw reply

* Re: [GIT PULL 6/7] arm64: tegra: Device tree changes for v7.1-rc1
From: Thierry Reding @ 2026-03-31  8:13 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: arm, soc, Thierry Reding, Jon Hunter, linux-tegra,
	linux-arm-kernel
In-Reply-To: <63b6c9da-4c0e-497c-a2a6-8aa5e74e2adb@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 1228 bytes --]

On Tue, Mar 31, 2026 at 09:59:07AM +0200, Krzysztof Kozlowski wrote:
> On 29/03/2026 17:10, Thierry Reding wrote:
> > From: Thierry Reding <thierry.reding@gmail.com>
> > 
> > Hi ARM SoC maintainers,
> > 
> > The following changes since commit 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f:
> > 
> >   Linux 7.0-rc1 (2026-02-22 13:18:59 -0800)
> > 
> 
> I guess related to my question why patches were applied one day after
> the list:
> 
> Days in linux-next:
> ----------------------------------------
>  0 | ++++++++ (8)
> 
> Commits with 0 days in linux-next (8 of 8: 100.0%):
> ...
> 
> So you exposed soc tree to all sort of integration issues. No, please
> keep them for some days in the next before you send them to soc, to
> allow people to test and eventually complain/report issues.

Most issues would've been caught by daily bots already. A lot of these
probably were in linux-next but changed SHAs because I rebased them on
top of the PCI bindings patch to keep the shared branch as small as
possible.

I also do fairly extensive build testing on my side before sending those
pull requests, so I don't think I've exposed the SoC tree to an unfair
amount of integration issues.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v3] ARM: dts: aspeed: anacapa: Add eeprom device node for NFC adaptor board
From: Carl Lee via B4 Relay @ 2026-03-31  8:02 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Joel Stanley,
	Andrew Jeffery
  Cc: devicetree, linux-arm-kernel, linux-aspeed, linux-kernel,
	carl.lee, peter.shen, colin.huang2

From: Carl Lee <carl.lee@amd.com>

Add eeprom device node for NFC adaptor board FRU.

Signed-off-by: Carl Lee <carl.lee@amd.com>
---
Add eeprom device node to store FRU data for NFC adapter
board on Anacapa platform.
---
Changes in v3:
- Fix node ordering to follow ascending unit address
- Update commit message to match actual changes
- Link to v2: https://lore.kernel.org/r/20260309-arm-dts-aspeed-anacapa-add-eeprom-device-v2-1-91c7dde4b79d@amd.com

Changes in v2:
- Remove PRoT module eeprom commit since it is already included in another series under review.
- Only include NFC adapter board eeprom node.
- Link to v1: https://lore.kernel.org/r/20260309-arm-dts-aspeed-anacapa-add-eeprom-device-v1-0-45092310e0e6@amd.com
---
 arch/arm/boot/dts/aspeed/aspeed-bmc-facebook-anacapa.dts | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/boot/dts/aspeed/aspeed-bmc-facebook-anacapa.dts b/arch/arm/boot/dts/aspeed/aspeed-bmc-facebook-anacapa.dts
index 2cb7bd128d24..57fd81e931d6 100644
--- a/arch/arm/boot/dts/aspeed/aspeed-bmc-facebook-anacapa.dts
+++ b/arch/arm/boot/dts/aspeed/aspeed-bmc-facebook-anacapa.dts
@@ -824,6 +824,11 @@ nfc@28 {
 
 				enable-gpios = <&sgpiom0 241 GPIO_ACTIVE_HIGH>;
 			};
+
+			eeprom@50 {
+				compatible = "atmel,24c128";
+				reg = <0x50>;
+			};
 		};
 	};
 };

---
base-commit: a0ae2a256046c0c5d3778d1a194ff2e171f16e5f
change-id: 20260309-arm-dts-aspeed-anacapa-add-eeprom-device-a1aabe06a35b

Best regards,
-- 
Carl Lee <carl.lee@amd.com>




^ permalink raw reply related

* Re: [GIT PULL 6/7] arm64: tegra: Device tree changes for v7.1-rc1
From: Krzysztof Kozlowski @ 2026-03-31  8:00 UTC (permalink / raw)
  To: Thierry Reding
  Cc: arm, soc, Thierry Reding, Jon Hunter, linux-tegra,
	linux-arm-kernel
In-Reply-To: <act77WcvwYedN0Q8@orome>

On 31/03/2026 09:53, Thierry Reding wrote:
> On Mon, Mar 30, 2026 at 01:45:24PM +0200, Krzysztof Kozlowski wrote:
>> On 29/03/2026 17:10, Thierry Reding wrote:
>>> From: Thierry Reding <thierry.reding@gmail.com>
>>>
>>> Hi ARM SoC maintainers,
>>>
>>> The following changes since commit 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f:
>>>
>>>   Linux 7.0-rc1 (2026-02-22 13:18:59 -0800)
>>>
>>> are available in the Git repository at:
>>>
>>>   git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux.git tags/tegra-for-7.1-arm64-dt
>>>
>>> for you to fetch changes up to c70e6bc11d2008fbb19695394b69fd941ab39030:
>>>
>>>   arm64: tegra: Add Tegra264 GPIO controllers (2026-03-28 01:36:46 +0100)
>>>
>>> Thanks,
>>> Thierry
>>>
>>> ----------------------------------------------------------------
>>> arm64: tegra: Device tree changes for v7.1-rc1
>>>
>>> Various fixes and new additions across a number of devices. GPIO and PCI
>>> are enabled on Tegra264 and the Jetson AGX Thor Developer Kit, allowing
>>> it to boot via network and mass storage.
>>>
>>> ----------------------------------------------------------------
>>> Diogo Ivo (1):
>>>       arm64: tegra: smaug: Enable SPI-NOR flash
>>>
>>> Jon Hunter (1):
>>>       arm64: tegra: Fix RTC aliases
>>>
>>> Prathamesh Shete (1):
>>>       arm64: tegra: Add Tegra264 GPIO controllers
>>>
>>> Thierry Reding (6):
>>>       dt-bindings: pci: Document the NVIDIA Tegra264 PCIe controller
>>
>>
>> This is unreviewed/unacked binding where PCI maintainers had 1 day to
>> react to your v3.
> 
> Rob gave a reviewed-by on this about a week ago:
> 
>   https://lore.kernel.org/linux-tegra/177440189257.2451552.18196101830235626115.robh@kernel.org/

Rob, although knows a lot about PCI, is not a formally a PCI subsystem
maintainer.

> 
> In my experience the PCI maintainers typically defer review of the DT
> bindings to DT maintainers, so I considered Rob's R-b sufficient.

Sure and they acknowledge this, that review is done and patch can go
other way, with "Ack".

Where is the Ack?

> 
>>                   Maybe they had more time for previous versions, but
>> nevertheless it is also part of other patchset, so it will get into the
>> kernel other tree and nothing on v3 posting:
>> https://lore.kernel.org/all/20260326135855.2795149-4-thierry.reding@kernel.org/
>> gives hints that there will be cross tree merge.
> 
> Maybe look at the cover letter:
> 
>   https://lore.kernel.org/all/20260326135855.2795149-1-thierry.reding@kernel.org/
> 
> I clearly pointed out the build dependencies and suggested a shared
> branch to resolve them in both trees. Given that the bindings were

No problem, that's a valid solution. Can you point me with a lore link
to the shared branch posting (these tags/pull requests must be posted on
the lists)? Or to an ack from PCI maintainers?

The commit itself does not have an Ack, but maybe was just missed.

> reviewed by Rob and they are needed in both the subsystem tree
> (according to your own rules) as well as the DT tree (for validation),
> I included the bindings in the shared branch as well.



Best regards,
Krzysztof


^ permalink raw reply

* Re: [DMARC error]Re: [PATCH 0/2] Add PWM support Amlogic S7 S7D S6
From: Xianwei Zhao @ 2026-03-31  7:59 UTC (permalink / raw)
  To: George Stark, Martin Blumenstingl
  Cc: Uwe Kleine-König, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Heiner Kallweit, Neil Armstrong, Kevin Hilman,
	Jerome Brunet, linux-pwm, devicetree, linux-kernel,
	linux-arm-kernel, linux-amlogic, Junyi Zhao
In-Reply-To: <4a9c726a-d580-4b0b-9530-228b58389c80@salutedevices.com>

Hi George,

On 2026/3/31 15:33, George Stark wrote:
> Hello Martin, Xianwei
> 
> 
> On 3/31/26 10:10, Xianwei Zhao wrote:
>> Hi Martin,
>>      I confirmed with Junyi Zhao that the current implementation counts
>> from zero, so this submission is correct.
>> We agree this should be fixed and will address it in a follow-up patch.
>> Thanks for pointing it out.
>>
>> On 2026/3/31 05:54, Martin Blumenstingl wrote:
>>> Hi Xianwei Zhao,
>>>
>>> thanks for your contribution!
>>>
>>> On Thu, Mar 26, 2026 at 7:35 AM Xianwei Zhao via B4 Relay
>>> <devnull+xianwei.zhao.amlogic.com@kernel.org>  wrote:
>>>> Add bindings and driver support Amlogic S7/S7D/S6 SoCs.
>>> There is an old report that got lost, stating that the current
> 
> Xianwei Zhao thanks for the confirmation.
> I am the author of the old report and the corresponding patch and it's
> not lost. So if the patch is correct I'll be glad to add relevant
> tested-by tags.
> 

I will use your patch and won't send a separate one.
Do you mean I should add a Tested-by tag to your patch?

>>> pwm-meson driver has an off-by-one error with the hi and lo fields:
>>> [0]
>>> Since you are working on bringing up a new platform: is this something
>>> you can verify in your lab?
>>> To be clear: I'm not expecting you to work on this ad-hoc or bring a
>>> patch into this series. However, it would be great if you could verify
>>> if the findings from [0] are correct and send an updated patch in
>>> future.
>>>
>>> Thank you and best regards
>>> Martin 


^ permalink raw reply

* Re: [GIT PULL 6/7] arm64: tegra: Device tree changes for v7.1-rc1
From: Krzysztof Kozlowski @ 2026-03-31  7:59 UTC (permalink / raw)
  To: Thierry Reding, arm, soc
  Cc: Thierry Reding, Jon Hunter, linux-tegra, linux-arm-kernel
In-Reply-To: <20260329151045.1443133-6-thierry.reding@kernel.org>

On 29/03/2026 17:10, Thierry Reding wrote:
> From: Thierry Reding <thierry.reding@gmail.com>
> 
> Hi ARM SoC maintainers,
> 
> The following changes since commit 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f:
> 
>   Linux 7.0-rc1 (2026-02-22 13:18:59 -0800)
> 

I guess related to my question why patches were applied one day after
the list:

Days in linux-next:
----------------------------------------
 0 | ++++++++ (8)

Commits with 0 days in linux-next (8 of 8: 100.0%):
...

So you exposed soc tree to all sort of integration issues. No, please
keep them for some days in the next before you send them to soc, to
allow people to test and eventually complain/report issues.


Best regards,
Krzysztof


^ permalink raw reply

* Re: [PATCH] iommu/rockchip: fix page table allocation flags for v2 IOMMU
From: Shawn Lin @ 2026-03-31  7:57 UTC (permalink / raw)
  To: Midgy BALON
  Cc: shawn.lin, joro, will, robin.murphy, heiko, jonas,
	linux-arm-kernel, linux-rockchip, linux-kernel, stable, iommu,
	Simon Xue
In-Reply-To: <20260331075010.1463-1-midgy971@gmail.com>

+ Simon

在 2026/03/31 星期二 15:50, Midgy BALON 写道:
> commit 2a7e6400f72b ("iommu: rockchip: Allocate tables from all
> available memory for IOMMU v2") removed GFP_DMA32 from
> iommu_data_ops_v2, reasoning that RK356x and RK3588 IOMMU v2 hardware
> supports up to 40-bit physical addresses for page tables.  However, the
> RK3568 IOMMU page-table walker uses a 32-bit AXI bus: it cannot access
> physical addresses above 4 GB regardless of the address encoding range.
> 
> On boards with more than 4 GB of RAM (e.g. 8 GB LPDDR4X), removing
> GFP_DMA32 causes two distinct failure modes:
> 
> 1. Direct allocation above 4 GB: iommu_alloc_pages_sz() may return
>     memory above 0x100000000.  The hardware page-table walker issues a
>     bus error trying to dereference those addresses, causing an IOMMU
>     fault on the first DMA transaction.
> 
> 2. SWIOTLB bounce-buffer poisoning: without GFP_DMA32, page tables land
>     above the SWIOTLB window.  dma_map_single() with DMA_BIT_MASK(32)
>     then bounces them into a buffer below 4 GB.  rk_dte_get_page_table()
>     returns phys_to_virt() of the bounce buffer address; PTEs are written
>     there; the next dma_sync_single_for_device(DMA_TO_DEVICE) copies the
>     original (zero) data back over the bounce buffer, silently erasing the
>     freshly written PTEs.  The IOMMU faults because every PTE reads as zero.
> 
> Restore GFP_DMA32 (and DMA_BIT_MASK(32)) for iommu_data_ops_v2, which
> currently only serves "rockchip,rk3568-iommu" in mainline.
> 
> Tested on Radxa ROCK 3B (RK3568, 8 GB LPDDR4X):
>    - MobileNetV1 via RKNN: 5.8 ms/inference (IOMMU mode)
>    - YOLOv5s 640x640 via RKNN: ~57 ms/inference (IOMMU mode)
>    - No IOMMU faults, correct inference results
> 
> Fixes: 2a7e6400f72b ("iommu: rockchip: Allocate tables from all available memory for IOMMU v2")
> Cc: stable@vger.kernel.org
> Cc: Jonas Karlman <jonas@kwiboo.se>
> Signed-off-by: Midgy BALON <midgy971@gmail.com>
> ---
>   drivers/iommu/rockchip-iommu.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 85f3667e797..8b45db29471 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -1358,8 +1358,8 @@ static struct rk_iommu_ops iommu_data_ops_v2 = {
>   	.pt_address = &rk_dte_pt_address_v2,
>   	.mk_dtentries = &rk_mk_dte_v2,
>   	.mk_ptentries = &rk_mk_pte_v2,
> -	.dma_bit_mask = DMA_BIT_MASK(40),
> -	.gfp_flags = 0,
> +	.dma_bit_mask = DMA_BIT_MASK(32),
> +	.gfp_flags = GFP_DMA32,
>   };
>   
>   static const struct of_device_id rk_iommu_dt_ids[] = {


^ permalink raw reply

* Re: [GIT PULL 6/7] arm64: tegra: Device tree changes for v7.1-rc1
From: Thierry Reding @ 2026-03-31  7:53 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: arm, soc, Thierry Reding, Jon Hunter, linux-tegra,
	linux-arm-kernel
In-Reply-To: <7b9bc5d1-7a1d-456c-b280-5f4dc969609d@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 2635 bytes --]

On Mon, Mar 30, 2026 at 01:45:24PM +0200, Krzysztof Kozlowski wrote:
> On 29/03/2026 17:10, Thierry Reding wrote:
> > From: Thierry Reding <thierry.reding@gmail.com>
> > 
> > Hi ARM SoC maintainers,
> > 
> > The following changes since commit 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f:
> > 
> >   Linux 7.0-rc1 (2026-02-22 13:18:59 -0800)
> > 
> > are available in the Git repository at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux.git tags/tegra-for-7.1-arm64-dt
> > 
> > for you to fetch changes up to c70e6bc11d2008fbb19695394b69fd941ab39030:
> > 
> >   arm64: tegra: Add Tegra264 GPIO controllers (2026-03-28 01:36:46 +0100)
> > 
> > Thanks,
> > Thierry
> > 
> > ----------------------------------------------------------------
> > arm64: tegra: Device tree changes for v7.1-rc1
> > 
> > Various fixes and new additions across a number of devices. GPIO and PCI
> > are enabled on Tegra264 and the Jetson AGX Thor Developer Kit, allowing
> > it to boot via network and mass storage.
> > 
> > ----------------------------------------------------------------
> > Diogo Ivo (1):
> >       arm64: tegra: smaug: Enable SPI-NOR flash
> > 
> > Jon Hunter (1):
> >       arm64: tegra: Fix RTC aliases
> > 
> > Prathamesh Shete (1):
> >       arm64: tegra: Add Tegra264 GPIO controllers
> > 
> > Thierry Reding (6):
> >       dt-bindings: pci: Document the NVIDIA Tegra264 PCIe controller
> 
> 
> This is unreviewed/unacked binding where PCI maintainers had 1 day to
> react to your v3.

Rob gave a reviewed-by on this about a week ago:

  https://lore.kernel.org/linux-tegra/177440189257.2451552.18196101830235626115.robh@kernel.org/

In my experience the PCI maintainers typically defer review of the DT
bindings to DT maintainers, so I considered Rob's R-b sufficient.

>                   Maybe they had more time for previous versions, but
> nevertheless it is also part of other patchset, so it will get into the
> kernel other tree and nothing on v3 posting:
> https://lore.kernel.org/all/20260326135855.2795149-4-thierry.reding@kernel.org/
> gives hints that there will be cross tree merge.

Maybe look at the cover letter:

  https://lore.kernel.org/all/20260326135855.2795149-1-thierry.reding@kernel.org/

I clearly pointed out the build dependencies and suggested a shared
branch to resolve them in both trees. Given that the bindings were
reviewed by Rob and they are needed in both the subsystem tree
(according to your own rules) as well as the DT tree (for validation),
I included the bindings in the shared branch as well.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH] iommu/rockchip: fix page table allocation flags for v2 IOMMU
From: Midgy BALON @ 2026-03-31  7:50 UTC (permalink / raw)
  To: iommu
  Cc: joro, will, robin.murphy, heiko, jonas, linux-arm-kernel,
	linux-rockchip, linux-kernel, stable, Midgy BALON

commit 2a7e6400f72b ("iommu: rockchip: Allocate tables from all
available memory for IOMMU v2") removed GFP_DMA32 from
iommu_data_ops_v2, reasoning that RK356x and RK3588 IOMMU v2 hardware
supports up to 40-bit physical addresses for page tables.  However, the
RK3568 IOMMU page-table walker uses a 32-bit AXI bus: it cannot access
physical addresses above 4 GB regardless of the address encoding range.

On boards with more than 4 GB of RAM (e.g. 8 GB LPDDR4X), removing
GFP_DMA32 causes two distinct failure modes:

1. Direct allocation above 4 GB: iommu_alloc_pages_sz() may return
   memory above 0x100000000.  The hardware page-table walker issues a
   bus error trying to dereference those addresses, causing an IOMMU
   fault on the first DMA transaction.

2. SWIOTLB bounce-buffer poisoning: without GFP_DMA32, page tables land
   above the SWIOTLB window.  dma_map_single() with DMA_BIT_MASK(32)
   then bounces them into a buffer below 4 GB.  rk_dte_get_page_table()
   returns phys_to_virt() of the bounce buffer address; PTEs are written
   there; the next dma_sync_single_for_device(DMA_TO_DEVICE) copies the
   original (zero) data back over the bounce buffer, silently erasing the
   freshly written PTEs.  The IOMMU faults because every PTE reads as zero.

Restore GFP_DMA32 (and DMA_BIT_MASK(32)) for iommu_data_ops_v2, which
currently only serves "rockchip,rk3568-iommu" in mainline.

Tested on Radxa ROCK 3B (RK3568, 8 GB LPDDR4X):
  - MobileNetV1 via RKNN: 5.8 ms/inference (IOMMU mode)
  - YOLOv5s 640x640 via RKNN: ~57 ms/inference (IOMMU mode)
  - No IOMMU faults, correct inference results

Fixes: 2a7e6400f72b ("iommu: rockchip: Allocate tables from all available memory for IOMMU v2")
Cc: stable@vger.kernel.org
Cc: Jonas Karlman <jonas@kwiboo.se>
Signed-off-by: Midgy BALON <midgy971@gmail.com>
---
 drivers/iommu/rockchip-iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 85f3667e797..8b45db29471 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1358,8 +1358,8 @@ static struct rk_iommu_ops iommu_data_ops_v2 = {
 	.pt_address = &rk_dte_pt_address_v2,
 	.mk_dtentries = &rk_mk_dte_v2,
 	.mk_ptentries = &rk_mk_pte_v2,
-	.dma_bit_mask = DMA_BIT_MASK(40),
-	.gfp_flags = 0,
+	.dma_bit_mask = DMA_BIT_MASK(32),
+	.gfp_flags = GFP_DMA32,
 };
 
 static const struct of_device_id rk_iommu_dt_ids[] = {
-- 
2.30.2



^ permalink raw reply related

* [PATCH v2 4/5] xor/arm64: Use shared NEON intrinsics implementation from 32-bit ARM
From: Ard Biesheuvel @ 2026-03-31  7:49 UTC (permalink / raw)
  To: linux-raid
  Cc: linux-arm-kernel, linux-crypto, Ard Biesheuvel, Christoph Hellwig,
	Russell King, Arnd Bergmann, Eric Biggers
In-Reply-To: <20260331074940.55502-7-ardb+git@google.com>

From: Ard Biesheuvel <ardb@kernel.org>

Tweak the arm64 code so that the pure NEON intrinsics implementation of
XOR is shared between arm64 and ARM.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 lib/raid/xor/Makefile         |   3 +-
 lib/raid/xor/arm/xor-neon.c   |   4 +
 lib/raid/xor/arm64/xor-neon.c | 172 +-------------------
 3 files changed, 9 insertions(+), 170 deletions(-)

diff --git a/lib/raid/xor/Makefile b/lib/raid/xor/Makefile
index 4d633dfd5b90..b27bf5156784 100644
--- a/lib/raid/xor/Makefile
+++ b/lib/raid/xor/Makefile
@@ -19,7 +19,8 @@ xor-$(CONFIG_ARM)		+= arm/xor.o
 ifeq ($(CONFIG_ARM),y)
 xor-$(CONFIG_KERNEL_MODE_NEON)	+= arm/xor-neon.o arm/xor-neon-glue.o
 endif
-xor-$(CONFIG_ARM64)		+= arm64/xor-neon.o arm64/xor-neon-glue.o
+xor-$(CONFIG_ARM64)		+= arm/xor-neon.o arm64/xor-neon.o \
+				   arm64/xor-neon-glue.o
 xor-$(CONFIG_CPU_HAS_LSX)	+= loongarch/xor_simd.o
 xor-$(CONFIG_CPU_HAS_LSX)	+= loongarch/xor_simd_glue.o
 xor-$(CONFIG_ALTIVEC)		+= powerpc/xor_vmx.o powerpc/xor_vmx_glue.o
diff --git a/lib/raid/xor/arm/xor-neon.c b/lib/raid/xor/arm/xor-neon.c
index a3e2b4af8d36..c7c3cf634e23 100644
--- a/lib/raid/xor/arm/xor-neon.c
+++ b/lib/raid/xor/arm/xor-neon.c
@@ -173,3 +173,7 @@ static void __xor_neon_5(unsigned long bytes, unsigned long * __restrict p1,
 
 __DO_XOR_BLOCKS(neon_inner, __xor_neon_2, __xor_neon_3, __xor_neon_4,
 		__xor_neon_5);
+
+#ifdef CONFIG_ARM64
+extern typeof(__xor_neon_2) __xor_eor3_2 __alias(__xor_neon_2);
+#endif
diff --git a/lib/raid/xor/arm64/xor-neon.c b/lib/raid/xor/arm64/xor-neon.c
index 97ef3cb92496..e44016c363f1 100644
--- a/lib/raid/xor/arm64/xor-neon.c
+++ b/lib/raid/xor/arm64/xor-neon.c
@@ -1,8 +1,4 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/*
- * Authors: Jackie Liu <liuyun01@kylinos.cn>
- * Copyright (C) 2018,Tianjin KYLIN Information Technology Co., Ltd.
- */
 
 #include <linux/cache.h>
 #include <asm/neon-intrinsics.h>
@@ -10,170 +6,8 @@
 #include "xor_arch.h"
 #include "xor-neon.h"
 
-static void __xor_neon_2(unsigned long bytes, unsigned long * __restrict p1,
-		const unsigned long * __restrict p2)
-{
-	uint64_t *dp1 = (uint64_t *)p1;
-	uint64_t *dp2 = (uint64_t *)p2;
-
-	register uint64x2_t v0, v1, v2, v3;
-	long lines = bytes / (sizeof(uint64x2_t) * 4);
-
-	do {
-		/* p1 ^= p2 */
-		v0 = veorq_u64(vld1q_u64(dp1 +  0), vld1q_u64(dp2 +  0));
-		v1 = veorq_u64(vld1q_u64(dp1 +  2), vld1q_u64(dp2 +  2));
-		v2 = veorq_u64(vld1q_u64(dp1 +  4), vld1q_u64(dp2 +  4));
-		v3 = veorq_u64(vld1q_u64(dp1 +  6), vld1q_u64(dp2 +  6));
-
-		/* store */
-		vst1q_u64(dp1 +  0, v0);
-		vst1q_u64(dp1 +  2, v1);
-		vst1q_u64(dp1 +  4, v2);
-		vst1q_u64(dp1 +  6, v3);
-
-		dp1 += 8;
-		dp2 += 8;
-	} while (--lines > 0);
-}
-
-static void __xor_neon_3(unsigned long bytes, unsigned long * __restrict p1,
-		const unsigned long * __restrict p2,
-		const unsigned long * __restrict p3)
-{
-	uint64_t *dp1 = (uint64_t *)p1;
-	uint64_t *dp2 = (uint64_t *)p2;
-	uint64_t *dp3 = (uint64_t *)p3;
-
-	register uint64x2_t v0, v1, v2, v3;
-	long lines = bytes / (sizeof(uint64x2_t) * 4);
-
-	do {
-		/* p1 ^= p2 */
-		v0 = veorq_u64(vld1q_u64(dp1 +  0), vld1q_u64(dp2 +  0));
-		v1 = veorq_u64(vld1q_u64(dp1 +  2), vld1q_u64(dp2 +  2));
-		v2 = veorq_u64(vld1q_u64(dp1 +  4), vld1q_u64(dp2 +  4));
-		v3 = veorq_u64(vld1q_u64(dp1 +  6), vld1q_u64(dp2 +  6));
-
-		/* p1 ^= p3 */
-		v0 = veorq_u64(v0, vld1q_u64(dp3 +  0));
-		v1 = veorq_u64(v1, vld1q_u64(dp3 +  2));
-		v2 = veorq_u64(v2, vld1q_u64(dp3 +  4));
-		v3 = veorq_u64(v3, vld1q_u64(dp3 +  6));
-
-		/* store */
-		vst1q_u64(dp1 +  0, v0);
-		vst1q_u64(dp1 +  2, v1);
-		vst1q_u64(dp1 +  4, v2);
-		vst1q_u64(dp1 +  6, v3);
-
-		dp1 += 8;
-		dp2 += 8;
-		dp3 += 8;
-	} while (--lines > 0);
-}
-
-static void __xor_neon_4(unsigned long bytes, unsigned long * __restrict p1,
-		const unsigned long * __restrict p2,
-		const unsigned long * __restrict p3,
-		const unsigned long * __restrict p4)
-{
-	uint64_t *dp1 = (uint64_t *)p1;
-	uint64_t *dp2 = (uint64_t *)p2;
-	uint64_t *dp3 = (uint64_t *)p3;
-	uint64_t *dp4 = (uint64_t *)p4;
-
-	register uint64x2_t v0, v1, v2, v3;
-	long lines = bytes / (sizeof(uint64x2_t) * 4);
-
-	do {
-		/* p1 ^= p2 */
-		v0 = veorq_u64(vld1q_u64(dp1 +  0), vld1q_u64(dp2 +  0));
-		v1 = veorq_u64(vld1q_u64(dp1 +  2), vld1q_u64(dp2 +  2));
-		v2 = veorq_u64(vld1q_u64(dp1 +  4), vld1q_u64(dp2 +  4));
-		v3 = veorq_u64(vld1q_u64(dp1 +  6), vld1q_u64(dp2 +  6));
-
-		/* p1 ^= p3 */
-		v0 = veorq_u64(v0, vld1q_u64(dp3 +  0));
-		v1 = veorq_u64(v1, vld1q_u64(dp3 +  2));
-		v2 = veorq_u64(v2, vld1q_u64(dp3 +  4));
-		v3 = veorq_u64(v3, vld1q_u64(dp3 +  6));
-
-		/* p1 ^= p4 */
-		v0 = veorq_u64(v0, vld1q_u64(dp4 +  0));
-		v1 = veorq_u64(v1, vld1q_u64(dp4 +  2));
-		v2 = veorq_u64(v2, vld1q_u64(dp4 +  4));
-		v3 = veorq_u64(v3, vld1q_u64(dp4 +  6));
-
-		/* store */
-		vst1q_u64(dp1 +  0, v0);
-		vst1q_u64(dp1 +  2, v1);
-		vst1q_u64(dp1 +  4, v2);
-		vst1q_u64(dp1 +  6, v3);
-
-		dp1 += 8;
-		dp2 += 8;
-		dp3 += 8;
-		dp4 += 8;
-	} while (--lines > 0);
-}
-
-static void __xor_neon_5(unsigned long bytes, unsigned long * __restrict p1,
-		const unsigned long * __restrict p2,
-		const unsigned long * __restrict p3,
-		const unsigned long * __restrict p4,
-		const unsigned long * __restrict p5)
-{
-	uint64_t *dp1 = (uint64_t *)p1;
-	uint64_t *dp2 = (uint64_t *)p2;
-	uint64_t *dp3 = (uint64_t *)p3;
-	uint64_t *dp4 = (uint64_t *)p4;
-	uint64_t *dp5 = (uint64_t *)p5;
-
-	register uint64x2_t v0, v1, v2, v3;
-	long lines = bytes / (sizeof(uint64x2_t) * 4);
-
-	do {
-		/* p1 ^= p2 */
-		v0 = veorq_u64(vld1q_u64(dp1 +  0), vld1q_u64(dp2 +  0));
-		v1 = veorq_u64(vld1q_u64(dp1 +  2), vld1q_u64(dp2 +  2));
-		v2 = veorq_u64(vld1q_u64(dp1 +  4), vld1q_u64(dp2 +  4));
-		v3 = veorq_u64(vld1q_u64(dp1 +  6), vld1q_u64(dp2 +  6));
-
-		/* p1 ^= p3 */
-		v0 = veorq_u64(v0, vld1q_u64(dp3 +  0));
-		v1 = veorq_u64(v1, vld1q_u64(dp3 +  2));
-		v2 = veorq_u64(v2, vld1q_u64(dp3 +  4));
-		v3 = veorq_u64(v3, vld1q_u64(dp3 +  6));
-
-		/* p1 ^= p4 */
-		v0 = veorq_u64(v0, vld1q_u64(dp4 +  0));
-		v1 = veorq_u64(v1, vld1q_u64(dp4 +  2));
-		v2 = veorq_u64(v2, vld1q_u64(dp4 +  4));
-		v3 = veorq_u64(v3, vld1q_u64(dp4 +  6));
-
-		/* p1 ^= p5 */
-		v0 = veorq_u64(v0, vld1q_u64(dp5 +  0));
-		v1 = veorq_u64(v1, vld1q_u64(dp5 +  2));
-		v2 = veorq_u64(v2, vld1q_u64(dp5 +  4));
-		v3 = veorq_u64(v3, vld1q_u64(dp5 +  6));
-
-		/* store */
-		vst1q_u64(dp1 +  0, v0);
-		vst1q_u64(dp1 +  2, v1);
-		vst1q_u64(dp1 +  4, v2);
-		vst1q_u64(dp1 +  6, v3);
-
-		dp1 += 8;
-		dp2 += 8;
-		dp3 += 8;
-		dp4 += 8;
-		dp5 += 8;
-	} while (--lines > 0);
-}
-
-__DO_XOR_BLOCKS(neon_inner, __xor_neon_2, __xor_neon_3, __xor_neon_4,
-		__xor_neon_5);
+extern void __xor_eor3_2(unsigned long bytes, unsigned long * __restrict p1,
+		const unsigned long * __restrict p2);
 
 static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
 {
@@ -308,5 +142,5 @@ static void __xor_eor3_5(unsigned long bytes, unsigned long * __restrict p1,
 	} while (--lines > 0);
 }
 
-__DO_XOR_BLOCKS(eor3_inner, __xor_neon_2, __xor_eor3_3, __xor_eor3_4,
+__DO_XOR_BLOCKS(eor3_inner, __xor_eor3_2, __xor_eor3_3, __xor_eor3_4,
 		__xor_eor3_5);
-- 
2.53.0.1018.g2bb0e51243-goog



^ permalink raw reply related

* [PATCH v2 3/5] xor/arm: Replace vectorized implementation with arm64's intrinsics
From: Ard Biesheuvel @ 2026-03-31  7:49 UTC (permalink / raw)
  To: linux-raid
  Cc: linux-arm-kernel, linux-crypto, Ard Biesheuvel, Christoph Hellwig,
	Russell King, Arnd Bergmann, Eric Biggers
In-Reply-To: <20260331074940.55502-7-ardb+git@google.com>

From: Ard Biesheuvel <ardb@kernel.org>

Drop the XOR implementation generated by the vectorizer: this has always
been a bit of a hack, and now that arm64 has an intrinsics version that
works on ARM too, let's use that instead.

So copy the part of the arm64 code that can be shared (so not the EOR3
version). The arm64 code will be updated in a subsequent patch to share
this implementation.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 lib/raid/xor/arm/xor-neon.c | 183 ++++++++++++++++++--
 lib/raid/xor/arm/xor-neon.h |   7 +
 lib/raid/xor/arm/xor_arch.h |   7 +-
 lib/raid/xor/xor-8regs.c    |   2 -
 4 files changed, 174 insertions(+), 25 deletions(-)

diff --git a/lib/raid/xor/arm/xor-neon.c b/lib/raid/xor/arm/xor-neon.c
index 23147e3a7904..a3e2b4af8d36 100644
--- a/lib/raid/xor/arm/xor-neon.c
+++ b/lib/raid/xor/arm/xor-neon.c
@@ -1,26 +1,175 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ * Authors: Jackie Liu <liuyun01@kylinos.cn>
+ * Copyright (C) 2018,Tianjin KYLIN Information Technology Co., Ltd.
  */
 
 #include "xor_impl.h"
-#include "xor_arch.h"
+#include "xor-neon.h"
 
-#ifndef __ARM_NEON__
-#error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
-#endif
+#include <asm/neon-intrinsics.h>
 
-/*
- * Pull in the reference implementations while instructing GCC (through
- * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
- * NEON instructions. Clang does this by default at O2 so no pragma is
- * needed.
- */
-#ifdef CONFIG_CC_IS_GCC
-#pragma GCC optimize "tree-vectorize"
-#endif
+static void __xor_neon_2(unsigned long bytes, unsigned long * __restrict p1,
+		const unsigned long * __restrict p2)
+{
+	uint64_t *dp1 = (uint64_t *)p1;
+	uint64_t *dp2 = (uint64_t *)p2;
+
+	register uint64x2_t v0, v1, v2, v3;
+	long lines = bytes / (sizeof(uint64x2_t) * 4);
+
+	do {
+		/* p1 ^= p2 */
+		v0 = veorq_u64(vld1q_u64(dp1 +  0), vld1q_u64(dp2 +  0));
+		v1 = veorq_u64(vld1q_u64(dp1 +  2), vld1q_u64(dp2 +  2));
+		v2 = veorq_u64(vld1q_u64(dp1 +  4), vld1q_u64(dp2 +  4));
+		v3 = veorq_u64(vld1q_u64(dp1 +  6), vld1q_u64(dp2 +  6));
+
+		/* store */
+		vst1q_u64(dp1 +  0, v0);
+		vst1q_u64(dp1 +  2, v1);
+		vst1q_u64(dp1 +  4, v2);
+		vst1q_u64(dp1 +  6, v3);
+
+		dp1 += 8;
+		dp2 += 8;
+	} while (--lines > 0);
+}
+
+static void __xor_neon_3(unsigned long bytes, unsigned long * __restrict p1,
+		const unsigned long * __restrict p2,
+		const unsigned long * __restrict p3)
+{
+	uint64_t *dp1 = (uint64_t *)p1;
+	uint64_t *dp2 = (uint64_t *)p2;
+	uint64_t *dp3 = (uint64_t *)p3;
+
+	register uint64x2_t v0, v1, v2, v3;
+	long lines = bytes / (sizeof(uint64x2_t) * 4);
+
+	do {
+		/* p1 ^= p2 */
+		v0 = veorq_u64(vld1q_u64(dp1 +  0), vld1q_u64(dp2 +  0));
+		v1 = veorq_u64(vld1q_u64(dp1 +  2), vld1q_u64(dp2 +  2));
+		v2 = veorq_u64(vld1q_u64(dp1 +  4), vld1q_u64(dp2 +  4));
+		v3 = veorq_u64(vld1q_u64(dp1 +  6), vld1q_u64(dp2 +  6));
+
+		/* p1 ^= p3 */
+		v0 = veorq_u64(v0, vld1q_u64(dp3 +  0));
+		v1 = veorq_u64(v1, vld1q_u64(dp3 +  2));
+		v2 = veorq_u64(v2, vld1q_u64(dp3 +  4));
+		v3 = veorq_u64(v3, vld1q_u64(dp3 +  6));
+
+		/* store */
+		vst1q_u64(dp1 +  0, v0);
+		vst1q_u64(dp1 +  2, v1);
+		vst1q_u64(dp1 +  4, v2);
+		vst1q_u64(dp1 +  6, v3);
+
+		dp1 += 8;
+		dp2 += 8;
+		dp3 += 8;
+	} while (--lines > 0);
+}
+
+static void __xor_neon_4(unsigned long bytes, unsigned long * __restrict p1,
+		const unsigned long * __restrict p2,
+		const unsigned long * __restrict p3,
+		const unsigned long * __restrict p4)
+{
+	uint64_t *dp1 = (uint64_t *)p1;
+	uint64_t *dp2 = (uint64_t *)p2;
+	uint64_t *dp3 = (uint64_t *)p3;
+	uint64_t *dp4 = (uint64_t *)p4;
+
+	register uint64x2_t v0, v1, v2, v3;
+	long lines = bytes / (sizeof(uint64x2_t) * 4);
+
+	do {
+		/* p1 ^= p2 */
+		v0 = veorq_u64(vld1q_u64(dp1 +  0), vld1q_u64(dp2 +  0));
+		v1 = veorq_u64(vld1q_u64(dp1 +  2), vld1q_u64(dp2 +  2));
+		v2 = veorq_u64(vld1q_u64(dp1 +  4), vld1q_u64(dp2 +  4));
+		v3 = veorq_u64(vld1q_u64(dp1 +  6), vld1q_u64(dp2 +  6));
+
+		/* p1 ^= p3 */
+		v0 = veorq_u64(v0, vld1q_u64(dp3 +  0));
+		v1 = veorq_u64(v1, vld1q_u64(dp3 +  2));
+		v2 = veorq_u64(v2, vld1q_u64(dp3 +  4));
+		v3 = veorq_u64(v3, vld1q_u64(dp3 +  6));
+
+		/* p1 ^= p4 */
+		v0 = veorq_u64(v0, vld1q_u64(dp4 +  0));
+		v1 = veorq_u64(v1, vld1q_u64(dp4 +  2));
+		v2 = veorq_u64(v2, vld1q_u64(dp4 +  4));
+		v3 = veorq_u64(v3, vld1q_u64(dp4 +  6));
+
+		/* store */
+		vst1q_u64(dp1 +  0, v0);
+		vst1q_u64(dp1 +  2, v1);
+		vst1q_u64(dp1 +  4, v2);
+		vst1q_u64(dp1 +  6, v3);
+
+		dp1 += 8;
+		dp2 += 8;
+		dp3 += 8;
+		dp4 += 8;
+	} while (--lines > 0);
+}
+
+static void __xor_neon_5(unsigned long bytes, unsigned long * __restrict p1,
+		const unsigned long * __restrict p2,
+		const unsigned long * __restrict p3,
+		const unsigned long * __restrict p4,
+		const unsigned long * __restrict p5)
+{
+	uint64_t *dp1 = (uint64_t *)p1;
+	uint64_t *dp2 = (uint64_t *)p2;
+	uint64_t *dp3 = (uint64_t *)p3;
+	uint64_t *dp4 = (uint64_t *)p4;
+	uint64_t *dp5 = (uint64_t *)p5;
+
+	register uint64x2_t v0, v1, v2, v3;
+	long lines = bytes / (sizeof(uint64x2_t) * 4);
+
+	do {
+		/* p1 ^= p2 */
+		v0 = veorq_u64(vld1q_u64(dp1 +  0), vld1q_u64(dp2 +  0));
+		v1 = veorq_u64(vld1q_u64(dp1 +  2), vld1q_u64(dp2 +  2));
+		v2 = veorq_u64(vld1q_u64(dp1 +  4), vld1q_u64(dp2 +  4));
+		v3 = veorq_u64(vld1q_u64(dp1 +  6), vld1q_u64(dp2 +  6));
+
+		/* p1 ^= p3 */
+		v0 = veorq_u64(v0, vld1q_u64(dp3 +  0));
+		v1 = veorq_u64(v1, vld1q_u64(dp3 +  2));
+		v2 = veorq_u64(v2, vld1q_u64(dp3 +  4));
+		v3 = veorq_u64(v3, vld1q_u64(dp3 +  6));
+
+		/* p1 ^= p4 */
+		v0 = veorq_u64(v0, vld1q_u64(dp4 +  0));
+		v1 = veorq_u64(v1, vld1q_u64(dp4 +  2));
+		v2 = veorq_u64(v2, vld1q_u64(dp4 +  4));
+		v3 = veorq_u64(v3, vld1q_u64(dp4 +  6));
+
+		/* p1 ^= p5 */
+		v0 = veorq_u64(v0, vld1q_u64(dp5 +  0));
+		v1 = veorq_u64(v1, vld1q_u64(dp5 +  2));
+		v2 = veorq_u64(v2, vld1q_u64(dp5 +  4));
+		v3 = veorq_u64(v3, vld1q_u64(dp5 +  6));
+
+		/* store */
+		vst1q_u64(dp1 +  0, v0);
+		vst1q_u64(dp1 +  2, v1);
+		vst1q_u64(dp1 +  4, v2);
+		vst1q_u64(dp1 +  6, v3);
 
-#define NO_TEMPLATE
-#include "../xor-8regs.c"
+		dp1 += 8;
+		dp2 += 8;
+		dp3 += 8;
+		dp4 += 8;
+		dp5 += 8;
+	} while (--lines > 0);
+}
 
-__DO_XOR_BLOCKS(neon_inner, xor_8regs_2, xor_8regs_3, xor_8regs_4, xor_8regs_5);
+__DO_XOR_BLOCKS(neon_inner, __xor_neon_2, __xor_neon_3, __xor_neon_4,
+		__xor_neon_5);
diff --git a/lib/raid/xor/arm/xor-neon.h b/lib/raid/xor/arm/xor-neon.h
new file mode 100644
index 000000000000..406e0356f05b
--- /dev/null
+++ b/lib/raid/xor/arm/xor-neon.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+extern struct xor_block_template xor_block_arm4regs;
+extern struct xor_block_template xor_block_neon;
+
+void xor_gen_neon_inner(void *dest, void **srcs, unsigned int src_cnt,
+		unsigned int bytes);
diff --git a/lib/raid/xor/arm/xor_arch.h b/lib/raid/xor/arm/xor_arch.h
index 775ff835df65..f1ddb64fe62a 100644
--- a/lib/raid/xor/arm/xor_arch.h
+++ b/lib/raid/xor/arm/xor_arch.h
@@ -3,12 +3,7 @@
  *  Copyright (C) 2001 Russell King
  */
 #include <asm/neon.h>
-
-extern struct xor_block_template xor_block_arm4regs;
-extern struct xor_block_template xor_block_neon;
-
-void xor_gen_neon_inner(void *dest, void **srcs, unsigned int src_cnt,
-		unsigned int bytes);
+#include "xor-neon.h"
 
 static __always_inline void __init arch_xor_init(void)
 {
diff --git a/lib/raid/xor/xor-8regs.c b/lib/raid/xor/xor-8regs.c
index 1edaed8acffe..46b3c8bdc27f 100644
--- a/lib/raid/xor/xor-8regs.c
+++ b/lib/raid/xor/xor-8regs.c
@@ -93,11 +93,9 @@ xor_8regs_5(unsigned long bytes, unsigned long * __restrict p1,
 	} while (--lines > 0);
 }
 
-#ifndef NO_TEMPLATE
 DO_XOR_BLOCKS(8regs, xor_8regs_2, xor_8regs_3, xor_8regs_4, xor_8regs_5);
 
 struct xor_block_template xor_block_8regs = {
 	.name		= "8regs",
 	.xor_gen	= xor_gen_8regs,
 };
-#endif /* NO_TEMPLATE */
-- 
2.53.0.1018.g2bb0e51243-goog



^ permalink raw reply related

* [PATCH v2 5/5] ARM: Remove hacked-up asm/types.h header
From: Ard Biesheuvel @ 2026-03-31  7:49 UTC (permalink / raw)
  To: linux-raid
  Cc: linux-arm-kernel, linux-crypto, Ard Biesheuvel, Christoph Hellwig,
	Russell King, Arnd Bergmann, Eric Biggers
In-Reply-To: <20260331074940.55502-7-ardb+git@google.com>

From: Ard Biesheuvel <ardb@kernel.org>

ARM has a special version of asm/types.h which contains overrides for
certain #define's related to the C types used to back C99 types such as
uint32_t and uintptr_t.

This is only needed when pulling in system headers such as stdint.h
during the build, and this only happens when using NEON intrinsics,
for which there is now a dedicated header file.

So drop this header entirely, and revert to the asm-generic one.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm/include/uapi/asm/types.h | 41 --------------------
 1 file changed, 41 deletions(-)

diff --git a/arch/arm/include/uapi/asm/types.h b/arch/arm/include/uapi/asm/types.h
deleted file mode 100644
index 1a667bc26510..000000000000
--- a/arch/arm/include/uapi/asm/types.h
+++ /dev/null
@@ -1,41 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-#ifndef _UAPI_ASM_TYPES_H
-#define _UAPI_ASM_TYPES_H
-
-#include <asm-generic/int-ll64.h>
-
-/*
- * The C99 types uintXX_t that are usually defined in 'stdint.h' are not as
- * unambiguous on ARM as you would expect. For the types below, there is a
- * difference on ARM between GCC built for bare metal ARM, GCC built for glibc
- * and the kernel itself, which results in build errors if you try to build with
- * -ffreestanding and include 'stdint.h' (such as when you include 'arm_neon.h'
- * in order to use NEON intrinsics)
- *
- * As the typedefs for these types in 'stdint.h' are based on builtin defines
- * supplied by GCC, we can tweak these to align with the kernel's idea of those
- * types, so 'linux/types.h' and 'stdint.h' can be safely included from the same
- * source file (provided that -ffreestanding is used).
- *
- *                    int32_t         uint32_t               uintptr_t
- * bare metal GCC     long            unsigned long          unsigned int
- * glibc GCC          int             unsigned int           unsigned int
- * kernel             int             unsigned int           unsigned long
- */
-
-#ifdef __INT32_TYPE__
-#undef __INT32_TYPE__
-#define __INT32_TYPE__		int
-#endif
-
-#ifdef __UINT32_TYPE__
-#undef __UINT32_TYPE__
-#define __UINT32_TYPE__	unsigned int
-#endif
-
-#ifdef __UINTPTR_TYPE__
-#undef __UINTPTR_TYPE__
-#define __UINTPTR_TYPE__	unsigned long
-#endif
-
-#endif /* _UAPI_ASM_TYPES_H */
-- 
2.53.0.1018.g2bb0e51243-goog



^ permalink raw reply related

* [PATCH v2 2/5] crypto: aegis128 - Use neon-intrinsics.h on ARM too
From: Ard Biesheuvel @ 2026-03-31  7:49 UTC (permalink / raw)
  To: linux-raid
  Cc: linux-arm-kernel, linux-crypto, Ard Biesheuvel, Christoph Hellwig,
	Russell King, Arnd Bergmann, Eric Biggers
In-Reply-To: <20260331074940.55502-7-ardb+git@google.com>

From: Ard Biesheuvel <ardb@kernel.org>

Use the asm/neon-intrinsics.h header on ARM as well as arm64, so that
the calling code does not have to know the difference.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 crypto/aegis128-neon-inner.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/crypto/aegis128-neon-inner.c b/crypto/aegis128-neon-inner.c
index b6a52a386b22..56b534eeb680 100644
--- a/crypto/aegis128-neon-inner.c
+++ b/crypto/aegis128-neon-inner.c
@@ -3,13 +3,11 @@
  * Copyright (C) 2019 Linaro, Ltd. <ard.biesheuvel@linaro.org>
  */
 
-#ifdef CONFIG_ARM64
 #include <asm/neon-intrinsics.h>
 
+#ifdef CONFIG_ARM64
 #define AES_ROUND	"aese %0.16b, %1.16b \n\t aesmc %0.16b, %0.16b"
 #else
-#include <arm_neon.h>
-
 #define AES_ROUND	"aese.8 %q0, %q1 \n\t aesmc.8 %q0, %q0"
 #endif
 
-- 
2.53.0.1018.g2bb0e51243-goog



^ permalink raw reply related

* [PATCH v2 1/5] ARM: Add a neon-intrinsics.h header like on arm64
From: Ard Biesheuvel @ 2026-03-31  7:49 UTC (permalink / raw)
  To: linux-raid
  Cc: linux-arm-kernel, linux-crypto, Ard Biesheuvel, Christoph Hellwig,
	Russell King, Arnd Bergmann, Eric Biggers
In-Reply-To: <20260331074940.55502-7-ardb+git@google.com>

From: Ard Biesheuvel <ardb@kernel.org>

Add a header asm/neon-intrinsics.h similar to the one that arm64 has.
This makes it possible for NEON intrinsics code to be shared seamlessly
between ARM and arm64.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 Documentation/arch/arm/kernel_mode_neon.rst |  4 +-
 arch/arm/include/asm/neon-intrinsics.h      | 64 ++++++++++++++++++++
 2 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/Documentation/arch/arm/kernel_mode_neon.rst b/Documentation/arch/arm/kernel_mode_neon.rst
index 9bfb71a2a9b9..1efb6d35b7bd 100644
--- a/Documentation/arch/arm/kernel_mode_neon.rst
+++ b/Documentation/arch/arm/kernel_mode_neon.rst
@@ -121,4 +121,6 @@ observe the following in addition to the rules above:
 * Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
   uses its builtin version of <stdint.h> (this is a C99 header which the kernel
   does not supply);
-* Include <arm_neon.h> last, or at least after <linux/types.h>
+* Do not include <arm_neon.h> directly: instead, include <asm/neon-intrinsics.h>,
+  which tweaks some macro definitions so that system headers can be included
+  safely.
diff --git a/arch/arm/include/asm/neon-intrinsics.h b/arch/arm/include/asm/neon-intrinsics.h
new file mode 100644
index 000000000000..3fe0b5ab9659
--- /dev/null
+++ b/arch/arm/include/asm/neon-intrinsics.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_NEON_INTRINSICS_H
+#define __ASM_NEON_INTRINSICS_H
+
+#ifndef __ARM_NEON__
+#error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
+#endif
+
+#include <asm-generic/int-ll64.h>
+
+/*
+ * The C99 types uintXX_t that are usually defined in 'stdint.h' are not as
+ * unambiguous on ARM as you would expect. For the types below, there is a
+ * difference on ARM between GCC built for bare metal ARM, GCC built for glibc
+ * and the kernel itself, which results in build errors if you try to build
+ * with -ffreestanding and include 'stdint.h' (such as when you include
+ * 'arm_neon.h' in order to use NEON intrinsics)
+ *
+ * As the typedefs for these types in 'stdint.h' are based on builtin defines
+ * supplied by GCC, we can tweak these to align with the kernel's idea of those
+ * types, so 'linux/types.h' and 'stdint.h' can be safely included from the
+ * same source file (provided that -ffreestanding is used).
+ *
+ *                    int32_t     uint32_t          intptr_t     uintptr_t
+ * bare metal GCC     long        unsigned long     int          unsigned int
+ * glibc GCC          int         unsigned int      int          unsigned int
+ * kernel             int         unsigned int      long         unsigned long
+ */
+
+#ifdef __INT32_TYPE__
+#undef __INT32_TYPE__
+#define __INT32_TYPE__		int
+#endif
+
+#ifdef __UINT32_TYPE__
+#undef __UINT32_TYPE__
+#define __UINT32_TYPE__		unsigned int
+#endif
+
+#ifdef __INTPTR_TYPE__
+#undef __INTPTR_TYPE__
+#define __INTPTR_TYPE__		long
+#endif
+
+#ifdef __UINTPTR_TYPE__
+#undef __UINTPTR_TYPE__
+#define __UINTPTR_TYPE__	unsigned long
+#endif
+
+/*
+ * genksyms chokes on the ARM NEON instrinsics system header, but we
+ * don't export anything it defines anyway, so just disregard when
+ * genksyms execute.
+ */
+#ifndef __GENKSYMS__
+#include <arm_neon.h>
+#endif
+
+#ifdef CONFIG_CC_IS_CLANG
+#pragma clang diagnostic ignored "-Wincompatible-pointer-types"
+#endif
+
+#endif /* __ASM_NEON_INTRINSICS_H */
-- 
2.53.0.1018.g2bb0e51243-goog



^ permalink raw reply related

* [PATCH v2 0/5] xor/arm: Replace vectorized version with intrinsics
From: Ard Biesheuvel @ 2026-03-31  7:49 UTC (permalink / raw)
  To: linux-raid
  Cc: linux-arm-kernel, linux-crypto, Ard Biesheuvel, Christoph Hellwig,
	Russell King, Arnd Bergmann, Eric Biggers

From: Ard Biesheuvel <ardb@kernel.org>

Replace the compiler vectorized XOR implementation for ARM with the
existing NEON intrinsics implementation used by arm64. This is slightly
faster, and allows some minor cleanups of the type hacks in the headers
now that intrinsics are the only C code permitted to use FP/SIMD
instructions.

Changes since v1:
- Update kernel_mode_neon.rst to state that arm_neon.h must not be
  included directly, but the new asm/neon-intrinsics.h should be used
  instead
- Avoid #include's of .c files - instead, build arm/xor-neon.c for arm64
  as a separate compilation unit, and export the symbol that is shared
  between the EOR and EOR3 implementations.

Performance (QEMU mach-virt VM running on Synquacer [Cortex-A53 @ 1 GHz]

Before:

[    3.519687] xor: measuring software checksum speed
[    3.521725]    neon            :  1660 MB/sec
[    3.524733]    32regs          :  1105 MB/sec
[    3.527751]    8regs           :  1098 MB/sec
[    3.529911]    arm4regs        :  1540 MB/sec

After:

[    3.517654] xor: measuring software checksum speed
[    3.519454]    neon            :  1896 MB/sec
[    3.522499]    32regs          :  1090 MB/sec
[    3.525560]    8regs           :  1083 MB/sec
[    3.527700]    arm4regs        :  1556 MB/sec

This applies onto Christoph's XOR cleanup series.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Eric Biggers <ebiggers@kernel.org>

Ard Biesheuvel (5):
  ARM: Add a neon-intrinsics.h header like on arm64
  crypto: aegis128 - Use neon-intrinsics.h on ARM too
  xor/arm: Replace vectorized implementation with arm64's intrinsics
  xor/arm64: Use shared NEON intrinsics implementation from 32-bit ARM
  ARM: Remove hacked-up asm/types.h header

 Documentation/arch/arm/kernel_mode_neon.rst |   4 +-
 arch/arm/include/asm/neon-intrinsics.h      |  64 +++++++
 arch/arm/include/uapi/asm/types.h           |  41 -----
 crypto/aegis128-neon-inner.c                |   4 +-
 lib/raid/xor/Makefile                       |   3 +-
 lib/raid/xor/arm/xor-neon.c                 | 187 ++++++++++++++++++--
 lib/raid/xor/arm/xor-neon.h                 |   7 +
 lib/raid/xor/arm/xor_arch.h                 |   7 +-
 lib/raid/xor/arm64/xor-neon.c               | 172 +-----------------
 lib/raid/xor/xor-8regs.c                    |   2 -
 10 files changed, 251 insertions(+), 240 deletions(-)
 create mode 100644 arch/arm/include/asm/neon-intrinsics.h
 delete mode 100644 arch/arm/include/uapi/asm/types.h
 create mode 100644 lib/raid/xor/arm/xor-neon.h

-- 
2.53.0.1018.g2bb0e51243-goog



^ permalink raw reply

* Re: [PATCH v8 01/10] dt-bindings: mfd: add support for the NXP SIUL2 module
From: Khristine Andreea Barbulescu @ 2026-03-31  7:48 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Arnd Bergmann, Ghennadi Procopciuc
  Cc: Linus Walleij, Bartosz Golaszewski, Krzysztof Kozlowski,
	Conor Dooley, Chester Lin, Matthias Brugger, Ghennadi Procopciuc,
	Larisa Grigore, Lee Jones, Shawn Guo, Sascha Hauer, Fabio Estevam,
	Aisheng Dong, Jacky Bai, Greg Kroah-Hartman, Rafael J . Wysocki,
	Alberto Ruiz, Christophe Lizzi, devicetree, Enric Balletbo,
	Eric Chanudet, imx, linux-arm-kernel, open list:GPIO SUBSYSTEM,
	linux-kernel, NXP S32 Linux Team, Pengutronix Kernel Team,
	Vincent Guittot, Rob Herring
In-Reply-To: <f3ff461b-edd7-423a-ac99-e70145aaaaea@kernel.org>

On 3/23/2026 10:07 AM, Krzysztof Kozlowski wrote:
> On 23/03/2026 08:57, Khristine Andreea Barbulescu wrote:
>> On 3/14/2026 9:31 AM, Arnd Bergmann wrote:
>>> On Fri, Mar 13, 2026, at 18:10, Krzysztof Kozlowski wrote:
>>>> On 25/02/2026 10:40, Ghennadi Procopciuc wrote:
>>>>> On 2/23/2026 3:14 PM, Krzysztof Kozlowski wrote:
>>>>>>> there are no resources allocated specifically for nodes like
>>>>>>> "nxp,s32g-siul2-syscfg". Their consumers are the pinctrl/gpio
>>>>>>> driver and other drivers that read SoC‑specific information from
>>>>>>> those shared registers.
>>>>>>>  
>>>>>>> My alternative is to keep two separate syscon providers for the
>>>>>>
>>>>>> You got review already.
>>>>>>
>>>>> I still believe that nvmem is a suitable and accurate mechanism for
>>>>> describing SoC‑specific identification information, as originally
>>>>> proposed in [0], assuming the necessary adjustments are made.
>>>>>
>>>>> More specifically, instead of modeling software-defined cells, the nvmem
>>>>> layout would describe the actual hardware registers backing this
>>>>> information. One advantage of this approach is that consumer nodes (for
>>>>> example PCIe, Ethernet, or other IPs that need SoC identification data)
>>>>> can reference these registers using the standard nvmem-cells /
>>>>> nvmem-cell-names mechanism, without introducing custom, per-subsystem
>>>>> bindings.
>>>>
>>>> nvmem is applicable only if this is NVMEM. Information about the soc is
>>>> not NVMEM, unless this are blow out fuses / efuse. Does not look like,
>>>> because SoC information is set probably during design phase, not board
>>>> assembly.
>>>
>>> Agreed, nvmem clearly makes no sense here, the patch description
>>> appears to accurately describe the MMIO area as hardware registers
>>> with a fixed meaning rather than a convention for how the
>>> memory is being used.
>>>
>>> That said, there is probably room for improvement, since some of
>>> the register contents are read-only and could just be accessed
>>> by the boot firmware in order to move the information into more
>>> regular DT properties instead of defining bindings for drivers
>>> to access the information in raw form.
>>>
>>>     Arnd
>>
>> Hi Krzysztof & Arnd,
>>
>> Assuming we drop the syscon approach entirely, for the SerDes
>> presence information we could follow Arnd’s suggestion and have
>> it provided by the boot firmware instead of exposing it through SIUL2.
> 
> I think there is misunderstanding. By dropping syscon nodes, I meant to
> drop the nodes. Remove them. It implies that whatever their contain must
> go somewhere, right? Because your hardware is fixed and you cannot drop
> it from the hardware, right?
> 
> So their parent node is the syscon.
> 
> Best regards,
> Krzysztof


Hi Krzysztof & Arnd,

Following your suggestions, I reworked the DT so that the SIUL2
register regions are now described directly on the parent node, and
the separate syscon child nodes are removed.
 
The node would look like this:
    siul2: siul2@4009c000 {
        compatible = "nxp,s32g2-siul2";
        reg = <0x4009c000 0x179c>,
              <0x44010000 0x17b0>;
        reg-names = "siul20", "siul21";
 
        pinctrl: pinctrl {
            compatible = "nxp,s32g-siul2-pinctrl";
            gpio-controller;
            #gpio-cells = <2>;
            gpio-ranges = <&pinctrl 0 0 102>, <&pinctrl 112 112 79>;
            interrupt-controller;
            #interrupt-cells = <2>;
            interrupts = <GIC_SPI 210 IRQ_TYPE_LEVEL_HIGH>;
 
            jtag_pins: jtag-pins {
                ...
            };
        };
    };
 
With the current layout, the SIUL2 node itself now contains the two
MMIO ranges directly, while the remaining child node is only the
pinctrl/GPIO function.
 
I am wondering whether it still makes sense to keep the MFD approach
at all. In the current form, the node no longer models multiple
separate child providers such as the previous syscon children, but
rather a single parent node containing the whole register space
together with the pinctrl/GPIO.
 
Would you recommend dropping the MFD layer entirely and having
the pinctrl/GPIO driver bind directly to the parent `siul2@...`
node instead?
 
Please let me know whether this is the direction you would prefer,
or if you still see value in keeping the current MFD based approach.

Best regards,
Khristine


^ permalink raw reply

* Re: [GIT PULL 4/7] ARM: tegra: Device tree changes for v7.1-rc1
From: Krzysztof Kozlowski @ 2026-03-31  7:42 UTC (permalink / raw)
  To: Thierry Reding
  Cc: arm, soc, Thierry Reding, Jon Hunter, linux-tegra,
	linux-arm-kernel
In-Reply-To: <act5kGG-4mZl0j3p@orome>

On 31/03/2026 09:38, Thierry Reding wrote:
>> Why does the DTS branch has mach code? Tag message mentions legacy
>> cleanup only and such cleanup should not cause mixing independent
>> hardware description (DTS) with drivers.
> 
> The DT additions for PAZ00 replace the legacy code, so it makes sense to
> replace it in one patch, otherwise we'd be introducing a bisectability
> problem.

OK, please mention it in the tag message in the future.

Best regards,
Krzysztof


^ permalink raw reply

* Re: [GIT PULL 4/7] ARM: tegra: Device tree changes for v7.1-rc1
From: Thierry Reding @ 2026-03-31  7:38 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: arm, soc, Thierry Reding, Jon Hunter, linux-tegra,
	linux-arm-kernel
In-Reply-To: <058d79b7-3d4c-4f0a-a95f-b2e3582a4fa7@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 2148 bytes --]

On Mon, Mar 30, 2026 at 01:46:32PM +0200, Krzysztof Kozlowski wrote:
> On 29/03/2026 17:10, Thierry Reding wrote:
> > ----------------------------------------------------------------
> > ARM: tegra: Device tree changes for v7.1-rc1
> > 
> > Various improvements for Tegra114 boards, as well as some legacy cleanup
> > for PAZ00 and Transformers devices.
> > 
> > ----------------------------------------------------------------
> > Dmitry Torokhov (1):
> >       ARM: tegra: paz00: Configure WiFi rfkill switch through device tree
> > 
> > Svyatoslav Ryhel (8):
> >       ARM: tegra: Add SOCTHERM support on Tegra114
> >       ARM: tn7: Adjust panel node
> >       ARM: tegra: lg-x3: Add panel and bridge nodes
> >       ARM: tegra: lg-x3: Add USB and power related nodes
> >       ARM: tegra: lg-x3: Add node for capacitive buttons
> >       ARM: tegra: Add ACTMON node to Tegra114 device tree
> >       ARM: tegra: Add External Memory Controller node on Tegra114
> >       ARM: tegra: transformers: Add connector node
> > 
> >  arch/arm/boot/dts/nvidia/tegra114-tn7.dts        |  13 +-
> >  arch/arm/boot/dts/nvidia/tegra114.dtsi           | 221 +++++++++++++++++++++++
> >  arch/arm/boot/dts/nvidia/tegra20-paz00.dts       |   8 +
> >  arch/arm/boot/dts/nvidia/tegra30-asus-tf600t.dts |  21 ++-
> >  arch/arm/boot/dts/nvidia/tegra30-lg-p880.dts     |  23 +++
> >  arch/arm/boot/dts/nvidia/tegra30-lg-p895.dts     |  33 ++++
> >  arch/arm/boot/dts/nvidia/tegra30-lg-x3.dtsi      | 174 +++++++++++++++++-
> >  arch/arm/mach-tegra/Makefile                     |   2 -
> >  arch/arm/mach-tegra/board-paz00.c                |  56 ------
> >  arch/arm/mach-tegra/board.h                      |   2 -
> >  arch/arm/mach-tegra/tegra.c                      |   4 -
> 
> Why does the DTS branch has mach code? Tag message mentions legacy
> cleanup only and such cleanup should not cause mixing independent
> hardware description (DTS) with drivers.

The DT additions for PAZ00 replace the legacy code, so it makes sense to
replace it in one patch, otherwise we'd be introducing a bisectability
problem.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [DMARC error]Re: [PATCH 0/2] Add PWM support Amlogic S7 S7D S6
From: George Stark @ 2026-03-31  7:33 UTC (permalink / raw)
  To: Xianwei Zhao, Martin Blumenstingl
  Cc: Uwe Kleine-König, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Heiner Kallweit, Neil Armstrong, Kevin Hilman,
	Jerome Brunet, linux-pwm, devicetree, linux-kernel,
	linux-arm-kernel, linux-amlogic, Junyi Zhao
In-Reply-To: <70a637b1-a76a-470c-9a97-0b4599a40a1c@amlogic.com>

Hello Martin, Xianwei


On 3/31/26 10:10, Xianwei Zhao wrote:
> Hi Martin,
>      I confirmed with Junyi Zhao that the current implementation counts 
> from zero, so this submission is correct.
> We agree this should be fixed and will address it in a follow-up patch.
> Thanks for pointing it out.
> 
> On 2026/3/31 05:54, Martin Blumenstingl wrote:
>> Hi Xianwei Zhao,
>>
>> thanks for your contribution!
>>
>> On Thu, Mar 26, 2026 at 7:35 AM Xianwei Zhao via B4 Relay
>> <devnull+xianwei.zhao.amlogic.com@kernel.org>  wrote:
>>> Add bindings and driver support Amlogic S7/S7D/S6 SoCs.
>> There is an old report that got lost, stating that the current

Xianwei Zhao thanks for the confirmation.
I am the author of the old report and the corresponding patch and it's 
not lost. So if the patch is correct I'll be glad to add relevant 
tested-by tags.

>> pwm-meson driver has an off-by-one error with the hi and lo fields:
>> [0]
>> Since you are working on bringing up a new platform: is this something
>> you can verify in your lab?
>> To be clear: I'm not expecting you to work on this ad-hoc or bring a
>> patch into this series. However, it would be great if you could verify
>> if the findings from [0] are correct and send an updated patch in
>> future.
>>
>> Thank you and best regards
>> Martin
> 
> _______________________________________________
> linux-amlogic mailing list
> linux-amlogic@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-amlogic

-- 
Best regards
George


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox