Re: [PATCH v11] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6

linux-arm-kernel.lists.infradead.org archive mirror

* Re: [PATCH v11] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests
       [not found] <20240229-fix_sparse_errors_checksum_tests-v11-1-f608d9ec7574@rivosinc.com>
@ 2024-03-01  7:17 ` Christophe Leroy
  2024-03-01 17:09   ` Charlie Jenkins
       [not found] ` <62b69aaf-7633-4bd8-aefe-5ba47147dba7@roeck-us.net>
  1 sibling, 1 reply; 7+ messages in thread
From: Christophe Leroy @ 2024-03-01  7:17 UTC (permalink / raw)
  To: Charlie Jenkins, Guenter Roeck, David Laight, Palmer Dabbelt,
	Andrew Morton, Helge Deller, James E.J. Bottomley, Parisc List,
	Arnd Bergmann, Geert Uytterhoeven, Russell King
  Cc: linux-kernel@vger.kernel.org, Palmer Dabbelt, Linux ARM,
	netdev@vger.kernel.org

+CC netdev ARM Russell

On 29/02/2024 at 23:46, Charlie Jenkins wrote:
> The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> aligning the IP header, which was causing failures on architectures
> that do not support misaligned accesses, like some ARM platforms. To
> solve this, align the data along (14 + NET_IP_ALIGN) bytes, which is the
> standard alignment of an IP header and must be supported by the
> architecture.

In your description, please provide more details on which platforms have
a problem and what exactly the problem is (failed calculation, slowness,
kernel Oops, panic, ...) on each platform.

And please copy the maintainers and lists of the platforms you are
specifically addressing with this change. As this is network related,
the netdev list should have been copied as well.

I still think that your patch is not the right approach; it looks like
you are ignoring all the discussion. Below is a quote of what Geert said,
and I fully agree with it:

	IMHO the tests should validate the expected functionality.  If a test
	fails, either functionality is missing or behaves wrong, or the test
	is wrong.

	What is the point of writing tests for a core functionality like network
	checksumming that do not match the expected functionality?

So we all agree that there is something to fix, because today's test
does odd-address accesses, which is unexpected for those functions, but
2-byte alignment should be supported and hence exercised by the test.
Limiting the test to 16-byte alignment greatly reduces the usefulness
of the test.

Christophe
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH v11] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests
  2024-03-01  7:17 ` [PATCH v11] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests Christophe Leroy
@ 2024-03-01 17:09   ` Charlie Jenkins
  2024-03-01 17:24     ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: Charlie Jenkins @ 2024-03-01 17:09 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Guenter Roeck, David Laight, Palmer Dabbelt, Andrew Morton,
	Helge Deller, James E.J. Bottomley, Parisc List, Arnd Bergmann,
	Geert Uytterhoeven, Russell King, linux-kernel@vger.kernel.org,
	Palmer Dabbelt, Linux ARM, netdev@vger.kernel.org

On Fri, Mar 01, 2024 at 07:17:38AM +0000, Christophe Leroy wrote:
> +CC netdev ARM Russell
> 
> On 29/02/2024 at 23:46, Charlie Jenkins wrote:
> > The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> > aligning the IP header, which was causing failures on architectures
> > that do not support misaligned accesses, like some ARM platforms. To
> > solve this, align the data along (14 + NET_IP_ALIGN) bytes, which is the
> > standard alignment of an IP header and must be supported by the
> > architecture.
> 
> In your description, please provide more details on which platforms have
> a problem and what exactly the problem is (failed calculation, slowness,
> kernel Oops, panic, ...) on each platform.
> 
> And please copy the maintainers and lists of the platforms you are
> specifically addressing with this change. As this is network related,
> the netdev list should have been copied as well.
> 
> I still think that your patch is not the right approach; it looks like
> you are ignoring all the discussion. Below is a quote of what Geert said,
> and I fully agree with it:
> 
> 	IMHO the tests should validate the expected functionality.  If a test
> 	fails, either functionality is missing or behaves wrong, or the test
> 	is wrong.
> 
> 	What is the point of writing tests for a core functionality like network
> 	checksumming that do not match the expected functionality?
> 
> 
> So we all agree that there is something to fix, because today's test
> does odd-address accesses, which is unexpected for those functions, but
> 2-byte alignment should be supported and hence exercised by the test.
> Limiting the test to 16-byte alignment greatly reduces the usefulness
> of the test.
> 

Maybe I am lost in the conversation. This isn't limited to 16-byte
alignment, is it? It aligns along 14 + NET_IP_ALIGN. That is 16 on some
platforms and 14 on platforms where unaligned accesses are desired.
These functions are expected to be called with this offset. Testing with
any other alignment is not the expected behavior. These tests are
testing the expected functionality.

- Charlie

> Christophe


* RE: [PATCH v11] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests
  2024-03-01 17:09   ` Charlie Jenkins
@ 2024-03-01 17:24     ` David Laight
  2024-03-01 17:30       ` Charlie Jenkins
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2024-03-01 17:24 UTC (permalink / raw)
  To: 'Charlie Jenkins', Christophe Leroy
  Cc: Guenter Roeck, Palmer Dabbelt, Andrew Morton, Helge Deller,
	James E.J. Bottomley, Parisc List, Arnd Bergmann,
	Geert Uytterhoeven, Russell King, linux-kernel@vger.kernel.org,
	Palmer Dabbelt, Linux ARM, netdev@vger.kernel.org

From: Charlie Jenkins
> Sent: 01 March 2024 17:09
> 
> On Fri, Mar 01, 2024 at 07:17:38AM +0000, Christophe Leroy wrote:
> > +CC netdev ARM Russell
> >
> > On 29/02/2024 at 23:46, Charlie Jenkins wrote:
> > > The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> > > aligning the IP header, which was causing failures on architectures
> > > that do not support misaligned accesses, like some ARM platforms. To
> > > solve this, align the data along (14 + NET_IP_ALIGN) bytes, which is the
> > > standard alignment of an IP header and must be supported by the
> > > architecture.
> >
> > In your description, please provide more details on which platforms have
> > a problem and what exactly the problem is (failed calculation, slowness,
> > kernel Oops, panic, ...) on each platform.
> >
> > And please copy the maintainers and lists of the platforms you are
> > specifically addressing with this change. As this is network related,
> > the netdev list should have been copied as well.
> >
> > I still think that your patch is not the right approach; it looks like
> > you are ignoring all the discussion. Below is a quote of what Geert said,
> > and I fully agree with it:
> >
> > 	IMHO the tests should validate the expected functionality.  If a test
> > 	fails, either functionality is missing or behaves wrong, or the test
> > 	is wrong.
> >
> > 	What is the point of writing tests for a core functionality like network
> > 	checksumming that do not match the expected functionality?
> >
> >
> > So we all agree that there is something to fix, because today's test
> > does odd-address accesses, which is unexpected for those functions, but
> > 2-byte alignment should be supported and hence exercised by the test.
> > Limiting the test to 16-byte alignment greatly reduces the usefulness
> > of the test.
> >
> 
> Maybe I am lost in the conversation. This isn't limited to 16-byte
> alignment, is it? It aligns along 14 + NET_IP_ALIGN. That is 16 on some
> platforms and 14 on platforms where unaligned accesses are desired.
> These functions are expected to be called with this offset. Testing with
> any other alignment is not the expected behavior. These tests are
> testing the expected functionality.

Aligned received frames can have a 4-byte VLAN header (or two) removed.
So the alignment of the IP header is either 4n or 4n+2.
If the CPU faults on misaligned accesses you really want the alignment
to be 4n.
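
To make the arithmetic concrete, here is a minimal userspace sketch (not
kernel code). The helper name ip_hdr_misalignment() and the two constants
are hypothetical; the sketch assumes a receive buffer that starts on a
4-byte boundary, a pad of NET_IP_ALIGN bytes in front of the frame, a
14-byte Ethernet header, and 4 bytes per VLAN tag still in the frame:

```c
#include <assert.h>

#define ETH_HLEN_BYTES 14	/* Ethernet header length */
#define VLAN_TAG_BYTES 4	/* one 802.1Q tag */

/*
 * Offset of the IP header modulo 4, for a buffer that starts on a
 * 4-byte boundary: 0 means the header sits at 4n, 2 means 4n+2.
 */
static unsigned int ip_hdr_misalignment(unsigned int net_ip_align,
					unsigned int vlan_tags)
{
	assert(vlan_tags <= 2);
	return (net_ip_align + ETH_HLEN_BYTES + VLAN_TAG_BYTES * vlan_tags) % 4;
}
```

Under these assumptions a pad of 2 gives 4n and a pad of 0 gives 4n+2,
and the number of VLAN tags never changes the result because each tag is
a multiple of 4 bytes.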

You pretty much never want to trap and fixup a misaligned access.
Especially in the network stack.
I suspect it is better to do a realignment copy of the entire frame.
At some point the data will be copied again, although you may want
a CBU (crystal ball unit) to decide whether to align on an 8n
or 8n+4 boundary to optimise a later copy.

CPUs that support misaligned transfers just make coders sloppy :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: [PATCH v11] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests
  2024-03-01 17:24     ` David Laight
@ 2024-03-01 17:30       ` Charlie Jenkins
  0 siblings, 0 replies; 7+ messages in thread
From: Charlie Jenkins @ 2024-03-01 17:30 UTC (permalink / raw)
  To: David Laight
  Cc: Christophe Leroy, Guenter Roeck, Palmer Dabbelt, Andrew Morton,
	Helge Deller, James E.J. Bottomley, Parisc List, Arnd Bergmann,
	Geert Uytterhoeven, Russell King, linux-kernel@vger.kernel.org,
	Palmer Dabbelt, Linux ARM, netdev@vger.kernel.org

On Fri, Mar 01, 2024 at 05:24:39PM +0000, David Laight wrote:
> From: Charlie Jenkins
> > Sent: 01 March 2024 17:09
> > 
> > On Fri, Mar 01, 2024 at 07:17:38AM +0000, Christophe Leroy wrote:
> > > +CC netdev ARM Russell
> > >
> > > On 29/02/2024 at 23:46, Charlie Jenkins wrote:
> > > > The test cases for ip_fast_csum and csum_ipv6_magic were not properly
> > > > aligning the IP header, which was causing failures on architectures
> > > > that do not support misaligned accesses, like some ARM platforms. To
> > > > solve this, align the data along (14 + NET_IP_ALIGN) bytes, which is the
> > > > standard alignment of an IP header and must be supported by the
> > > > architecture.
> > >
> > > In your description, please provide more details on which platforms have
> > > a problem and what exactly the problem is (failed calculation, slowness,
> > > kernel Oops, panic, ...) on each platform.
> > >
> > > And please copy the maintainers and lists of the platforms you are
> > > specifically addressing with this change. As this is network related,
> > > the netdev list should have been copied as well.
> > >
> > > I still think that your patch is not the right approach; it looks like
> > > you are ignoring all the discussion. Below is a quote of what Geert said,
> > > and I fully agree with it:
> > >
> > > 	IMHO the tests should validate the expected functionality.  If a test
> > > 	fails, either functionality is missing or behaves wrong, or the test
> > > 	is wrong.
> > >
> > > 	What is the point of writing tests for a core functionality like network
> > > 	checksumming that do not match the expected functionality?
> > >
> > >
> > > So we all agree that there is something to fix, because today's test
> > > does odd-address accesses, which is unexpected for those functions, but
> > > 2-byte alignment should be supported and hence exercised by the test.
> > > Limiting the test to 16-byte alignment greatly reduces the usefulness
> > > of the test.
> > >
> > 
> > Maybe I am lost in the conversation. This isn't limited to 16-byte
> > alignment, is it? It aligns along 14 + NET_IP_ALIGN. That is 16 on some
> > platforms and 14 on platforms where unaligned accesses are desired.
> > These functions are expected to be called with this offset. Testing with
> > any other alignment is not the expected behavior. These tests are
> > testing the expected functionality.
> 
> Aligned received frames can have a 4-byte VLAN header (or two) removed.
> So the alignment of the IP header is either 4n or 4n+2.
> If the CPU faults on misaligned accesses you really want the alignment
> to be 4n.
> 
> You pretty much never want to trap and fixup a misaligned access.
> Especially in the network stack.
> I suspect it is better to do a realignment copy of the entire frame.
> At some point the data will be copied again, although you may want
> a CBU (crystal ball unit) to decide whether to align on an 8n
> or 8n+4 boundary to optimise a later copy.
> 
> CPUs that support misaligned transfers just make coders sloppy :-)
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 

Can you elaborate on how exactly you suggest the tests be changed to
accommodate what you are saying here? I don't understand how what I have
proposed doesn't represent the use case of these functions.

- Charlie



[parent not found: <62b69aaf-7633-4bd8-aefe-5ba47147dba7@roeck-us.net>]

[parent not found: <f422742a-4c86-4cb0-a4f7-a62f0310eb23@csgroup.eu>]

[parent not found: <6df98c91-26b1-497a-9202-18bf86c0130d@roeck-us.net>]

* Re: [PATCH v11] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests
       [not found]     ` <6df98c91-26b1-497a-9202-18bf86c0130d@roeck-us.net>
@ 2024-03-04 11:39       ` Christophe Leroy
  2024-03-04 13:39         ` Arnd Bergmann
  0 siblings, 1 reply; 7+ messages in thread
From: Christophe Leroy @ 2024-03-04 11:39 UTC (permalink / raw)
  To: Guenter Roeck, Russell King
  Cc: linux-kernel@vger.kernel.org, Palmer Dabbelt, David Laight,
	Charlie Jenkins, James E.J. Bottomley, Helge Deller,
	Palmer Dabbelt, Geert Uytterhoeven, Arnd Bergmann, Andrew Morton,
	Parisc List, Linux ARM

Hi Russell and Guenter,

On 03/03/2024 at 16:26, Guenter Roeck wrote:
> On 3/3/24 02:20, Christophe Leroy wrote:
>>
>>
>> On 01/03/2024 at 19:32, Guenter Roeck wrote:
>>> This leaves the mps2-an385:mps2_defconfig crash, which is avoided by
>>> this patch.
>>> My understanding, which may be wrong, is that arm images with thumb
>>> instructions do not support unaligned accesses (maybe I should say do
>>> not support unaligned accesses with the mps2-an385 qemu emulation; I
>>> did not test with real hardware, after all).

...

>>
>> Can you tell how to proceed ?
>>
> 
> You can't run it directly. mps2-an385 is one of the platforms where
> the qemu maintainers insisted that qemu shall not initialize the CPU.
> You have to provide a shim such as
> https://github.com/groeck/linux-build-test/blob/master/rootfs/arm/mps2-boot.axf
> as bios. You also have to provide the dtb file.
> 
> On top of that, you would need a customized version of qemu which
> actually reads the command line, the bios file, and the dtb. See
> https://github.com/groeck/linux-build-test/tree/master/qemu
> branch v8.2.1-local or v8.1.5-local.
> 

Many thanks for your guidance. So I ran the test, and here is what I found:

ip_fast_csum() works whatever the alignment is.

csum_ipv6_magic() is the problem with unaligned ipv6 source or 
destination addresses:

[    0.503757] KTAP version 1
[    0.503854] 1..1
[    0.504156]     KTAP version 1
[    0.504251]     # Subtest: checksum
[    0.504563]     # module: checksum_kunit
[    0.504730]     1..5
[    0.546418]     ok 1 test_csum_fixed_random_inputs
[    0.627853]     ok 2 test_csum_all_carry_inputs
[    0.704918]     ok 3 test_csum_no_carry_inputs
[    0.705845]     ok 4 test_ip_fast_csum
[    0.706320]
[    0.706320] Unhandled exception: IPSR = 00000006 LR = fffffff1
[    0.706796] CPU: 0 PID: 28 Comm: kunit_try_catch Tainted: G        N 6.8.0-rc1-00609-g9c0b7a2e25f0 #649
[    0.707177] Hardware name: Generic DT based system
[    0.707400] PC is at __csum_ipv6_magic+0x8/0xb4
[    0.708170] LR is at test_csum_ipv6_magic+0x3d/0xa4
[    0.708415] pc : [<211b0da8>]    lr : [<210e3bf5>]    psr: 0100020b
[    0.708692] sp : 2153debc  ip : 46c7f0d2  fp : 00000000
[    0.708919] r10: 00000000  r9 : 2141dc48  r8 : 211e0e20
[    0.709148] r7 : 00003085  r6 : 00000001  r5 : 2141dd24  r4 : 211e0c2e
[    0.709422] r3 : 2c000000  r2 : 1ac7f0d2  r1 : 211e0c19  r0 : 211e0c09
[    0.709704] xPSR: 0100020b


I don't know much about the ARM instruction set, but it seems that the
ldr instruction used in ip_fast_csum() tolerates unaligned accesses while
the ldmia instruction used in csum_ipv6_magic() does not. Or is this
incorrect behaviour in QEMU?

If I change the test as follows to only use word-aligned IPv6 addresses,
it works:

diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 225bb7701460..4d86fc8ccd78 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -607,7 +607,7 @@ static void test_csum_ipv6_magic(struct kunit *test)
  	const int csum_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
  			    sizeof(int) + sizeof(char);

-	for (int i = 0; i < NUM_IPv6_TESTS; i++) {
+	for (int i = 0; i < NUM_IPv6_TESTS; i += 4) {
  		saddr = (const struct in6_addr *)(random_buf + i);
  		daddr = (const struct in6_addr *)(random_buf + i +
  						  daddr_offset);


If I change csum_ipv6_magic() as follows to use the ldr instruction
instead of ldmia, it also works without any change to the test:

diff --git a/arch/arm/lib/csumipv6.S b/arch/arm/lib/csumipv6.S
index 3559d515144c..a312d0836b95 100644
--- a/arch/arm/lib/csumipv6.S
+++ b/arch/arm/lib/csumipv6.S
@@ -12,12 +12,18 @@
  ENTRY(__csum_ipv6_magic)
  		str	lr, [sp, #-4]!
  		adds	ip, r2, r3
-		ldmia	r1, {r1 - r3, lr}
+		ldr	r2, [r1], #4
+		ldr	r3, [r1], #4
+		ldr	lr, [r1], #4
+		ldr	r1, [r1]
  		adcs	ip, ip, r1
  		adcs	ip, ip, r2
  		adcs	ip, ip, r3
  		adcs	ip, ip, lr
-		ldmia	r0, {r0 - r3}
+		ldr	r1, [r0], #4
+		ldr	r2, [r0], #4
+		ldr	r3, [r0], #4
+		ldr	r0, [r0]
  		adcs	r0, ip, r0
  		adcs	r0, r0, r1
  		adcs	r0, r0, r2


So now we are back to the initial question: should checksumming on
unaligned addresses be supported or not?

Russell, I understand from your previous answers that half-word
alignment should be supported. In that case, should the ARM version of
csum_ipv6_magic() be modified? If so, can you propose the most
optimised fix?

If not, then the test has to be fixed to only use word-aligned IPv6 
addresses.
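
One way to cross-check results at any alignment is a byte-wise reference
that never issues a multi-byte load. The following is only a userspace
sketch under stated assumptions: sum_be16() and ref_ipv6_pseudo_csum()
are hypothetical names, not the kernel's csum_ipv6_magic(), and the
sketch takes no initial partial sum. It returns the folded, complemented
one's-complement sum of the IPv6 pseudo-header as a host-order value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes as big-endian 16-bit words, using byte loads only. */
static uint32_t sum_be16(const uint8_t *p, size_t len, uint32_t sum)
{
	for (size_t i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	return sum;
}

/* saddr and daddr each point at 16 bytes and may have any alignment. */
static uint16_t ref_ipv6_pseudo_csum(const uint8_t *saddr,
				     const uint8_t *daddr,
				     uint32_t len, uint8_t proto)
{
	uint32_t sum = 0;

	assert(saddr != NULL && daddr != NULL);
	sum = sum_be16(saddr, 16, sum);
	sum = sum_be16(daddr, 16, sum);
	sum += (len >> 16) + (len & 0xffff);	/* 32-bit upper-layer length */
	sum += proto;				/* next-header value */
	while (sum >> 16)			/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Because it only reads single bytes, this reference gives the same answer
whether the addresses start at an even, odd, or word-aligned offset.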

Thanks
Christophe

* Re: [PATCH v11] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests
  2024-03-04 11:39       ` Christophe Leroy
@ 2024-03-04 13:39         ` Arnd Bergmann
  2024-03-05  9:27           ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: Arnd Bergmann @ 2024-03-04 13:39 UTC (permalink / raw)
  To: Christophe Leroy, Guenter Roeck, Russell King
  Cc: linux-kernel@vger.kernel.org, Palmer Dabbelt, David Laight,
	Charlie Jenkins, James E . J . Bottomley, Helge Deller,
	Palmer Dabbelt, Geert Uytterhoeven, Andrew Morton, Parisc List,
	Linux ARM

On Mon, Mar 4, 2024, at 12:39, Christophe Leroy wrote:
> On 03/03/2024 at 16:26, Guenter Roeck wrote:
>> On 3/3/24 02:20, Christophe Leroy wrote:
>
> I don't know much about the ARM instruction set, but it seems that the
> ldr instruction used in ip_fast_csum() tolerates unaligned accesses while
> the ldmia instruction used in csum_ipv6_magic() does not. Or is this
> incorrect behaviour in QEMU?

Correct.

On ARMv6 and newer, accessing normal unaligned memory with ldr/str
does not trap, and that covers most unaligned accesses.

Some of the cases that don't allow unaligned access include:

- ARMv4/ARMv5 cannot access unaligned memory with the same
  instructions. Apparently the same is true for ARMv7-M.

- multi-word accesses (ldrd/strd and ldm/stm) require 32-bit
  alignment. These are generated for most 64-bit variables
  and some arrays.

- unaligned access on MMIO registers (__iomem pointers)
  always traps

- atomic access (ldrex/strex) requires aligned data

- The C standard disallows casting to a type with larger
  alignment requirements, and gcc is known to produce
  code that doesn't work with this (and other) undefined
  behavior.
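
For that last point, the usual portable way to read a 32-bit value from
a possibly misaligned address in plain C is to copy it out with memcpy()
rather than cast the pointer. A minimal userspace sketch, with a
hypothetical helper name load_u32_unaligned() (the kernel's
get_unaligned() helpers serve the same purpose):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit value in host byte order from any address. memcpy()
 * makes no alignment assumption, so the compiler must emit loads that
 * are valid for a byte-aligned pointer on the target.
 */
static uint32_t load_u32_unaligned(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}
```

On targets with cheap unaligned loads compilers typically reduce this to
a single load; elsewhere they fall back to narrower accesses.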

> If I change the test as follows to only use word aligned IPv6 addresses, 
> it works:
>
> diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
> index 225bb7701460..4d86fc8ccd78 100644
> --- a/lib/checksum_kunit.c
> +++ b/lib/checksum_kunit.c
> @@ -607,7 +607,7 @@ static void test_csum_ipv6_magic(struct kunit *test)
>   	const int csum_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr) +
>   			    sizeof(int) + sizeof(char);
>
> -	for (int i = 0; i < NUM_IPv6_TESTS; i++) {
> +	for (int i = 0; i < NUM_IPv6_TESTS; i += 4) {
>   		saddr = (const struct in6_addr *)(random_buf + i);
>   		daddr = (const struct in6_addr *)(random_buf + i +
>   						  daddr_offset);
>
>
> If I change csum_ipv6_magic() as follows to use the ldr instruction
> instead of ldmia, it also works without any change to the test:
>
> diff --git a/arch/arm/lib/csumipv6.S b/arch/arm/lib/csumipv6.S
> index 3559d515144c..a312d0836b95 100644
> --- a/arch/arm/lib/csumipv6.S
> +++ b/arch/arm/lib/csumipv6.S
> @@ -12,12 +12,18 @@
>   ENTRY(__csum_ipv6_magic)
>   		str	lr, [sp, #-4]!
>   		adds	ip, r2, r3
> -		ldmia	r1, {r1 - r3, lr}
> +		ldr	r2, [r1], #4
> +		ldr	r3, [r1], #4
> +		ldr	lr, [r1], #4
> +		ldr	r1, [r1]
>
> So now we are back to the initial question: should checksumming on
> unaligned addresses be supported or not?
>
> Russell, I understand from your previous answers that half-word
> alignment should be supported. In that case, should the ARM version of
> csum_ipv6_magic() be modified? If so, can you propose the most
> optimised fix?

The csumipv6.S code predates ARMv6 and is indeed suboptimal on v6/v7
processors with unaligned ipv6 headers. Your workaround looks like
it should be much better, but it would at the same time make the
ARMv5 case much more expensive because it traps four times instead
of just one.

> If not, then the test has to be fixed to only use word-aligned IPv6 
> addresses.

Because of the gcc issue I mentioned, net/ipv6/ip6_checksum.c
and anything else that accesses misaligned ipv6 headers may need
to be changed as well. Marking in6_addr as '__packed __aligned(2)'
should be sufficient for that. This will prevent gcc from issuing
ldm or ldrd on ARMv6+, and from making optimizations based on
the two lower bits of the address being zero on x86 and others.
The downside is that it forces 16-bit loads and stores to be
used on architectures that don't have efficient unaligned
access (armv5, alpha, mips, sparc and xtensa among others)
even when the IP headers are fully aligned.
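
A standalone sketch of what that annotation does, written with the
GCC/Clang attribute spelling directly. struct in6_addr_like and
struct hdr_like are hypothetical stand-ins, not the kernel's
struct in6_addr; the assumption is only that __packed __aligned(2)
expands to attributes of this form:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct in6_addr, annotated as suggested. */
struct in6_addr_like {
	uint32_t word[4];
} __attribute__((packed, aligned(2)));

/* A 2-byte field in front places the address at offset 2. */
struct hdr_like {
	uint16_t pad;
	struct in6_addr_like addr;
};

_Static_assert(sizeof(struct in6_addr_like) == 16, "size is unchanged");
_Static_assert(_Alignof(struct in6_addr_like) == 2, "alignment drops to 2");
_Static_assert(offsetof(struct hdr_like, addr) == 2, "no padding inserted");

/* The compiler may only assume 2-byte alignment for this load. */
static uint32_t first_word(const struct in6_addr_like *a)
{
	assert(a != NULL);
	return a->word[0];
}
```

The static assertions capture the trade-off: the layout stays the same,
but every access through the type has to be valid for a 2-byte-aligned
address.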

     Arnd


* RE: [PATCH v11] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests
  2024-03-04 13:39         ` Arnd Bergmann
@ 2024-03-05  9:27           ` David Laight
  0 siblings, 0 replies; 7+ messages in thread
From: David Laight @ 2024-03-05  9:27 UTC (permalink / raw)
  To: 'Arnd Bergmann', Christophe Leroy, Guenter Roeck,
	Russell King
  Cc: linux-kernel@vger.kernel.org, Palmer Dabbelt, Charlie Jenkins,
	James E . J . Bottomley, Helge Deller, Palmer Dabbelt,
	Geert Uytterhoeven, Andrew Morton, Parisc List, Linux ARM

From: Arnd Bergmann
> Sent: 04 March 2024 13:40
...
> > If not, then the test has to be fixed to only use word-aligned IPv6
> > addresses.
> 
> Because of the gcc issue I mentioned, net/ipv6/ip6_checksum.c
> and anything else that accesses misaligned ipv6 headers may need
> to be changed as well. Marking in6_addr as '__packed __aligned(2)'
> should be sufficient for that. This will prevent gcc from issuing
> ldm or ldrd on ARMv6+, and from making optimizations based on
> the two lower bits of the address being zero on x86 and others.

Eh? x86 pretty much doesn't care unless you are using AVX.

> The downside is that it forces 16-bit loads and stores to be
> used on architectures that don't have efficient unaligned
> access (armv5, alpha, mips, sparc and xtensa among others)
> even when the IP headers are fully aligned.

Aren't the later accesses to the header also going to fault?
IIRC there is an skb_pull() call to ensure all the IP header
is in the linear skb fragment?
Perhaps there should be an skb_pull_aligned() that will ensure
the data is 32bit aligned on systems where the misaligned accesses
fault?
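
The realignment idea can be sketched in userspace as follows. Everything
here is hypothetical, including the helper name realign_header();
skb_pull_aligned() is only a suggestion in this thread, and a real
implementation would have to deal with skb headroom and non-linear data.
The sketch assumes the buffer itself starts on a 4-byte boundary:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * If the header at buf + off is not 4-byte aligned, slide the data from
 * off to the end of the buffer down to the previous 4-byte boundary.
 * len is the total number of valid bytes in buf. Returns the new offset.
 */
static size_t realign_header(uint8_t *buf, size_t off, size_t len)
{
	size_t aligned = off & ~(size_t)3;

	assert(((uintptr_t)buf & 3) == 0);
	assert(off <= len);
	if (aligned != off)
		memmove(buf + aligned, buf + off, len - off);
	return aligned;
}
```

Note that the move overwrites the bytes just before the header, so it is
only acceptable once the link-layer header is no longer needed.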

There might still need to be something to stop gcc generating
ldm/ldrd which can fault on systems where a normal register
read wouldn't.

Do any recent ARM CPUs have the StrongARM 'feature' that ldm
always took 16 clocks?

	David
 

