Date: Wed, 26 Nov 2025 21:15:04 +0100
From: Helge Deller
To: John Johansen
Cc: david laight, Helge Deller, John Paul Adrian Glaubitz,
	linux-kernel@vger.kernel.org, apparmor@lists.ubuntu.com,
	linux-security-module@vger.kernel.org, linux-parisc@vger.kernel.org
Subject: Re: [PATCH 0/2] apparmor unaligned memory fixes
References: <20251126104444.29002552@pumpkin>
	<4034ad19-8e09-440c-a042-a66a488c048b@gmx.de>
	<20251126142201.27e23076@pumpkin>

* John Johansen:
> On 11/26/25 07:12, Helge Deller wrote:
> > * david laight:
> > > On Wed, 26 Nov 2025 12:03:03 +0100
> > > Helge Deller wrote:
> > >
> > > > On 11/26/25 11:44, david laight wrote:
> > > ...
> > > > > > diff --git a/security/apparmor/match.c b/security/apparmor/match.c
> > > > > > index 26e82ba879d44..3dcc342337aca 100644
> > > > > > --- a/security/apparmor/match.c
> > > > > > +++ b/security/apparmor/match.c
> > > > > > @@ -71,10 +71,10 @@ static struct table_header *unpack_table(char *blob, size_t bsize)
> > > > > >  			u8, u8, byte_to_byte);
> > > > >
> > > > > Is that just memcpy()?
> > > >
> > > > No, it's memcpy() only on big-endian machines.
> > >
> > > You've misread the quoting...
> > > The 'data8' case that was only half there is a memcpy().
> > >
> > > > On little-endian machines it converts from big-endian
> > > > 16/32-bit ints to little-endian 16/32-bit ints.
> > > >
> > > > But I see some potential for optimization here:
> > > > a) on big-endian machines just use memcpy()
> > >
> > > true
> > >
> > > > b) on little-endian machines use memcpy() to copy from possibly-unaligned
> > > > memory to the known-to-be-aligned destination.
> > > > Then use a loop with
> > > > be32_to_cpu() instead of get_unaligned_xx() as it's faster.
> > >
> > > There is a function that does a loop byteswap of a buffer - no reason
> > > to re-invent it.
> >
> > I assumed there must be something, but I did not see it. Which one?
> >
> > > But I doubt it is always (if ever) faster to do a copy and then byteswap.
> > > The loop control and extra memory accesses kill performance.
> >
> > Yes, you are probably right.
> >
> > > Not that I've seen a fast get_unaligned() - I don't think gcc or clang
> > > generate optimal code - For LE I think it is something like:
> > > 	low = *(addr & ~3);
> > > 	high = *((addr + 3) & ~3);
> > > 	shift = (addr & 3) * 8;
> > > 	value = low << shift | high >> (32 - shift);
> > > Note that it is only 2 aligned memory reads - even for 64bit.
> >
> > Ok, then maybe we should keep it simple like this patch:
> >
> > [PATCH v2] apparmor: Optimize table creation from possibly unaligned memory
> >
> > Source blob may come from userspace and might be unaligned.
> > Try to optimize the copying process by avoiding unaligned memory accesses.
> >
> > Signed-off-by: Helge Deller
> >
> > diff --git a/security/apparmor/include/match.h b/security/apparmor/include/match.h
> > index 1fbe82f5021b..386da2023d50 100644
> > --- a/security/apparmor/include/match.h
> > +++ b/security/apparmor/include/match.h
> > @@ -104,16 +104,20 @@ struct aa_dfa {
> >  	struct table_header *tables[YYTD_ID_TSIZE];
> >  };
> >
> > -#define byte_to_byte(X) (X)
> > +#define byte_to_byte(X) (*(X))
> >
> >  #define UNPACK_ARRAY(TABLE, BLOB, LEN, TTYPE, BTYPE, NTOHX)	\
> >  	do { \
> >  		typeof(LEN) __i; \
> >  		TTYPE *__t = (TTYPE *) TABLE; \
> >  		BTYPE *__b = (BTYPE *) BLOB; \
> > -		for (__i = 0; __i < LEN; __i++) { \
> > -			__t[__i] = NTOHX(__b[__i]); \
> > -		} \
> > +		BUILD_BUG_ON(sizeof(TTYPE) != sizeof(BTYPE)); \
> > +		if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN) || sizeof(BTYPE) == 1) \
> > +			memcpy(__t, __b, (LEN) * sizeof(BTYPE)); \
> > +		else /* copy & convert from big-endian */ \
> > +			for (__i = 0; __i < LEN; __i++) { \
> > +				__t[__i] = NTOHX(&__b[__i]); \
> > +			} \
> >  	} while (0)
> >
> >  static inline size_t table_size(size_t len, size_t el_size)
> >
> > diff --git a/security/apparmor/match.c b/security/apparmor/match.c
> > index c5a91600842a..13e2f6873329 100644
> > --- a/security/apparmor/match.c
> > +++ b/security/apparmor/match.c
> > @@ -15,6 +15,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include "include/lib.h"
> >  #include "include/match.h"
> >
> > @@ -70,10 +71,10 @@ static struct table_header *unpack_table(char *blob, size_t bsize)
> >  			u8, u8, byte_to_byte);
> >  	else if (th.td_flags == YYTD_DATA16)
> >  		UNPACK_ARRAY(table->td_data, blob, th.td_lolen,
> > -			u16, __be16, be16_to_cpu);
> > +			u16, __be16, get_unaligned_be16);
> >  	else if (th.td_flags == YYTD_DATA32)
> >  		UNPACK_ARRAY(table->td_data, blob, th.td_lolen,
> > -			u32, __be32, be32_to_cpu);
> > +			u32, __be32, get_unaligned_be32);
> >  	else
> >  		goto fail;
> >  	/* if table was vmalloced make sure the page tables are synced
>
> I think we can make one more tweak, in just not
> using UNPACK_ARRAY at all for the byte case
> ie.
>
> diff --git a/security/apparmor/match.c b/security/apparmor/match.c
> index 26e82ba879d44..389202560675c 100644
> --- a/security/apparmor/match.c
> +++ b/security/apparmor/match.c
> @@ -67,8 +67,7 @@ static struct table_header *unpack_table(char *blob, size_t bsize)
>  	table->td_flags = th.td_flags;
>  	table->td_lolen = th.td_lolen;
>  	if (th.td_flags == YYTD_DATA8)
> -		UNPACK_ARRAY(table->td_data, blob, th.td_lolen,
> -			u8, u8, byte_to_byte);
> +		memcpy(table->td_data, blob, th.td_lolen);

True. Then byte_to_byte() can go away in match.h as well.
So, here is an (untested) v3:

[PATCH v3] apparmor: Optimize table creation from possibly unaligned memory

Source blob may come from userspace and might be unaligned.
Try to optimize the copying process by avoiding unaligned memory accesses.

Signed-off-by: Helge Deller

diff --git a/security/apparmor/include/match.h b/security/apparmor/include/match.h
index 1fbe82f5021b..19e72b3e8f49 100644
--- a/security/apparmor/include/match.h
+++ b/security/apparmor/include/match.h
@@ -104,16 +104,18 @@ struct aa_dfa {
 	struct table_header *tables[YYTD_ID_TSIZE];
 };

-#define byte_to_byte(X) (X)
-
 #define UNPACK_ARRAY(TABLE, BLOB, LEN, TTYPE, BTYPE, NTOHX)	\
 	do { \
 		typeof(LEN) __i; \
 		TTYPE *__t = (TTYPE *) TABLE; \
 		BTYPE *__b = (BTYPE *) BLOB; \
-		for (__i = 0; __i < LEN; __i++) { \
-			__t[__i] = NTOHX(__b[__i]); \
-		} \
+		BUILD_BUG_ON(sizeof(TTYPE) != sizeof(BTYPE)); \
+		if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) \
+			memcpy(__t, __b, (LEN) * sizeof(BTYPE)); \
+		else /* copy & convert from big-endian */ \
+			for (__i = 0; __i < LEN; __i++) { \
+				__t[__i] = NTOHX(&__b[__i]); \
+			} \
 	} while (0)

 static inline size_t table_size(size_t len, size_t el_size)

diff --git a/security/apparmor/match.c b/security/apparmor/match.c
index c5a91600842a..1e32c8ba14ae 100644
--- a/security/apparmor/match.c
+++ b/security/apparmor/match.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include

 #include "include/lib.h"
 #include "include/match.h"

@@ -66,14 +67,13 @@ static struct table_header *unpack_table(char *blob, size_t bsize)
 	table->td_flags = th.td_flags;
 	table->td_lolen = th.td_lolen;
 	if (th.td_flags == YYTD_DATA8)
-		UNPACK_ARRAY(table->td_data, blob, th.td_lolen,
-			u8, u8, byte_to_byte);
+		memcpy(table->td_data, blob, th.td_lolen);
 	else if (th.td_flags == YYTD_DATA16)
 		UNPACK_ARRAY(table->td_data, blob, th.td_lolen,
-			u16, __be16, be16_to_cpu);
+			u16, __be16, get_unaligned_be16);
 	else if (th.td_flags == YYTD_DATA32)
 		UNPACK_ARRAY(table->td_data, blob, th.td_lolen,
-			u32, __be32, be32_to_cpu);
+			u32, __be32, get_unaligned_be32);
 	else
 		goto fail;
 	/* if table was vmalloced make sure the page tables are synced