* [PATCH v2 0/5] lib/base64: add generic encoder/decoder, migrate users
@ 2025-09-11 7:29 Guan-Chun Wu
2025-09-11 7:32 ` [PATCH v2 1/5] lib/base64: Replace strchr() for better performance Guan-Chun Wu
` (4 more replies)
0 siblings, 5 replies; 28+ messages in thread
From: Guan-Chun Wu @ 2025-09-11 7:29 UTC (permalink / raw)
To: kbusch, axboe, hch, sagi, xiubli, idryomov, ebiggers, tytso,
jaegeuk, akpm
Cc: visitorckw, home7438072, 409411716, linux-kernel, linux-nvme,
ceph-devel, linux-fscrypt
This series introduces a generic, customizable Base64 encoder/decoder to
the kernel library, eliminating duplicated implementations and delivering
significant performance improvements.
The new helpers support a caller-supplied 64-character table and optional
'=' padding, covering existing variants such as base64url (fscrypt) and
Ceph's custom alphabet. As part of this series, both fscrypt and Ceph are
migrated to the generic helpers, removing their local routines while
preserving their specific formats.
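
For illustration, a base64url-style caller (URL-safe alphabet, no '='
padding) would look roughly like the sketch below. The wrapper name and
table variable are examples only; the base64_encode() signature is the
one introduced in patch 2:

#include <linux/base64.h>
#include <linux/types.h>

/* base64url alphabet ("-_" in slots 62/63); table and wrapper are examples. */
static const char base64url_table[65] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

static int encode_fname(const u8 *src, int srclen, char *dst)
{
	/* dst must have room for BASE64_CHARS(srclen) characters. */
	return base64_encode(src, srclen, dst, false, base64url_table);
}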
On the encoder side, the implementation operates on 3-byte input blocks
mapped directly to 4 output symbols, avoiding bit-by-bit streaming. This
reduces shifts, masks, and loop overhead, achieving up to ~2.7x speedup
over previous implementations while remaining fully RFC 4648-compatible.
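
The hot loop reduces to mapping each aligned 3-byte block to four table
lookups, roughly as in this sketch (standard alphabet; the helper and
table names are illustrative, not the exact code from patch 2):

static const char tbl[65] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Pack one 3-byte block into a 24-bit value and emit four symbols. */
static void encode_block(const unsigned char *src, char *dst)
{
	unsigned int ac = (src[0] << 16) | (src[1] << 8) | src[2];

	dst[0] = tbl[ac >> 18];
	dst[1] = tbl[(ac >> 12) & 0x3f];
	dst[2] = tbl[(ac >> 6) & 0x3f];
	dst[3] = tbl[ac & 0x3f];
}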
On the decoder side, optimizations replace strchr()-based lookups with a
direct mapping table. Together with stricter RFC 4648 validation, this
yields a ~12-15x improvement in decode throughput.
Overall, the series improves maintainability, correctness, and
performance of Base64 handling across the kernel.
Note:
- The included KUnit patch provides correctness and performance
comparison tests to help reviewers validate the improvements. All
tests pass locally on x86_64 (KTAP: pass:3 fail:0 skip:0). Benchmark
numbers are informational only and do not gate the tests.
- Patch 2/5 also updates the nvme-auth call sites to the new API.
Thanks,
Guan-Chun Wu
---
v1 -> v2:
- Add a KUnit test suite for lib/base64:
* correctness tests (multiple alphabets, with/without padding)
* simple microbenchmark for informational performance comparison
- Rework encoder/decoder:
* encoder: generalize to a caller-provided 64-character table and
optional '=' padding
* decoder: optimize and extend to generic tables
- fscrypt: migrate from local base64url helpers to generic lib/base64
- ceph: migrate from local base64 helpers to generic lib/base64
---
Guan-Chun Wu (4):
lib/base64: rework encoder/decoder with customizable support and
update nvme-auth
lib: add KUnit tests for base64 encoding/decoding
fscrypt: replace local base64url helpers with generic lib/base64
helpers
ceph: replace local base64 encode/decode with generic lib/base64
helpers
Kuan-Wei Chiu (1):
lib/base64: Replace strchr() for better performance
drivers/nvme/common/auth.c | 7 +-
fs/ceph/crypto.c | 53 +-------
fs/ceph/crypto.h | 6 +-
fs/ceph/dir.c | 5 +-
fs/ceph/inode.c | 2 +-
fs/crypto/fname.c | 86 +------------
include/linux/base64.h | 4 +-
lib/Kconfig.debug | 19 ++-
lib/base64.c | 239 ++++++++++++++++++++++++++++++-------
lib/tests/Makefile | 1 +
lib/tests/base64_kunit.c | 230 +++++++++++++++++++++++++++++++++++
11 files changed, 466 insertions(+), 186 deletions(-)
create mode 100644 lib/tests/base64_kunit.c
--
2.34.1
^ permalink raw reply [flat|nested] 28+ messages in thread* [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-11 7:29 [PATCH v2 0/5] lib/base64: add generic encoder/decoder, migrate users Guan-Chun Wu @ 2025-09-11 7:32 ` Guan-Chun Wu 2025-09-11 15:50 ` Caleb Sander Mateos ` (2 more replies) 2025-09-11 7:41 ` [PATCH v2 2/5] lib/base64: rework encoder/decoder with customizable support and update nvme-auth Guan-Chun Wu ` (3 subsequent siblings) 4 siblings, 3 replies; 28+ messages in thread From: Guan-Chun Wu @ 2025-09-11 7:32 UTC (permalink / raw) To: 409411716 Cc: akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli From: Kuan-Wei Chiu <visitorckw@gmail.com> The base64 decoder previously relied on strchr() to locate each character in the base64 table. In the worst case, this requires scanning all 64 entries, and even with bitwise tricks or word-sized comparisons, still needs up to 8 checks. Introduce a small helper function that maps input characters directly to their position in the base64 table. This reduces the maximum number of comparisons to 5, improving decoding efficiency while keeping the logic straightforward. Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged over 1000 runs, tested with KUnit): Decode: - 64B input: avg ~1530ns -> ~126ns (~12x faster) - 1KB input: avg ~27726ns -> ~2003ns (~14x faster) Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> --- lib/base64.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/lib/base64.c b/lib/base64.c index b736a7a43..9416bded2 100644 --- a/lib/base64.c +++ b/lib/base64.c @@ -18,6 +18,21 @@ static const char base64_table[65] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; +static inline const char *find_chr(const char *base64_table, char ch) +{ + if ('A' <= ch && ch <= 'Z') + return base64_table + ch - 'A'; + if ('a' <= ch && ch <= 'z') + return base64_table + 26 + ch - 'a'; + if ('0' <= ch && ch <= '9') + return base64_table + 26 * 2 + ch - '0'; + if (ch == base64_table[26 * 2 + 10]) + return base64_table + 26 * 2 + 10; + if (ch == base64_table[26 * 2 + 10 + 1]) + return base64_table + 26 * 2 + 10 + 1; + return NULL; +} + /** * base64_encode() - base64-encode some binary data * @src: the binary data to encode @@ -78,7 +93,7 @@ int base64_decode(const char *src, int srclen, u8 *dst) u8 *bp = dst; for (i = 0; i < srclen; i++) { - const char *p = strchr(base64_table, src[i]); + const char *p = find_chr(base64_table, src[i]); if (src[i] == '=') { ac = (ac << 6); -- 2.34.1 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-11 7:32 ` [PATCH v2 1/5] lib/base64: Replace strchr() for better performance Guan-Chun Wu @ 2025-09-11 15:50 ` Caleb Sander Mateos 2025-09-11 16:02 ` Caleb Sander Mateos 2025-09-11 16:25 ` Kuan-Wei Chiu 2025-09-11 18:14 ` Eric Biggers 2025-09-12 22:54 ` David Laight 2 siblings, 2 replies; 28+ messages in thread From: Caleb Sander Mateos @ 2025-09-11 15:50 UTC (permalink / raw) To: Guan-Chun Wu Cc: akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli On Thu, Sep 11, 2025 at 12:33 AM Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote: > > From: Kuan-Wei Chiu <visitorckw@gmail.com> > > The base64 decoder previously relied on strchr() to locate each > character in the base64 table. In the worst case, this requires > scanning all 64 entries, and even with bitwise tricks or word-sized > comparisons, still needs up to 8 checks. > > Introduce a small helper function that maps input characters directly > to their position in the base64 table. This reduces the maximum number > of comparisons to 5, improving decoding efficiency while keeping the > logic straightforward. > > Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged > over 1000 runs, tested with KUnit): > > Decode: > - 64B input: avg ~1530ns -> ~126ns (~12x faster) > - 1KB input: avg ~27726ns -> ~2003ns (~14x faster) > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > --- > lib/base64.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/lib/base64.c b/lib/base64.c > index b736a7a43..9416bded2 100644 > --- a/lib/base64.c > +++ b/lib/base64.c > @@ -18,6 +18,21 @@ > static const char base64_table[65] = > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; Does base64_table still need to be NUL-terminated? > > +static inline const char *find_chr(const char *base64_table, char ch) Don't see a need to pass in base64_table, the function could just access the global variable directly. > +{ > + if ('A' <= ch && ch <= 'Z') > + return base64_table + ch - 'A'; > + if ('a' <= ch && ch <= 'z') > + return base64_table + 26 + ch - 'a'; > + if ('0' <= ch && ch <= '9') > + return base64_table + 26 * 2 + ch - '0'; > + if (ch == base64_table[26 * 2 + 10]) > + return base64_table + 26 * 2 + 10; > + if (ch == base64_table[26 * 2 + 10 + 1]) > + return base64_table + 26 * 2 + 10 + 1; > + return NULL; This is still pretty branchy. One way to avoid the branches would be to define a reverse lookup table mapping base64 chars to their values (or a sentinel value for invalid chars). Have you benchmarked that approach? Best, Caleb > +} > + > /** > * base64_encode() - base64-encode some binary data > * @src: the binary data to encode > @@ -78,7 +93,7 @@ int base64_decode(const char *src, int srclen, u8 *dst) > u8 *bp = dst; > > for (i = 0; i < srclen; i++) { > - const char *p = strchr(base64_table, src[i]); > + const char *p = find_chr(base64_table, src[i]); > > if (src[i] == '=') { > ac = (ac << 6); > -- > 2.34.1 > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-11 15:50 ` Caleb Sander Mateos @ 2025-09-11 16:02 ` Caleb Sander Mateos 2025-09-11 16:25 ` Kuan-Wei Chiu 1 sibling, 0 replies; 28+ messages in thread From: Caleb Sander Mateos @ 2025-09-11 16:02 UTC (permalink / raw) To: Guan-Chun Wu Cc: akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli On Thu, Sep 11, 2025 at 8:50 AM Caleb Sander Mateos <csander@purestorage.com> wrote: > > On Thu, Sep 11, 2025 at 12:33 AM Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote: > > > > From: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > The base64 decoder previously relied on strchr() to locate each > > character in the base64 table. In the worst case, this requires > > scanning all 64 entries, and even with bitwise tricks or word-sized > > comparisons, still needs up to 8 checks. > > > > Introduce a small helper function that maps input characters directly > > to their position in the base64 table. This reduces the maximum number > > of comparisons to 5, improving decoding efficiency while keeping the > > logic straightforward. > > > > Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged > > over 1000 runs, tested with KUnit): > > > > Decode: > > - 64B input: avg ~1530ns -> ~126ns (~12x faster) > > - 1KB input: avg ~27726ns -> ~2003ns (~14x faster) > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > --- > > lib/base64.c | 17 ++++++++++++++++- > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > diff --git a/lib/base64.c b/lib/base64.c > > index b736a7a43..9416bded2 100644 > > --- a/lib/base64.c > > +++ b/lib/base64.c > > @@ -18,6 +18,21 @@ > > static const char base64_table[65] = > > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > Does base64_table still need to be NUL-terminated? > > > > > +static inline const char *find_chr(const char *base64_table, char ch) > > Don't see a need to pass in base64_table, the function could just > access the global variable directly. Never mind, I see the following patches pass in different base64_table values. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-11 15:50 ` Caleb Sander Mateos 2025-09-11 16:02 ` Caleb Sander Mateos @ 2025-09-11 16:25 ` Kuan-Wei Chiu 2025-09-11 16:38 ` Kuan-Wei Chiu 1 sibling, 1 reply; 28+ messages in thread From: Kuan-Wei Chiu @ 2025-09-11 16:25 UTC (permalink / raw) To: Caleb Sander Mateos Cc: Guan-Chun Wu, akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, xiubli Hi Caleb, On Thu, Sep 11, 2025 at 08:50:12AM -0700, Caleb Sander Mateos wrote: > On Thu, Sep 11, 2025 at 12:33 AM Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote: > > > > From: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > The base64 decoder previously relied on strchr() to locate each > > character in the base64 table. In the worst case, this requires > > scanning all 64 entries, and even with bitwise tricks or word-sized > > comparisons, still needs up to 8 checks. > > > > Introduce a small helper function that maps input characters directly > > to their position in the base64 table. This reduces the maximum number > > of comparisons to 5, improving decoding efficiency while keeping the > > logic straightforward. > > > > Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged > > over 1000 runs, tested with KUnit): > > > > Decode: > > - 64B input: avg ~1530ns -> ~126ns (~12x faster) > > - 1KB input: avg ~27726ns -> ~2003ns (~14x faster) > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > --- > > lib/base64.c | 17 ++++++++++++++++- > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > diff --git a/lib/base64.c b/lib/base64.c > > index b736a7a43..9416bded2 100644 > > --- a/lib/base64.c > > +++ b/lib/base64.c > > @@ -18,6 +18,21 @@ > > static const char base64_table[65] = > > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > Does base64_table still need to be NUL-terminated? > Right, it doesn't need to be nul-terminated. > > > > +static inline const char *find_chr(const char *base64_table, char ch) > > Don't see a need to pass in base64_table, the function could just > access the global variable directly. > > > +{ > > + if ('A' <= ch && ch <= 'Z') > > + return base64_table + ch - 'A'; > > + if ('a' <= ch && ch <= 'z') > > + return base64_table + 26 + ch - 'a'; > > + if ('0' <= ch && ch <= '9') > > + return base64_table + 26 * 2 + ch - '0'; > > + if (ch == base64_table[26 * 2 + 10]) > > + return base64_table + 26 * 2 + 10; > > + if (ch == base64_table[26 * 2 + 10 + 1]) > > + return base64_table + 26 * 2 + 10 + 1; > > + return NULL; > > This is still pretty branchy. One way to avoid the branches would be > to define a reverse lookup table mapping base64 chars to their values > (or a sentinel value for invalid chars). Have you benchmarked that > approach? > We've considered that approach and agree it could very likely be faster. However, since a later patch in this series will add support for users to provide their own base64 table, adopting a reverse lookup table would also require each user to supply a corresponding reverse table. We're not sure whether the extra memory overhead in exchange for runtime speed would be an acceptable tradeoff for everyone, and it might also cause confusion on the API side as to why it's mandatory to pass in a reverse table. 
By contrast, the simple inline function gives us a clear performance improvement without additional memory cost or complicating the API. That said, if there's consensus that a reverse lookup table is worthwhile, we can certainly revisit the idea. Regards, Kuan-Wei > > > +} > > + > > /** > > * base64_encode() - base64-encode some binary data > > * @src: the binary data to encode > > @@ -78,7 +93,7 @@ int base64_decode(const char *src, int srclen, u8 *dst) > > u8 *bp = dst; > > > > for (i = 0; i < srclen; i++) { > > - const char *p = strchr(base64_table, src[i]); > > + const char *p = find_chr(base64_table, src[i]); > > > > if (src[i] == '=') { > > ac = (ac << 6); > > -- > > 2.34.1 > > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
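(For context, the reverse-table approach being discussed would look
roughly like the sketch below: a 256-entry signed table where invalid
characters map to -1. The names and the init helper are illustrative,
not part of this series.)

#include <linux/string.h>

static signed char base64_rev[256];

/* Build the reverse map once for a given 64-character alphabet. */
static void base64_build_rev(const char *table)
{
	int i;

	memset(base64_rev, -1, sizeof(base64_rev));	/* -1 = invalid */
	for (i = 0; i < 64; i++)
		base64_rev[(unsigned char)table[i]] = i;
}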
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-11 16:25 ` Kuan-Wei Chiu @ 2025-09-11 16:38 ` Kuan-Wei Chiu 2025-09-14 20:12 ` David Laight 0 siblings, 1 reply; 28+ messages in thread From: Kuan-Wei Chiu @ 2025-09-11 16:38 UTC (permalink / raw) To: Caleb Sander Mateos Cc: Guan-Chun Wu, akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, xiubli On Fri, Sep 12, 2025 at 12:26:02AM +0800, Kuan-Wei Chiu wrote: > Hi Caleb, > > On Thu, Sep 11, 2025 at 08:50:12AM -0700, Caleb Sander Mateos wrote: > > On Thu, Sep 11, 2025 at 12:33 AM Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote: > > > > > > From: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > > > The base64 decoder previously relied on strchr() to locate each > > > character in the base64 table. In the worst case, this requires > > > scanning all 64 entries, and even with bitwise tricks or word-sized > > > comparisons, still needs up to 8 checks. > > > > > > Introduce a small helper function that maps input characters directly > > > to their position in the base64 table. This reduces the maximum number > > > of comparisons to 5, improving decoding efficiency while keeping the > > > logic straightforward. > > > > > > Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged > > > over 1000 runs, tested with KUnit): > > > > > > Decode: > > > - 64B input: avg ~1530ns -> ~126ns (~12x faster) > > > - 1KB input: avg ~27726ns -> ~2003ns (~14x faster) > > > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > > Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > > --- > > > lib/base64.c | 17 ++++++++++++++++- > > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > > > diff --git a/lib/base64.c b/lib/base64.c > > > index b736a7a43..9416bded2 100644 > > > --- a/lib/base64.c > > > +++ b/lib/base64.c > > > @@ -18,6 +18,21 @@ > > > static const char base64_table[65] = > > > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > > > Does base64_table still need to be NUL-terminated? > > > Right, it doesn't need to be nul-terminated. > > > > > > > +static inline const char *find_chr(const char *base64_table, char ch) > > > > Don't see a need to pass in base64_table, the function could just > > access the global variable directly. > > > > > +{ > > > + if ('A' <= ch && ch <= 'Z') > > > + return base64_table + ch - 'A'; > > > + if ('a' <= ch && ch <= 'z') > > > + return base64_table + 26 + ch - 'a'; > > > + if ('0' <= ch && ch <= '9') > > > + return base64_table + 26 * 2 + ch - '0'; > > > + if (ch == base64_table[26 * 2 + 10]) > > > + return base64_table + 26 * 2 + 10; > > > + if (ch == base64_table[26 * 2 + 10 + 1]) > > > + return base64_table + 26 * 2 + 10 + 1; > > > + return NULL; > > > > This is still pretty branchy. One way to avoid the branches would be > > to define a reverse lookup table mapping base64 chars to their values > > (or a sentinel value for invalid chars). Have you benchmarked that > > approach? > > > We've considered that approach and agree it could very likely be faster. > However, since a later patch in this series will add support for users to > provide their own base64 table, adopting a reverse lookup table would also > require each user to supply a corresponding reverse table. 
We're not sure > whether the extra memory overhead in exchange for runtime speed would be > an acceptable tradeoff for everyone, and it might also cause confusion on > the API side as to why it's mandatory to pass in a reverse table. > > By contrast, the simple inline function gives us a clear performance > improvement without additional memory cost or complicating the API. That > said, if there's consensus that a reverse lookup table is worthwhile, we > can certainly revisit the idea. > Or I just realized that since different base64 tables only differ in the last two characters, we could allocate a 256 entry reverse table inside the base64 function and set the mapping for those two characters. That way, users wouldn't need to pass in a reverse table. The downside is that this would significantly increase the function's stack size. Regards, Kuan-Wei > > > > > > +} > > > + > > > /** > > > * base64_encode() - base64-encode some binary data > > > * @src: the binary data to encode > > > @@ -78,7 +93,7 @@ int base64_decode(const char *src, int srclen, u8 *dst) > > > u8 *bp = dst; > > > > > > for (i = 0; i < srclen; i++) { > > > - const char *p = strchr(base64_table, src[i]); > > > + const char *p = find_chr(base64_table, src[i]); > > > > > > if (src[i] == '=') { > > > ac = (ac << 6); > > > -- > > > 2.34.1 > > > > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-11 16:38 ` Kuan-Wei Chiu @ 2025-09-14 20:12 ` David Laight 2025-09-15 7:50 ` Kuan-Wei Chiu 0 siblings, 1 reply; 28+ messages in thread From: David Laight @ 2025-09-14 20:12 UTC (permalink / raw) To: Kuan-Wei Chiu Cc: Caleb Sander Mateos, Guan-Chun Wu, akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, xiubli On Fri, 12 Sep 2025 00:38:20 +0800 Kuan-Wei Chiu <visitorckw@gmail.com> wrote: ... > Or I just realized that since different base64 tables only differ in the > last two characters, we could allocate a 256 entry reverse table inside > the base64 function and set the mapping for those two characters. That > way, users wouldn't need to pass in a reverse table. The downside is that > this would significantly increase the function's stack size. How many different variants are there? IIRC there are only are two common ones. (and it might not matter is the decoder accepted both sets since I'm pretty sure the issue is that '/' can't be used because it has already been treated as a separator.) Since the code only has to handle in-kernel users - which presumably use a fixed table for each call site, they only need to pass in an identifier for the table. That would mean they can use the same identifier for encode and decode, and the tables themselves wouldn't be replicated and would be part of the implementation. David ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-14 20:12 ` David Laight @ 2025-09-15 7:50 ` Kuan-Wei Chiu 2025-09-15 11:02 ` David Laight 0 siblings, 1 reply; 28+ messages in thread From: Kuan-Wei Chiu @ 2025-09-15 7:50 UTC (permalink / raw) To: David Laight Cc: Caleb Sander Mateos, Guan-Chun Wu, akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, xiubli On Sun, Sep 14, 2025 at 09:12:43PM +0100, David Laight wrote: > On Fri, 12 Sep 2025 00:38:20 +0800 > Kuan-Wei Chiu <visitorckw@gmail.com> wrote: > > ... > > Or I just realized that since different base64 tables only differ in the > > last two characters, we could allocate a 256 entry reverse table inside > > the base64 function and set the mapping for those two characters. That > > way, users wouldn't need to pass in a reverse table. The downside is that > > this would significantly increase the function's stack size. > > How many different variants are there? Currently there are 3 variants: RFC 4648 (standard), RFC 4648 (base64url), and RFC 3501. They use "+/", "-_", and "+," respectively for the last two characters. > IIRC there are only are two common ones. > (and it might not matter is the decoder accepted both sets since I'm > pretty sure the issue is that '/' can't be used because it has already > been treated as a separator.) > > Since the code only has to handle in-kernel users - which presumably > use a fixed table for each call site, they only need to pass in > an identifier for the table. > That would mean they can use the same identifier for encode and decode, > and the tables themselves wouldn't be replicated and would be part of > the implementation. > So maybe we can define an enum in the header like this: enum base64_variant { BASE64_STD, /* RFC 4648 (standard) */ BASE64_URLSAFE, /* RFC 4648 (base64url) */ BASE64_IMAP, /* RFC 3501 */ }; Then the enum value can be passed as a parameter to base64_encode/decode, and in base64.c we can define the tables and reverse tables like this: static const char base64_tables[][64] = { [BASE64_STD] = "ABC...+/", [BASE64_URLSAFE] = "ABC...-_", [BASE64_IMAP] = "ABC...+,", }; What do you think about this approach? Regards, Kuan-Wei ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-15 7:50 ` Kuan-Wei Chiu @ 2025-09-15 11:02 ` David Laight 2025-09-16 7:22 ` Kuan-Wei Chiu 0 siblings, 1 reply; 28+ messages in thread From: David Laight @ 2025-09-15 11:02 UTC (permalink / raw) To: Kuan-Wei Chiu Cc: Caleb Sander Mateos, Guan-Chun Wu, akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, xiubli On Mon, 15 Sep 2025 15:50:18 +0800 Kuan-Wei Chiu <visitorckw@gmail.com> wrote: > On Sun, Sep 14, 2025 at 09:12:43PM +0100, David Laight wrote: > > On Fri, 12 Sep 2025 00:38:20 +0800 > > Kuan-Wei Chiu <visitorckw@gmail.com> wrote: > > > > ... > > > Or I just realized that since different base64 tables only differ in the > > > last two characters, we could allocate a 256 entry reverse table inside > > > the base64 function and set the mapping for those two characters. That > > > way, users wouldn't need to pass in a reverse table. The downside is that > > > this would significantly increase the function's stack size. > > > > How many different variants are there? > > Currently there are 3 variants: > RFC 4648 (standard), RFC 4648 (base64url), and RFC 3501. > They use "+/", "-_", and "+," respectively for the last two characters. So always decoding "+-" to 62 and "/_," to 63 would just miss a few error cases - which may not matter. > > > IIRC there are only are two common ones. > > (and it might not matter is the decoder accepted both sets since I'm > > pretty sure the issue is that '/' can't be used because it has already > > been treated as a separator.) > > > > Since the code only has to handle in-kernel users - which presumably > > use a fixed table for each call site, they only need to pass in > > an identifier for the table. > > That would mean they can use the same identifier for encode and decode, > > and the tables themselves wouldn't be replicated and would be part of > > the implementation. > > > So maybe we can define an enum in the header like this: > > enum base64_variant { > BASE64_STD, /* RFC 4648 (standard) */ > BASE64_URLSAFE, /* RFC 4648 (base64url) */ > BASE64_IMAP, /* RFC 3501 */ > }; > > Then the enum value can be passed as a parameter to base64_encode/decode, > and in base64.c we can define the tables and reverse tables like this: > > static const char base64_tables[][64] = { > [BASE64_STD] = "ABC...+/", > [BASE64_URLSAFE] = "ABC...-_", > [BASE64_IMAP] = "ABC...+,", > }; > > What do you think about this approach? That is the sort of thing I was thinking about. It even lets you change the implementation without changing the callers. For instance BASE64_STD could actually be a pointer to an incomplete struct that contains the lookup tables. Initialising the decode table is going to be a PITA. You probably want 'signed char' with -1 for the invalid characters. Then if any of the four characters for a 24bit output are invalid the 24bit value will be negative. David > > Regards, > Kuan-Wei > ^ permalink raw reply [flat|nested] 28+ messages in thread
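(A rough sketch of that check, assuming a signed reverse table where
invalid characters map to -1; the names are illustrative, not taken
from the series.)

#include <linux/types.h>

static int base64_decode_quad(const signed char *rev, const char *in, u8 *out)
{
	int a = rev[(unsigned char)in[0]];
	int b = rev[(unsigned char)in[1]];
	int c = rev[(unsigned char)in[2]];
	int d = rev[(unsigned char)in[3]];
	u32 ac;

	/* Any -1 lookup leaves the sign bit set in the OR. */
	if ((a | b | c | d) < 0)
		return -1;

	ac = (a << 18) | (b << 12) | (c << 6) | d;
	out[0] = ac >> 16;
	out[1] = (ac >> 8) & 0xff;
	out[2] = ac & 0xff;
	return 3;
}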
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-15 11:02 ` David Laight @ 2025-09-16 7:22 ` Kuan-Wei Chiu 0 siblings, 0 replies; 28+ messages in thread From: Kuan-Wei Chiu @ 2025-09-16 7:22 UTC (permalink / raw) To: David Laight Cc: Caleb Sander Mateos, Guan-Chun Wu, akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, xiubli On Mon, Sep 15, 2025 at 12:02:20PM +0100, David Laight wrote: > On Mon, 15 Sep 2025 15:50:18 +0800 > Kuan-Wei Chiu <visitorckw@gmail.com> wrote: > > > On Sun, Sep 14, 2025 at 09:12:43PM +0100, David Laight wrote: > > > On Fri, 12 Sep 2025 00:38:20 +0800 > > > Kuan-Wei Chiu <visitorckw@gmail.com> wrote: > > > > > > ... > > > > Or I just realized that since different base64 tables only differ in the > > > > last two characters, we could allocate a 256 entry reverse table inside > > > > the base64 function and set the mapping for those two characters. That > > > > way, users wouldn't need to pass in a reverse table. The downside is that > > > > this would significantly increase the function's stack size. > > > > > > How many different variants are there? > > > > Currently there are 3 variants: > > RFC 4648 (standard), RFC 4648 (base64url), and RFC 3501. > > They use "+/", "-_", and "+," respectively for the last two characters. > > So always decoding "+-" to 62 and "/_," to 63 would just miss a few error > cases - which may not matter. > > > > > > IIRC there are only are two common ones. > > > (and it might not matter is the decoder accepted both sets since I'm > > > pretty sure the issue is that '/' can't be used because it has already > > > been treated as a separator.) > > > > > > Since the code only has to handle in-kernel users - which presumably > > > use a fixed table for each call site, they only need to pass in > > > an identifier for the table. > > > That would mean they can use the same identifier for encode and decode, > > > and the tables themselves wouldn't be replicated and would be part of > > > the implementation. > > > > > So maybe we can define an enum in the header like this: > > > > enum base64_variant { > > BASE64_STD, /* RFC 4648 (standard) */ > > BASE64_URLSAFE, /* RFC 4648 (base64url) */ > > BASE64_IMAP, /* RFC 3501 */ > > }; > > > > Then the enum value can be passed as a parameter to base64_encode/decode, > > and in base64.c we can define the tables and reverse tables like this: > > > > static const char base64_tables[][64] = { > > [BASE64_STD] = "ABC...+/", > > [BASE64_URLSAFE] = "ABC...-_", > > [BASE64_IMAP] = "ABC...+,", > > }; > > > > What do you think about this approach? > > That is the sort of thing I was thinking about. > > It even lets you change the implementation without changing the callers. > For instance BASE64_STD could actually be a pointer to an incomplete > struct that contains the lookup tables. > > Initialising the decode table is going to be a PITA. > You probably want 'signed char' with -1 for the invalid characters. > Then if any of the four characters for a 24bit output are invalid > the 24bit value will be negative. > Thanks for the feedback. so for the next version of the patch, I plan to use a 3×64 encode table and a 3×256 reverse table. Does this approach sound good to everyone? Regards, Kuan-Wei ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-11 7:32 ` [PATCH v2 1/5] lib/base64: Replace strchr() for better performance Guan-Chun Wu 2025-09-11 15:50 ` Caleb Sander Mateos @ 2025-09-11 18:14 ` Eric Biggers 2025-09-11 18:44 ` Kuan-Wei Chiu 2025-09-12 22:54 ` David Laight 2 siblings, 1 reply; 28+ messages in thread From: Eric Biggers @ 2025-09-11 18:14 UTC (permalink / raw) To: Guan-Chun Wu Cc: akpm, axboe, ceph-devel, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli On Thu, Sep 11, 2025 at 03:32:04PM +0800, Guan-Chun Wu wrote: > From: Kuan-Wei Chiu <visitorckw@gmail.com> > > The base64 decoder previously relied on strchr() to locate each > character in the base64 table. In the worst case, this requires > scanning all 64 entries, and even with bitwise tricks or word-sized > comparisons, still needs up to 8 checks. > > Introduce a small helper function that maps input characters directly > to their position in the base64 table. This reduces the maximum number > of comparisons to 5, improving decoding efficiency while keeping the > logic straightforward. > > Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged > over 1000 runs, tested with KUnit): > > Decode: > - 64B input: avg ~1530ns -> ~126ns (~12x faster) > - 1KB input: avg ~27726ns -> ~2003ns (~14x faster) > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > --- > lib/base64.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/lib/base64.c b/lib/base64.c > index b736a7a43..9416bded2 100644 > --- a/lib/base64.c > +++ b/lib/base64.c > @@ -18,6 +18,21 @@ > static const char base64_table[65] = > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > +static inline const char *find_chr(const char *base64_table, char ch) > +{ > + if ('A' <= ch && ch <= 'Z') > + return base64_table + ch - 'A'; > + if ('a' <= ch && ch <= 'z') > + return base64_table + 26 + ch - 'a'; > + if ('0' <= ch && ch <= '9') > + return base64_table + 26 * 2 + ch - '0'; > + if (ch == base64_table[26 * 2 + 10]) > + return base64_table + 26 * 2 + 10; > + if (ch == base64_table[26 * 2 + 10 + 1]) > + return base64_table + 26 * 2 + 10 + 1; > + return NULL; > +} > + > /** > * base64_encode() - base64-encode some binary data > * @src: the binary data to encode > @@ -78,7 +93,7 @@ int base64_decode(const char *src, int srclen, u8 *dst) > u8 *bp = dst; > > for (i = 0; i < srclen; i++) { > - const char *p = strchr(base64_table, src[i]); > + const char *p = find_chr(base64_table, src[i]); > > if (src[i] == '=') { > ac = (ac << 6); But this makes the contents of base64_table no longer be used, except for entries 62 and 63. So this patch doesn't make sense. Either we should actually use base64_table, or we should remove base64_table and do the mapping entirely in code. - Eric ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-11 18:14 ` Eric Biggers @ 2025-09-11 18:44 ` Kuan-Wei Chiu 2025-09-11 18:49 ` Eric Biggers 2025-09-13 21:27 ` David Laight 0 siblings, 2 replies; 28+ messages in thread From: Kuan-Wei Chiu @ 2025-09-11 18:44 UTC (permalink / raw) To: Eric Biggers Cc: Guan-Chun Wu, akpm, axboe, ceph-devel, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, xiubli Hi Eric, On Thu, Sep 11, 2025 at 11:14:18AM -0700, Eric Biggers wrote: > On Thu, Sep 11, 2025 at 03:32:04PM +0800, Guan-Chun Wu wrote: > > From: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > The base64 decoder previously relied on strchr() to locate each > > character in the base64 table. In the worst case, this requires > > scanning all 64 entries, and even with bitwise tricks or word-sized > > comparisons, still needs up to 8 checks. > > > > Introduce a small helper function that maps input characters directly > > to their position in the base64 table. This reduces the maximum number > > of comparisons to 5, improving decoding efficiency while keeping the > > logic straightforward. > > > > Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged > > over 1000 runs, tested with KUnit): > > > > Decode: > > - 64B input: avg ~1530ns -> ~126ns (~12x faster) > > - 1KB input: avg ~27726ns -> ~2003ns (~14x faster) > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > --- > > lib/base64.c | 17 ++++++++++++++++- > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > diff --git a/lib/base64.c b/lib/base64.c > > index b736a7a43..9416bded2 100644 > > --- a/lib/base64.c > > +++ b/lib/base64.c > > @@ -18,6 +18,21 @@ > > static const char base64_table[65] = > > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > > > +static inline const char *find_chr(const char *base64_table, char ch) > > +{ > > + if ('A' <= ch && ch <= 'Z') > > + return base64_table + ch - 'A'; > > + if ('a' <= ch && ch <= 'z') > > + return base64_table + 26 + ch - 'a'; > > + if ('0' <= ch && ch <= '9') > > + return base64_table + 26 * 2 + ch - '0'; > > + if (ch == base64_table[26 * 2 + 10]) > > + return base64_table + 26 * 2 + 10; > > + if (ch == base64_table[26 * 2 + 10 + 1]) > > + return base64_table + 26 * 2 + 10 + 1; > > + return NULL; > > +} > > + > > /** > > * base64_encode() - base64-encode some binary data > > * @src: the binary data to encode > > @@ -78,7 +93,7 @@ int base64_decode(const char *src, int srclen, u8 *dst) > > u8 *bp = dst; > > > > for (i = 0; i < srclen; i++) { > > - const char *p = strchr(base64_table, src[i]); > > + const char *p = find_chr(base64_table, src[i]); > > > > if (src[i] == '=') { > > ac = (ac << 6); > > But this makes the contents of base64_table no longer be used, except > for entries 62 and 63. So this patch doesn't make sense. Either we > should actually use base64_table, or we should remove base64_table and > do the mapping entirely in code. > For base64_decode(), you're right. After this patch it only uses the last two entries of base64_table. However, base64_encode() still makes use of the entire table. I'm a bit unsure why it would be unacceptable if only one of the two functions relies on the full base64 table. Regards, Kuan-Wei ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-11 18:44 ` Kuan-Wei Chiu @ 2025-09-11 18:49 ` Eric Biggers 2025-09-11 19:00 ` Kuan-Wei Chiu 2025-09-13 21:27 ` David Laight 1 sibling, 1 reply; 28+ messages in thread From: Eric Biggers @ 2025-09-11 18:49 UTC (permalink / raw) To: Kuan-Wei Chiu Cc: Guan-Chun Wu, akpm, axboe, ceph-devel, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, xiubli On Fri, Sep 12, 2025 at 02:44:41AM +0800, Kuan-Wei Chiu wrote: > Hi Eric, > > On Thu, Sep 11, 2025 at 11:14:18AM -0700, Eric Biggers wrote: > > On Thu, Sep 11, 2025 at 03:32:04PM +0800, Guan-Chun Wu wrote: > > > From: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > > > The base64 decoder previously relied on strchr() to locate each > > > character in the base64 table. In the worst case, this requires > > > scanning all 64 entries, and even with bitwise tricks or word-sized > > > comparisons, still needs up to 8 checks. > > > > > > Introduce a small helper function that maps input characters directly > > > to their position in the base64 table. This reduces the maximum number > > > of comparisons to 5, improving decoding efficiency while keeping the > > > logic straightforward. > > > > > > Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged > > > over 1000 runs, tested with KUnit): > > > > > > Decode: > > > - 64B input: avg ~1530ns -> ~126ns (~12x faster) > > > - 1KB input: avg ~27726ns -> ~2003ns (~14x faster) > > > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > > Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > > --- > > > lib/base64.c | 17 ++++++++++++++++- > > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > > > diff --git a/lib/base64.c b/lib/base64.c > > > index b736a7a43..9416bded2 100644 > > > --- a/lib/base64.c > > > +++ b/lib/base64.c > > > @@ -18,6 +18,21 @@ > > > static const char base64_table[65] = > > > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > > > > > +static inline const char *find_chr(const char *base64_table, char ch) > > > +{ > > > + if ('A' <= ch && ch <= 'Z') > > > + return base64_table + ch - 'A'; > > > + if ('a' <= ch && ch <= 'z') > > > + return base64_table + 26 + ch - 'a'; > > > + if ('0' <= ch && ch <= '9') > > > + return base64_table + 26 * 2 + ch - '0'; > > > + if (ch == base64_table[26 * 2 + 10]) > > > + return base64_table + 26 * 2 + 10; > > > + if (ch == base64_table[26 * 2 + 10 + 1]) > > > + return base64_table + 26 * 2 + 10 + 1; > > > + return NULL; > > > +} > > > + > > > /** > > > * base64_encode() - base64-encode some binary data > > > * @src: the binary data to encode > > > @@ -78,7 +93,7 @@ int base64_decode(const char *src, int srclen, u8 *dst) > > > u8 *bp = dst; > > > > > > for (i = 0; i < srclen; i++) { > > > - const char *p = strchr(base64_table, src[i]); > > > + const char *p = find_chr(base64_table, src[i]); > > > > > > if (src[i] == '=') { > > > ac = (ac << 6); > > > > But this makes the contents of base64_table no longer be used, except > > for entries 62 and 63. So this patch doesn't make sense. Either we > > should actually use base64_table, or we should remove base64_table and > > do the mapping entirely in code. > > > For base64_decode(), you're right. After this patch it only uses the last > two entries of base64_table. However, base64_encode() still makes use of > the entire table. 
> > I'm a bit unsure why it would be unacceptable if only one of the two > functions relies on the full base64 table. Well, don't remove the table then. But please don't calculate pointers into it, only to take the offset from the beginning and never actually dereference them. You should just generate the offset directly. - Eric ^ permalink raw reply [flat|nested] 28+ messages in thread
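(A sketch of what generating the offset directly could look like;
illustrative only, not the final patch.)

/* Return the 6-bit value itself rather than a pointer into the table,
 * so nothing is dereferenced; -1 means "not in the alphabet".
 */
static int base64_find_val(const char *table, char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		return ch - 'A';
	if (ch >= 'a' && ch <= 'z')
		return 26 + (ch - 'a');
	if (ch >= '0' && ch <= '9')
		return 52 + (ch - '0');
	if (ch == table[62])
		return 62;
	if (ch == table[63])
		return 63;
	return -1;
}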
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-11 18:49 ` Eric Biggers @ 2025-09-11 19:00 ` Kuan-Wei Chiu 0 siblings, 0 replies; 28+ messages in thread From: Kuan-Wei Chiu @ 2025-09-11 19:00 UTC (permalink / raw) To: Eric Biggers Cc: Guan-Chun Wu, akpm, axboe, ceph-devel, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, xiubli On Thu, Sep 11, 2025 at 11:49:35AM -0700, Eric Biggers wrote: > On Fri, Sep 12, 2025 at 02:44:41AM +0800, Kuan-Wei Chiu wrote: > > Hi Eric, > > > > On Thu, Sep 11, 2025 at 11:14:18AM -0700, Eric Biggers wrote: > > > On Thu, Sep 11, 2025 at 03:32:04PM +0800, Guan-Chun Wu wrote: > > > > From: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > > > > > The base64 decoder previously relied on strchr() to locate each > > > > character in the base64 table. In the worst case, this requires > > > > scanning all 64 entries, and even with bitwise tricks or word-sized > > > > comparisons, still needs up to 8 checks. > > > > > > > > Introduce a small helper function that maps input characters directly > > > > to their position in the base64 table. This reduces the maximum number > > > > of comparisons to 5, improving decoding efficiency while keeping the > > > > logic straightforward. > > > > > > > > Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged > > > > over 1000 runs, tested with KUnit): > > > > > > > > Decode: > > > > - 64B input: avg ~1530ns -> ~126ns (~12x faster) > > > > - 1KB input: avg ~27726ns -> ~2003ns (~14x faster) > > > > > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > > > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > > > --- > > > > lib/base64.c | 17 ++++++++++++++++- > > > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/lib/base64.c b/lib/base64.c > > > > index b736a7a43..9416bded2 100644 > > > > --- a/lib/base64.c > > > > +++ b/lib/base64.c > > > > @@ -18,6 +18,21 @@ > > > > static const char base64_table[65] = > > > > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > > > > > > > +static inline const char *find_chr(const char *base64_table, char ch) > > > > +{ > > > > + if ('A' <= ch && ch <= 'Z') > > > > + return base64_table + ch - 'A'; > > > > + if ('a' <= ch && ch <= 'z') > > > > + return base64_table + 26 + ch - 'a'; > > > > + if ('0' <= ch && ch <= '9') > > > > + return base64_table + 26 * 2 + ch - '0'; > > > > + if (ch == base64_table[26 * 2 + 10]) > > > > + return base64_table + 26 * 2 + 10; > > > > + if (ch == base64_table[26 * 2 + 10 + 1]) > > > > + return base64_table + 26 * 2 + 10 + 1; > > > > + return NULL; > > > > +} > > > > + > > > > /** > > > > * base64_encode() - base64-encode some binary data > > > > * @src: the binary data to encode > > > > @@ -78,7 +93,7 @@ int base64_decode(const char *src, int srclen, u8 *dst) > > > > u8 *bp = dst; > > > > > > > > for (i = 0; i < srclen; i++) { > > > > - const char *p = strchr(base64_table, src[i]); > > > > + const char *p = find_chr(base64_table, src[i]); > > > > > > > > if (src[i] == '=') { > > > > ac = (ac << 6); > > > > > > But this makes the contents of base64_table no longer be used, except > > > for entries 62 and 63. So this patch doesn't make sense. Either we > > > should actually use base64_table, or we should remove base64_table and > > > do the mapping entirely in code. > > > > > For base64_decode(), you're right. 
After this patch it only uses the last > > two entries of base64_table. However, base64_encode() still makes use of > > the entire table. > > > > I'm a bit unsure why it would be unacceptable if only one of the two > > functions relies on the full base64 table. > > Well, don't remove the table then. But please don't calculate pointers > into it, only to take the offset from the beginning and never actually > dereference them. You should just generate the offset directly. > Agreed. Thanks for the review. I'll make that change in the next version. Regards, Kuan-Wei ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-11 18:44 ` Kuan-Wei Chiu 2025-09-11 18:49 ` Eric Biggers @ 2025-09-13 21:27 ` David Laight 1 sibling, 0 replies; 28+ messages in thread From: David Laight @ 2025-09-13 21:27 UTC (permalink / raw) To: Kuan-Wei Chiu Cc: Eric Biggers, Guan-Chun Wu, akpm, axboe, ceph-devel, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, xiubli On Fri, 12 Sep 2025 02:44:41 +0800 Kuan-Wei Chiu <visitorckw@gmail.com> wrote: > Hi Eric, > > On Thu, Sep 11, 2025 at 11:14:18AM -0700, Eric Biggers wrote: > > On Thu, Sep 11, 2025 at 03:32:04PM +0800, Guan-Chun Wu wrote: > > > From: Kuan-Wei Chiu <visitorckw@gmail.com> > > > > > > The base64 decoder previously relied on strchr() to locate each > > > character in the base64 table. In the worst case, this requires > > > scanning all 64 entries, and even with bitwise tricks or word-sized > > > comparisons, still needs up to 8 checks. > > > > > > Introduce a small helper function that maps input characters directly > > > to their position in the base64 table. This reduces the maximum number > > > of comparisons to 5, improving decoding efficiency while keeping the > > > logic straightforward. > > > > > > Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged > > > over 1000 runs, tested with KUnit): > > > > > > Decode: > > > - 64B input: avg ~1530ns -> ~126ns (~12x faster) > > > - 1KB input: avg ~27726ns -> ~2003ns (~14x faster) > > > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > > Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > > --- > > > lib/base64.c | 17 ++++++++++++++++- > > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > > > diff --git a/lib/base64.c b/lib/base64.c > > > index b736a7a43..9416bded2 100644 > > > --- a/lib/base64.c > > > +++ b/lib/base64.c > > > @@ -18,6 +18,21 @@ > > > static const char base64_table[65] = > > > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > > > > > +static inline const char *find_chr(const char *base64_table, char ch) > > > +{ > > > + if ('A' <= ch && ch <= 'Z') > > > + return base64_table + ch - 'A'; > > > + if ('a' <= ch && ch <= 'z') > > > + return base64_table + 26 + ch - 'a'; > > > + if ('0' <= ch && ch <= '9') > > > + return base64_table + 26 * 2 + ch - '0'; > > > + if (ch == base64_table[26 * 2 + 10]) > > > + return base64_table + 26 * 2 + 10; > > > + if (ch == base64_table[26 * 2 + 10 + 1]) > > > + return base64_table + 26 * 2 + 10 + 1; > > > + return NULL; > > > +} > > > + > > > /** > > > * base64_encode() - base64-encode some binary data > > > * @src: the binary data to encode > > > @@ -78,7 +93,7 @@ int base64_decode(const char *src, int srclen, u8 *dst) > > > u8 *bp = dst; > > > > > > for (i = 0; i < srclen; i++) { > > > - const char *p = strchr(base64_table, src[i]); > > > + const char *p = find_chr(base64_table, src[i]); > > > > > > if (src[i] == '=') { > > > ac = (ac << 6); > > > > But this makes the contents of base64_table no longer be used, except > > for entries 62 and 63. So this patch doesn't make sense. Either we > > should actually use base64_table, or we should remove base64_table and > > do the mapping entirely in code. > > > For base64_decode(), you're right. After this patch it only uses the last > two entries of base64_table. However, base64_encode() still makes use of > the entire table. 
> > I'm a bit unsure why it would be unacceptable if only one of the two > functions relies on the full base64 table. I think the point is that that first 62 entries are fixed, only the last two change. Passing the full table might make someone think they can an arbitrary table. Clearly this isn't true. Having the full table is convenient for the encoder (although the memory lookups may be slower than other algorithms). This might be ok if you could guarantee the compiler would use conditional moves: if (val > 26) val += 'a' - 'Z' - 1; if (val > 52) val += '0' - 'z' - 1; if (val > 62) val = val == 62 ? ch_62 : ch_63; else val += 'A'; This test code seems to do the decode. The only conditional is in the path for 62 and 53 (and errors). int main(int argc, char **argv) { char *cp = argv[1]; unsigned char ch; unsigned int bank, val; unsigned int ch_62 = '+', ch_63 = '/'; if (!cp) return 1; while ((ch = *cp++)) { bank = (ch >> 5) * 4; val = ch & 0x1f; // Need to subtract 1 or 16, variants using 'conditional move' // are probably better - but not all cpu have sane ones. val -= ((0xf << 4) >> bank) + 1; if (val >= (((10u << 4 | 26u << 8 | 26u << 12) >> bank) & 0x1e)) { if (ch == ch_62) val = 62; else if (ch == ch_63) val = 63; else val = 255; } else { val += ((52u << 8 | 0u << 16 | 26u << 24) >> (bank * 2)) & 63; } printf("%c => %u\n", ch, val); } return 0; } David > > Regards, > Kuan-Wei > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/5] lib/base64: Replace strchr() for better performance 2025-09-11 7:32 ` [PATCH v2 1/5] lib/base64: Replace strchr() for better performance Guan-Chun Wu 2025-09-11 15:50 ` Caleb Sander Mateos 2025-09-11 18:14 ` Eric Biggers @ 2025-09-12 22:54 ` David Laight 2 siblings, 0 replies; 28+ messages in thread From: David Laight @ 2025-09-12 22:54 UTC (permalink / raw) To: Guan-Chun Wu Cc: akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli On Thu, 11 Sep 2025 15:32:04 +0800 Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote: > From: Kuan-Wei Chiu <visitorckw@gmail.com> > > The base64 decoder previously relied on strchr() to locate each > character in the base64 table. In the worst case, this requires > scanning all 64 entries, and even with bitwise tricks or word-sized > comparisons, still needs up to 8 checks. > > Introduce a small helper function that maps input characters directly > to their position in the base64 table. This reduces the maximum number > of comparisons to 5, improving decoding efficiency while keeping the > logic straightforward. > > Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged > over 1000 runs, tested with KUnit): > > Decode: > - 64B input: avg ~1530ns -> ~126ns (~12x faster) > - 1KB input: avg ~27726ns -> ~2003ns (~14x faster) > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > --- > lib/base64.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/lib/base64.c b/lib/base64.c > index b736a7a43..9416bded2 100644 > --- a/lib/base64.c > +++ b/lib/base64.c > @@ -18,6 +18,21 @@ > static const char base64_table[65] = > "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > +static inline const char *find_chr(const char *base64_table, char ch) > +{ > + if ('A' <= ch && ch <= 'Z') > + return base64_table + ch - 'A'; > + if ('a' <= ch && ch <= 'z') > + return base64_table + 26 + ch - 'a'; > + if ('0' <= ch && ch <= '9') > + return base64_table + 26 * 2 + ch - '0'; > + if (ch == base64_table[26 * 2 + 10]) > + return base64_table + 26 * 2 + 10; > + if (ch == base64_table[26 * 2 + 10 + 1]) > + return base64_table + 26 * 2 + 10 + 1; > + return NULL; > +} That's still going to be really horrible with random data. You'll get a lot of mispredicted branch penalties. I think they are about 20 clocks each on my Zen-5. A 256 byte lookup table might be better. However if you assume ascii then 'ch' can be split 3:5 bits and the top three used to determine the valid values for the low bits (probably using shifts of constants rather than actual arrays). So apart from the outlying '+' and '/' (and IIRC there is a variant that uses different characters) which can be picked up in the error path; it ought to be possible to code with no conditionals at all. To late at night to write (and test) an implementation. David > + > /** > * base64_encode() - base64-encode some binary data > * @src: the binary data to encode > @@ -78,7 +93,7 @@ int base64_decode(const char *src, int srclen, u8 *dst) > u8 *bp = dst; > > for (i = 0; i < srclen; i++) { > - const char *p = strchr(base64_table, src[i]); > + const char *p = find_chr(base64_table, src[i]); > > if (src[i] == '=') { > ac = (ac << 6); ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2 2/5] lib/base64: rework encoder/decoder with customizable support and update nvme-auth 2025-09-11 7:29 [PATCH v2 0/5] lib/base64: add generic encoder/decoder, migrate users Guan-Chun Wu 2025-09-11 7:32 ` [PATCH v2 1/5] lib/base64: Replace strchr() for better performance Guan-Chun Wu @ 2025-09-11 7:41 ` Guan-Chun Wu 2025-09-11 15:59 ` Caleb Sander Mateos 2025-09-11 18:27 ` Eric Biggers 2025-09-11 7:45 ` [PATCH v2 3/5] lib: add KUnit tests for base64 encoding/decoding Guan-Chun Wu ` (2 subsequent siblings) 4 siblings, 2 replies; 28+ messages in thread From: Guan-Chun Wu @ 2025-09-11 7:41 UTC (permalink / raw) To: 409411716 Cc: akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli Rework base64_encode() and base64_decode() with extended interfaces that support custom 64-character tables and optional '=' padding. This makes them flexible enough to cover both standard RFC4648 Base64 and non-standard variants such as base64url. The encoder is redesigned to process input in 3-byte blocks, each mapped directly into 4 output symbols. Base64 naturally encodes 24 bits of input as four 6-bit values, so operating on aligned 3-byte chunks matches the algorithm's structure. This block-based approach eliminates the need for bit-by-bit streaming, reduces shifts, masks, and loop iterations, and removes data-dependent branches from the main loop. Only the final 1 or 2 leftover bytes are handled separately according to the standard rules. As a result, the encoder achieves ~2.8x speedup for small inputs (64B) and up to ~2.6x speedup for larger inputs (1KB), while remaining fully RFC4648-compliant. The decoder replaces strchr()-based lookups with direct table-indexed mapping. It processes input in 4-character groups and supports both padded and non-padded forms. Validation has been strengthened: illegal characters and misplaced '=' padding now cause errors, preventing silent data corruption. These changes improve decoding performance by ~12-15x. Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged over 1000 runs, tested with KUnit): Encode: - 64B input: avg ~90ns -> ~32ns (~2.8x faster) - 1KB input: avg ~1332ns -> ~510ns (~2.6x faster) Decode: - 64B input: avg ~1530ns -> ~122ns (~12.5x faster) - 1KB input: avg ~27726ns -> ~1859ns (~15x faster) Update nvme-auth to use the reworked base64_encode() and base64_decode() interfaces, which now require explicit padding and table parameters. A static base64_table is defined to preserve RFC4648 standard encoding with padding enabled, ensuring functional behavior remains unchanged. While this is a mechanical update following the lib/base64 rework, nvme-auth also benefits from the performance improvements in the new encoder/decoder, achieving faster encode/decode without altering the output format. The reworked encoder and decoder unify Base64 handling across the kernel with higher performance, stricter correctness, and flexibility to support subsystem-specific variants. 
Co-developed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Co-developed-by: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> --- drivers/nvme/common/auth.c | 7 +- include/linux/base64.h | 4 +- lib/base64.c | 238 ++++++++++++++++++++++++++++--------- 3 files changed, 192 insertions(+), 57 deletions(-) diff --git a/drivers/nvme/common/auth.c b/drivers/nvme/common/auth.c index 91e273b89..4d57694f8 100644 --- a/drivers/nvme/common/auth.c +++ b/drivers/nvme/common/auth.c @@ -161,6 +161,9 @@ u32 nvme_auth_key_struct_size(u32 key_len) } EXPORT_SYMBOL_GPL(nvme_auth_key_struct_size); +static const char base64_table[65] = + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; + struct nvme_dhchap_key *nvme_auth_extract_key(unsigned char *secret, u8 key_hash) { @@ -178,7 +181,7 @@ struct nvme_dhchap_key *nvme_auth_extract_key(unsigned char *secret, if (!key) return ERR_PTR(-ENOMEM); - key_len = base64_decode(secret, allocated_len, key->key); + key_len = base64_decode(secret, allocated_len, key->key, true, base64_table); if (key_len < 0) { pr_debug("base64 key decoding error %d\n", key_len); @@ -663,7 +666,7 @@ int nvme_auth_generate_digest(u8 hmac_id, u8 *psk, size_t psk_len, if (ret) goto out_free_digest; - ret = base64_encode(digest, digest_len, enc); + ret = base64_encode(digest, digest_len, enc, true, base64_table); if (ret < hmac_len) { ret = -ENOKEY; goto out_free_digest; diff --git a/include/linux/base64.h b/include/linux/base64.h index 660d4cb1e..22351323d 100644 --- a/include/linux/base64.h +++ b/include/linux/base64.h @@ -10,7 +10,7 @@ #define BASE64_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3) -int base64_encode(const u8 *src, int len, char *dst); -int base64_decode(const char *src, int len, u8 *dst); +int base64_encode(const u8 *src, int len, char *dst, bool padding, const char *table); +int base64_decode(const char *src, int len, u8 *dst, bool padding, const char *table); #endif /* _LINUX_BASE64_H */ diff --git a/lib/base64.c b/lib/base64.c index 9416bded2..b2bd5dab5 100644 --- a/lib/base64.c +++ b/lib/base64.c @@ -15,104 +15,236 @@ #include <linux/string.h> #include <linux/base64.h> -static const char base64_table[65] = - "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; +#define BASE64_6BIT_MASK 0x3f /* Mask to extract lowest 6 bits */ +#define BASE64_BITS_PER_BYTE 8 +#define BASE64_CHUNK_BITS 6 + +/* Output-char-indexed shifts: for output chars 0,1,2,3 respectively */ +#define BASE64_SHIFT_OUT0 (BASE64_CHUNK_BITS * 3) /* 18 */ +#define BASE64_SHIFT_OUT1 (BASE64_CHUNK_BITS * 2) /* 12 */ +#define BASE64_SHIFT_OUT2 (BASE64_CHUNK_BITS * 1) /* 6 */ +/* OUT3 uses 0 shift and just masks with BASE64_6BIT_MASK */ + +/* For extracting bytes from the 24-bit value (decode main loop) */ +#define BASE64_SHIFT_BYTE0 (BASE64_BITS_PER_BYTE * 2) /* 16 */ +#define BASE64_SHIFT_BYTE1 (BASE64_BITS_PER_BYTE * 1) /* 8 */ + +/* Tail (no padding) shifts to extract bytes */ +#define BASE64_TAIL2_BYTE0_SHIFT ((BASE64_CHUNK_BITS * 2) - BASE64_BITS_PER_BYTE) /* 4 */ +#define BASE64_TAIL3_BYTE0_SHIFT ((BASE64_CHUNK_BITS * 3) - BASE64_BITS_PER_BYTE) /* 10 */ +#define BASE64_TAIL3_BYTE1_SHIFT ((BASE64_CHUNK_BITS * 3) - (BASE64_BITS_PER_BYTE * 2)) /* 2 */ + +/* Extra: masks for leftover validation (no padding) */ +#define BASE64_MASK(n) ({ \ + unsigned int __n = (n); \ + __n ? 
((1U << __n) - 1U) : 0U; \ +}) +#define BASE64_TAIL2_UNUSED_BITS (BASE64_CHUNK_BITS * 2 - BASE64_BITS_PER_BYTE) /* 4 */ +#define BASE64_TAIL3_UNUSED_BITS (BASE64_CHUNK_BITS * 3 - BASE64_BITS_PER_BYTE * 2) /* 2 */ static inline const char *find_chr(const char *base64_table, char ch) { if ('A' <= ch && ch <= 'Z') - return base64_table + ch - 'A'; + return base64_table + (ch - 'A'); if ('a' <= ch && ch <= 'z') - return base64_table + 26 + ch - 'a'; + return base64_table + 26 + (ch - 'a'); if ('0' <= ch && ch <= '9') - return base64_table + 26 * 2 + ch - '0'; - if (ch == base64_table[26 * 2 + 10]) - return base64_table + 26 * 2 + 10; - if (ch == base64_table[26 * 2 + 10 + 1]) - return base64_table + 26 * 2 + 10 + 1; + return base64_table + 52 + (ch - '0'); + if (ch == base64_table[62]) + return &base64_table[62]; + if (ch == base64_table[63]) + return &base64_table[63]; return NULL; } /** - * base64_encode() - base64-encode some binary data + * base64_encode() - base64-encode with custom table and optional padding * @src: the binary data to encode * @srclen: the length of @src in bytes - * @dst: (output) the base64-encoded string. Not NUL-terminated. + * @dst: (output) the base64-encoded string. Not NUL-terminated. + * @padding: whether to append '=' characters so output length is a multiple of 4 + * @table: 64-character encoding table to use (e.g. standard or URL-safe variant) * - * Encodes data using base64 encoding, i.e. the "Base 64 Encoding" specified - * by RFC 4648, including the '='-padding. + * Encodes data using the given 64-character @table. If @padding is true, + * the output is padded with '=' as described in RFC 4648; otherwise padding + * is omitted. This allows generation of both standard and non-standard + * Base64 variants (e.g. URL-safe encoding). * * Return: the length of the resulting base64-encoded string in bytes. */ -int base64_encode(const u8 *src, int srclen, char *dst) +int base64_encode(const u8 *src, int srclen, char *dst, bool padding, const char *table) { u32 ac = 0; - int bits = 0; - int i; char *cp = dst; - for (i = 0; i < srclen; i++) { - ac = (ac << 8) | src[i]; - bits += 8; - do { - bits -= 6; - *cp++ = base64_table[(ac >> bits) & 0x3f]; - } while (bits >= 6); - } - if (bits) { - *cp++ = base64_table[(ac << (6 - bits)) & 0x3f]; - bits -= 6; + while (srclen >= 3) { + ac = ((u32)src[0] << (BASE64_BITS_PER_BYTE * 2)) | + ((u32)src[1] << (BASE64_BITS_PER_BYTE)) | + (u32)src[2]; + + *cp++ = table[ac >> BASE64_SHIFT_OUT0]; + *cp++ = table[(ac >> BASE64_SHIFT_OUT1) & BASE64_6BIT_MASK]; + *cp++ = table[(ac >> BASE64_SHIFT_OUT2) & BASE64_6BIT_MASK]; + *cp++ = table[ac & BASE64_6BIT_MASK]; + + src += 3; + srclen -= 3; } - while (bits < 0) { - *cp++ = '='; - bits += 2; + + switch (srclen) { + case 2: + ac = ((u32)src[0] << (BASE64_BITS_PER_BYTE * 2)) | + ((u32)src[1] << (BASE64_BITS_PER_BYTE)); + + *cp++ = table[ac >> BASE64_SHIFT_OUT0]; + *cp++ = table[(ac >> BASE64_SHIFT_OUT1) & BASE64_6BIT_MASK]; + *cp++ = table[(ac >> BASE64_SHIFT_OUT2) & BASE64_6BIT_MASK]; + if (padding) + *cp++ = '='; + break; + case 1: + ac = ((u32)src[0] << (BASE64_BITS_PER_BYTE * 2)); + *cp++ = table[ac >> BASE64_SHIFT_OUT0]; + *cp++ = table[(ac >> BASE64_SHIFT_OUT1) & BASE64_6BIT_MASK]; + if (padding) { + *cp++ = '='; + *cp++ = '='; + } + break; } return cp - dst; } EXPORT_SYMBOL_GPL(base64_encode); /** - * base64_decode() - base64-decode a string + * base64_decode() - base64-decode with custom table and optional padding * @src: the string to decode. Doesn't need to be NUL-terminated. 
* @srclen: the length of @src in bytes * @dst: (output) the decoded binary data + * @padding: when true, accept and handle '=' padding as per RFC 4648; + * when false, '=' is treated as invalid + * @table: 64-character encoding table to use (e.g. standard or URL-safe variant) * - * Decodes a string using base64 encoding, i.e. the "Base 64 Encoding" - * specified by RFC 4648, including the '='-padding. + * Decodes a string using the given 64-character @table. If @padding is true, + * '=' padding is accepted as described in RFC 4648; otherwise '=' is + * treated as an error. This allows decoding of both standard and + * non-standard Base64 variants (e.g. URL-safe decoding). * * This implementation hasn't been optimized for performance. * * Return: the length of the resulting decoded binary data in bytes, * or -1 if the string isn't a valid base64 string. */ -int base64_decode(const char *src, int srclen, u8 *dst) +static inline int base64_decode_table(char ch, const char *table) +{ + if (ch == '\0') + return -1; + const char *p = find_chr(table, ch); + + return p ? (p - table) : -1; +} + +static inline int decode_base64_block(const char *src, const char *table, + int *input1, int *input2, + int *input3, int *input4, + bool padding) +{ + *input1 = base64_decode_table(src[0], table); + *input2 = base64_decode_table(src[1], table); + *input3 = base64_decode_table(src[2], table); + *input4 = base64_decode_table(src[3], table); + + /* Return error if any base64 character is invalid */ + if (*input1 < 0 || *input2 < 0 || (!padding && (*input3 < 0 || *input4 < 0))) + return -1; + + /* Handle padding */ + if (padding) { + if (*input3 < 0 && *input4 >= 0) + return -1; + if (*input3 < 0 && src[2] != '=') + return -1; + if (*input4 < 0 && src[3] != '=') + return -1; + } + return 0; +} + +int base64_decode(const char *src, int srclen, u8 *dst, bool padding, const char *table) { - u32 ac = 0; - int bits = 0; - int i; u8 *bp = dst; + int input1, input2, input3, input4; + u32 val; - for (i = 0; i < srclen; i++) { - const char *p = find_chr(base64_table, src[i]); + if (srclen == 0) + return 0; - if (src[i] == '=') { - ac = (ac << 6); - bits += 6; - if (bits >= 8) - bits -= 8; - continue; + /* Validate the input length for padding */ + if (padding && (srclen & 0x03) != 0) + return -1; + + while (srclen >= 4) { + /* Decode the next 4 characters */ + if (decode_base64_block(src, table, &input1, &input2, &input3, + &input4, padding) < 0) + return -1; + if (padding && srclen > 4) { + if (input3 < 0 || input4 < 0) + return -1; } - if (p == NULL || src[i] == 0) + val = ((u32)input1 << BASE64_SHIFT_OUT0) | + ((u32)input2 << BASE64_SHIFT_OUT1) | + ((u32)((input3 < 0) ? 0 : input3) << BASE64_SHIFT_OUT2) | + (u32)((input4 < 0) ? 
0 : input4); + + *bp++ = (u8)(val >> BASE64_SHIFT_BYTE0); + + if (input3 >= 0) + *bp++ = (u8)(val >> BASE64_SHIFT_BYTE1); + if (input4 >= 0) + *bp++ = (u8)val; + + src += 4; + srclen -= 4; + } + + /* Handle leftover characters when padding is not used */ + if (!padding && srclen > 0) { + switch (srclen) { + case 2: + input1 = base64_decode_table(src[0], table); + input2 = base64_decode_table(src[1], table); + if (input1 < 0 || input2 < 0) + return -1; + + val = ((u32)input1 << BASE64_CHUNK_BITS) | (u32)input2; /* 12 bits */ + if (val & BASE64_MASK(BASE64_TAIL2_UNUSED_BITS)) + return -1; /* low 4 bits must be zero */ + + *bp++ = (u8)(val >> BASE64_TAIL2_BYTE0_SHIFT); + break; + case 3: + input1 = base64_decode_table(src[0], table); + input2 = base64_decode_table(src[1], table); + input3 = base64_decode_table(src[2], table); + if (input1 < 0 || input2 < 0 || input3 < 0) + return -1; + + val = ((u32)input1 << (BASE64_CHUNK_BITS * 2)) | + ((u32)input2 << BASE64_CHUNK_BITS) | + (u32)input3; /* 18 bits */ + + if (val & BASE64_MASK(BASE64_TAIL3_UNUSED_BITS)) + return -1; /* low 2 bits must be zero */ + + *bp++ = (u8)(val >> BASE64_TAIL3_BYTE0_SHIFT); + *bp++ = (u8)((val >> BASE64_TAIL3_BYTE1_SHIFT) & 0xFF); + break; + default: return -1; - ac = (ac << 6) | (p - base64_table); - bits += 6; - if (bits >= 8) { - bits -= 8; - *bp++ = (u8)(ac >> bits); } } - if (ac & ((1 << bits) - 1)) - return -1; + return bp - dst; } EXPORT_SYMBOL_GPL(base64_decode); -- 2.34.1 ^ permalink raw reply related [flat|nested] 28+ messages in thread
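For reference, a caller of the reworked interface might look like the sketch below. It assumes only the base64_encode()/base64_decode() signatures and the BASE64_CHARS() macro introduced in this patch; the base64url alphabet string, the 32-byte cap, and the function name are illustrative rather than taken from an in-tree user.

	#include <linux/base64.h>
	#include <linux/errno.h>
	#include <linux/string.h>
	#include <linux/types.h>

	static const char base64url_table[65] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

	/* Round-trip up to 32 bytes through unpadded base64url and verify the result. */
	static int base64url_roundtrip(const u8 *buf, int len)
	{
		char enc[BASE64_CHARS(32)];
		u8 dec[32];
		int enc_len, dec_len;

		if (len > 32)
			return -EINVAL;

		enc_len = base64_encode(buf, len, enc, false, base64url_table);
		dec_len = base64_decode(enc, enc_len, dec, false, base64url_table);

		if (dec_len != len || memcmp(dec, buf, len))
			return -EIO;
		return 0;
	}

Passing true for the padding argument together with the standard alphabet instead reproduces the padded RFC 4648 base64 form that nvme-auth continues to use.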
* Re: [PATCH v2 2/5] lib/base64: rework encoder/decoder with customizable support and update nvme-auth 2025-09-11 7:41 ` [PATCH v2 2/5] lib/base64: rework encoder/decoder with customizable support and update nvme-auth Guan-Chun Wu @ 2025-09-11 15:59 ` Caleb Sander Mateos 2025-09-12 7:21 ` Guan-Chun Wu 2025-09-11 18:27 ` Eric Biggers 1 sibling, 1 reply; 28+ messages in thread From: Caleb Sander Mateos @ 2025-09-11 15:59 UTC (permalink / raw) To: Guan-Chun Wu Cc: akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli On Thu, Sep 11, 2025 at 12:43 AM Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote: > > Rework base64_encode() and base64_decode() with extended interfaces > that support custom 64-character tables and optional '=' padding. > This makes them flexible enough to cover both standard RFC4648 Base64 > and non-standard variants such as base64url. > > The encoder is redesigned to process input in 3-byte blocks, each > mapped directly into 4 output symbols. Base64 naturally encodes > 24 bits of input as four 6-bit values, so operating on aligned > 3-byte chunks matches the algorithm's structure. This block-based > approach eliminates the need for bit-by-bit streaming, reduces shifts, > masks, and loop iterations, and removes data-dependent branches from > the main loop. Only the final 1 or 2 leftover bytes are handled > separately according to the standard rules. As a result, the encoder > achieves ~2.8x speedup for small inputs (64B) and up to ~2.6x > speedup for larger inputs (1KB), while remaining fully RFC4648-compliant. > > The decoder replaces strchr()-based lookups with direct table-indexed > mapping. It processes input in 4-character groups and supports both > padded and non-padded forms. Validation has been strengthened: illegal > characters and misplaced '=' padding now cause errors, preventing > silent data corruption. > > These changes improve decoding performance by ~12-15x. > > Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged > over 1000 runs, tested with KUnit): > > Encode: > - 64B input: avg ~90ns -> ~32ns (~2.8x faster) > - 1KB input: avg ~1332ns -> ~510ns (~2.6x faster) > > Decode: > - 64B input: avg ~1530ns -> ~122ns (~12.5x faster) > - 1KB input: avg ~27726ns -> ~1859ns (~15x faster) > > Update nvme-auth to use the reworked base64_encode() and base64_decode() > interfaces, which now require explicit padding and table parameters. > A static base64_table is defined to preserve RFC4648 standard encoding > with padding enabled, ensuring functional behavior remains unchanged. > > While this is a mechanical update following the lib/base64 rework, > nvme-auth also benefits from the performance improvements in the new > encoder/decoder, achieving faster encode/decode without altering the > output format. > > The reworked encoder and decoder unify Base64 handling across the kernel > with higher performance, stricter correctness, and flexibility to support > subsystem-specific variants. 
> > Co-developed-by: Kuan-Wei Chiu <visitorckw@gmail.com> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > Co-developed-by: Yu-Sheng Huang <home7438072@gmail.com> > Signed-off-by: Yu-Sheng Huang <home7438072@gmail.com> > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > --- > drivers/nvme/common/auth.c | 7 +- > include/linux/base64.h | 4 +- > lib/base64.c | 238 ++++++++++++++++++++++++++++--------- > 3 files changed, 192 insertions(+), 57 deletions(-) > > diff --git a/drivers/nvme/common/auth.c b/drivers/nvme/common/auth.c > index 91e273b89..4d57694f8 100644 > --- a/drivers/nvme/common/auth.c > +++ b/drivers/nvme/common/auth.c > @@ -161,6 +161,9 @@ u32 nvme_auth_key_struct_size(u32 key_len) > } > EXPORT_SYMBOL_GPL(nvme_auth_key_struct_size); > > +static const char base64_table[65] = > + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > + > struct nvme_dhchap_key *nvme_auth_extract_key(unsigned char *secret, > u8 key_hash) > { > @@ -178,7 +181,7 @@ struct nvme_dhchap_key *nvme_auth_extract_key(unsigned char *secret, > if (!key) > return ERR_PTR(-ENOMEM); > > - key_len = base64_decode(secret, allocated_len, key->key); > + key_len = base64_decode(secret, allocated_len, key->key, true, base64_table); > if (key_len < 0) { > pr_debug("base64 key decoding error %d\n", > key_len); > @@ -663,7 +666,7 @@ int nvme_auth_generate_digest(u8 hmac_id, u8 *psk, size_t psk_len, > if (ret) > goto out_free_digest; > > - ret = base64_encode(digest, digest_len, enc); > + ret = base64_encode(digest, digest_len, enc, true, base64_table); > if (ret < hmac_len) { > ret = -ENOKEY; > goto out_free_digest; > diff --git a/include/linux/base64.h b/include/linux/base64.h > index 660d4cb1e..22351323d 100644 > --- a/include/linux/base64.h > +++ b/include/linux/base64.h > @@ -10,7 +10,7 @@ > > #define BASE64_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3) > > -int base64_encode(const u8 *src, int len, char *dst); > -int base64_decode(const char *src, int len, u8 *dst); > +int base64_encode(const u8 *src, int len, char *dst, bool padding, const char *table); > +int base64_decode(const char *src, int len, u8 *dst, bool padding, const char *table); > > #endif /* _LINUX_BASE64_H */ > diff --git a/lib/base64.c b/lib/base64.c > index 9416bded2..b2bd5dab5 100644 > --- a/lib/base64.c > +++ b/lib/base64.c > @@ -15,104 +15,236 @@ > #include <linux/string.h> > #include <linux/base64.h> > > -static const char base64_table[65] = > - "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > +#define BASE64_6BIT_MASK 0x3f /* Mask to extract lowest 6 bits */ > +#define BASE64_BITS_PER_BYTE 8 > +#define BASE64_CHUNK_BITS 6 > + > +/* Output-char-indexed shifts: for output chars 0,1,2,3 respectively */ > +#define BASE64_SHIFT_OUT0 (BASE64_CHUNK_BITS * 3) /* 18 */ > +#define BASE64_SHIFT_OUT1 (BASE64_CHUNK_BITS * 2) /* 12 */ > +#define BASE64_SHIFT_OUT2 (BASE64_CHUNK_BITS * 1) /* 6 */ > +/* OUT3 uses 0 shift and just masks with BASE64_6BIT_MASK */ > + > +/* For extracting bytes from the 24-bit value (decode main loop) */ > +#define BASE64_SHIFT_BYTE0 (BASE64_BITS_PER_BYTE * 2) /* 16 */ > +#define BASE64_SHIFT_BYTE1 (BASE64_BITS_PER_BYTE * 1) /* 8 */ > + > +/* Tail (no padding) shifts to extract bytes */ > +#define BASE64_TAIL2_BYTE0_SHIFT ((BASE64_CHUNK_BITS * 2) - BASE64_BITS_PER_BYTE) /* 4 */ > +#define BASE64_TAIL3_BYTE0_SHIFT ((BASE64_CHUNK_BITS * 3) - BASE64_BITS_PER_BYTE) /* 10 */ > +#define BASE64_TAIL3_BYTE1_SHIFT ((BASE64_CHUNK_BITS * 3) - (BASE64_BITS_PER_BYTE * 2)) /* 2 */ > + > +/* Extra: 
masks for leftover validation (no padding) */ > +#define BASE64_MASK(n) ({ \ > + unsigned int __n = (n); \ > + __n ? ((1U << __n) - 1U) : 0U; \ > +}) > +#define BASE64_TAIL2_UNUSED_BITS (BASE64_CHUNK_BITS * 2 - BASE64_BITS_PER_BYTE) /* 4 */ > +#define BASE64_TAIL3_UNUSED_BITS (BASE64_CHUNK_BITS * 3 - BASE64_BITS_PER_BYTE * 2) /* 2 */ > > static inline const char *find_chr(const char *base64_table, char ch) > { > if ('A' <= ch && ch <= 'Z') > - return base64_table + ch - 'A'; > + return base64_table + (ch - 'A'); > if ('a' <= ch && ch <= 'z') > - return base64_table + 26 + ch - 'a'; > + return base64_table + 26 + (ch - 'a'); > if ('0' <= ch && ch <= '9') > - return base64_table + 26 * 2 + ch - '0'; > - if (ch == base64_table[26 * 2 + 10]) > - return base64_table + 26 * 2 + 10; > - if (ch == base64_table[26 * 2 + 10 + 1]) > - return base64_table + 26 * 2 + 10 + 1; > + return base64_table + 52 + (ch - '0'); > + if (ch == base64_table[62]) > + return &base64_table[62]; > + if (ch == base64_table[63]) > + return &base64_table[63]; All the changes in this function look cosmetic. Could you fold them into the patch that introduced the function to avoid touching the lines multiple times? Best, Caleb > return NULL; > } > > /** > - * base64_encode() - base64-encode some binary data > + * base64_encode() - base64-encode with custom table and optional padding > * @src: the binary data to encode > * @srclen: the length of @src in bytes > - * @dst: (output) the base64-encoded string. Not NUL-terminated. > + * @dst: (output) the base64-encoded string. Not NUL-terminated. > + * @padding: whether to append '=' characters so output length is a multiple of 4 > + * @table: 64-character encoding table to use (e.g. standard or URL-safe variant) > * > - * Encodes data using base64 encoding, i.e. the "Base 64 Encoding" specified > - * by RFC 4648, including the '='-padding. > + * Encodes data using the given 64-character @table. If @padding is true, > + * the output is padded with '=' as described in RFC 4648; otherwise padding > + * is omitted. This allows generation of both standard and non-standard > + * Base64 variants (e.g. URL-safe encoding). > * > * Return: the length of the resulting base64-encoded string in bytes. 
> */ > -int base64_encode(const u8 *src, int srclen, char *dst) > +int base64_encode(const u8 *src, int srclen, char *dst, bool padding, const char *table) > { > u32 ac = 0; > - int bits = 0; > - int i; > char *cp = dst; > > - for (i = 0; i < srclen; i++) { > - ac = (ac << 8) | src[i]; > - bits += 8; > - do { > - bits -= 6; > - *cp++ = base64_table[(ac >> bits) & 0x3f]; > - } while (bits >= 6); > - } > - if (bits) { > - *cp++ = base64_table[(ac << (6 - bits)) & 0x3f]; > - bits -= 6; > + while (srclen >= 3) { > + ac = ((u32)src[0] << (BASE64_BITS_PER_BYTE * 2)) | > + ((u32)src[1] << (BASE64_BITS_PER_BYTE)) | > + (u32)src[2]; > + > + *cp++ = table[ac >> BASE64_SHIFT_OUT0]; > + *cp++ = table[(ac >> BASE64_SHIFT_OUT1) & BASE64_6BIT_MASK]; > + *cp++ = table[(ac >> BASE64_SHIFT_OUT2) & BASE64_6BIT_MASK]; > + *cp++ = table[ac & BASE64_6BIT_MASK]; > + > + src += 3; > + srclen -= 3; > } > - while (bits < 0) { > - *cp++ = '='; > - bits += 2; > + > + switch (srclen) { > + case 2: > + ac = ((u32)src[0] << (BASE64_BITS_PER_BYTE * 2)) | > + ((u32)src[1] << (BASE64_BITS_PER_BYTE)); > + > + *cp++ = table[ac >> BASE64_SHIFT_OUT0]; > + *cp++ = table[(ac >> BASE64_SHIFT_OUT1) & BASE64_6BIT_MASK]; > + *cp++ = table[(ac >> BASE64_SHIFT_OUT2) & BASE64_6BIT_MASK]; > + if (padding) > + *cp++ = '='; > + break; > + case 1: > + ac = ((u32)src[0] << (BASE64_BITS_PER_BYTE * 2)); > + *cp++ = table[ac >> BASE64_SHIFT_OUT0]; > + *cp++ = table[(ac >> BASE64_SHIFT_OUT1) & BASE64_6BIT_MASK]; > + if (padding) { > + *cp++ = '='; > + *cp++ = '='; > + } > + break; > } > return cp - dst; > } > EXPORT_SYMBOL_GPL(base64_encode); > > /** > - * base64_decode() - base64-decode a string > + * base64_decode() - base64-decode with custom table and optional padding > * @src: the string to decode. Doesn't need to be NUL-terminated. > * @srclen: the length of @src in bytes > * @dst: (output) the decoded binary data > + * @padding: when true, accept and handle '=' padding as per RFC 4648; > + * when false, '=' is treated as invalid > + * @table: 64-character encoding table to use (e.g. standard or URL-safe variant) > * > - * Decodes a string using base64 encoding, i.e. the "Base 64 Encoding" > - * specified by RFC 4648, including the '='-padding. > + * Decodes a string using the given 64-character @table. If @padding is true, > + * '=' padding is accepted as described in RFC 4648; otherwise '=' is > + * treated as an error. This allows decoding of both standard and > + * non-standard Base64 variants (e.g. URL-safe decoding). > * > * This implementation hasn't been optimized for performance. > * > * Return: the length of the resulting decoded binary data in bytes, > * or -1 if the string isn't a valid base64 string. > */ > -int base64_decode(const char *src, int srclen, u8 *dst) > +static inline int base64_decode_table(char ch, const char *table) > +{ > + if (ch == '\0') > + return -1; > + const char *p = find_chr(table, ch); > + > + return p ? 
(p - table) : -1; > +} > + > +static inline int decode_base64_block(const char *src, const char *table, > + int *input1, int *input2, > + int *input3, int *input4, > + bool padding) > +{ > + *input1 = base64_decode_table(src[0], table); > + *input2 = base64_decode_table(src[1], table); > + *input3 = base64_decode_table(src[2], table); > + *input4 = base64_decode_table(src[3], table); > + > + /* Return error if any base64 character is invalid */ > + if (*input1 < 0 || *input2 < 0 || (!padding && (*input3 < 0 || *input4 < 0))) > + return -1; > + > + /* Handle padding */ > + if (padding) { > + if (*input3 < 0 && *input4 >= 0) > + return -1; > + if (*input3 < 0 && src[2] != '=') > + return -1; > + if (*input4 < 0 && src[3] != '=') > + return -1; > + } > + return 0; > +} > + > +int base64_decode(const char *src, int srclen, u8 *dst, bool padding, const char *table) > { > - u32 ac = 0; > - int bits = 0; > - int i; > u8 *bp = dst; > + int input1, input2, input3, input4; > + u32 val; > > - for (i = 0; i < srclen; i++) { > - const char *p = find_chr(base64_table, src[i]); > + if (srclen == 0) > + return 0; > > - if (src[i] == '=') { > - ac = (ac << 6); > - bits += 6; > - if (bits >= 8) > - bits -= 8; > - continue; > + /* Validate the input length for padding */ > + if (padding && (srclen & 0x03) != 0) > + return -1; > + > + while (srclen >= 4) { > + /* Decode the next 4 characters */ > + if (decode_base64_block(src, table, &input1, &input2, &input3, > + &input4, padding) < 0) > + return -1; > + if (padding && srclen > 4) { > + if (input3 < 0 || input4 < 0) > + return -1; > } > - if (p == NULL || src[i] == 0) > + val = ((u32)input1 << BASE64_SHIFT_OUT0) | > + ((u32)input2 << BASE64_SHIFT_OUT1) | > + ((u32)((input3 < 0) ? 0 : input3) << BASE64_SHIFT_OUT2) | > + (u32)((input4 < 0) ? 0 : input4); > + > + *bp++ = (u8)(val >> BASE64_SHIFT_BYTE0); > + > + if (input3 >= 0) > + *bp++ = (u8)(val >> BASE64_SHIFT_BYTE1); > + if (input4 >= 0) > + *bp++ = (u8)val; > + > + src += 4; > + srclen -= 4; > + } > + > + /* Handle leftover characters when padding is not used */ > + if (!padding && srclen > 0) { > + switch (srclen) { > + case 2: > + input1 = base64_decode_table(src[0], table); > + input2 = base64_decode_table(src[1], table); > + if (input1 < 0 || input2 < 0) > + return -1; > + > + val = ((u32)input1 << BASE64_CHUNK_BITS) | (u32)input2; /* 12 bits */ > + if (val & BASE64_MASK(BASE64_TAIL2_UNUSED_BITS)) > + return -1; /* low 4 bits must be zero */ > + > + *bp++ = (u8)(val >> BASE64_TAIL2_BYTE0_SHIFT); > + break; > + case 3: > + input1 = base64_decode_table(src[0], table); > + input2 = base64_decode_table(src[1], table); > + input3 = base64_decode_table(src[2], table); > + if (input1 < 0 || input2 < 0 || input3 < 0) > + return -1; > + > + val = ((u32)input1 << (BASE64_CHUNK_BITS * 2)) | > + ((u32)input2 << BASE64_CHUNK_BITS) | > + (u32)input3; /* 18 bits */ > + > + if (val & BASE64_MASK(BASE64_TAIL3_UNUSED_BITS)) > + return -1; /* low 2 bits must be zero */ > + > + *bp++ = (u8)(val >> BASE64_TAIL3_BYTE0_SHIFT); > + *bp++ = (u8)((val >> BASE64_TAIL3_BYTE1_SHIFT) & 0xFF); > + break; > + default: > return -1; > - ac = (ac << 6) | (p - base64_table); > - bits += 6; > - if (bits >= 8) { > - bits -= 8; > - *bp++ = (u8)(ac >> bits); > } > } > - if (ac & ((1 << bits) - 1)) > - return -1; > + > return bp - dst; > } > EXPORT_SYMBOL_GPL(base64_decode); > -- > 2.34.1 > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 2/5] lib/base64: rework encoder/decoder with customizable support and update nvme-auth 2025-09-11 15:59 ` Caleb Sander Mateos @ 2025-09-12 7:21 ` Guan-Chun Wu 0 siblings, 0 replies; 28+ messages in thread From: Guan-Chun Wu @ 2025-09-12 7:21 UTC (permalink / raw) To: Caleb Sander Mateos Cc: akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli Hi Caleb, On Thu, Sep 11, 2025 at 08:59:26AM -0700, Caleb Sander Mateos wrote: > On Thu, Sep 11, 2025 at 12:43 AM Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote: > > > > Rework base64_encode() and base64_decode() with extended interfaces > > that support custom 64-character tables and optional '=' padding. > > This makes them flexible enough to cover both standard RFC4648 Base64 > > and non-standard variants such as base64url. > > > > The encoder is redesigned to process input in 3-byte blocks, each > > mapped directly into 4 output symbols. Base64 naturally encodes > > 24 bits of input as four 6-bit values, so operating on aligned > > 3-byte chunks matches the algorithm's structure. This block-based > > approach eliminates the need for bit-by-bit streaming, reduces shifts, > > masks, and loop iterations, and removes data-dependent branches from > > the main loop. Only the final 1 or 2 leftover bytes are handled > > separately according to the standard rules. As a result, the encoder > > achieves ~2.8x speedup for small inputs (64B) and up to ~2.6x > > speedup for larger inputs (1KB), while remaining fully RFC4648-compliant. > > > > The decoder replaces strchr()-based lookups with direct table-indexed > > mapping. It processes input in 4-character groups and supports both > > padded and non-padded forms. Validation has been strengthened: illegal > > characters and misplaced '=' padding now cause errors, preventing > > silent data corruption. > > > > These changes improve decoding performance by ~12-15x. > > > > Benchmarks on x86_64 (Intel Core i7-10700 @ 2.90GHz, averaged > > over 1000 runs, tested with KUnit): > > > > Encode: > > - 64B input: avg ~90ns -> ~32ns (~2.8x faster) > > - 1KB input: avg ~1332ns -> ~510ns (~2.6x faster) > > > > Decode: > > - 64B input: avg ~1530ns -> ~122ns (~12.5x faster) > > - 1KB input: avg ~27726ns -> ~1859ns (~15x faster) > > > > Update nvme-auth to use the reworked base64_encode() and base64_decode() > > interfaces, which now require explicit padding and table parameters. > > A static base64_table is defined to preserve RFC4648 standard encoding > > with padding enabled, ensuring functional behavior remains unchanged. > > > > While this is a mechanical update following the lib/base64 rework, > > nvme-auth also benefits from the performance improvements in the new > > encoder/decoder, achieving faster encode/decode without altering the > > output format. > > > > The reworked encoder and decoder unify Base64 handling across the kernel > > with higher performance, stricter correctness, and flexibility to support > > subsystem-specific variants. 
> > > > Co-developed-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > Co-developed-by: Yu-Sheng Huang <home7438072@gmail.com> > > Signed-off-by: Yu-Sheng Huang <home7438072@gmail.com> > > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> > > --- > > drivers/nvme/common/auth.c | 7 +- > > include/linux/base64.h | 4 +- > > lib/base64.c | 238 ++++++++++++++++++++++++++++--------- > > 3 files changed, 192 insertions(+), 57 deletions(-) > > > > diff --git a/drivers/nvme/common/auth.c b/drivers/nvme/common/auth.c > > index 91e273b89..4d57694f8 100644 > > --- a/drivers/nvme/common/auth.c > > +++ b/drivers/nvme/common/auth.c > > @@ -161,6 +161,9 @@ u32 nvme_auth_key_struct_size(u32 key_len) > > } > > EXPORT_SYMBOL_GPL(nvme_auth_key_struct_size); > > > > +static const char base64_table[65] = > > + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > + > > struct nvme_dhchap_key *nvme_auth_extract_key(unsigned char *secret, > > u8 key_hash) > > { > > @@ -178,7 +181,7 @@ struct nvme_dhchap_key *nvme_auth_extract_key(unsigned char *secret, > > if (!key) > > return ERR_PTR(-ENOMEM); > > > > - key_len = base64_decode(secret, allocated_len, key->key); > > + key_len = base64_decode(secret, allocated_len, key->key, true, base64_table); > > if (key_len < 0) { > > pr_debug("base64 key decoding error %d\n", > > key_len); > > @@ -663,7 +666,7 @@ int nvme_auth_generate_digest(u8 hmac_id, u8 *psk, size_t psk_len, > > if (ret) > > goto out_free_digest; > > > > - ret = base64_encode(digest, digest_len, enc); > > + ret = base64_encode(digest, digest_len, enc, true, base64_table); > > if (ret < hmac_len) { > > ret = -ENOKEY; > > goto out_free_digest; > > diff --git a/include/linux/base64.h b/include/linux/base64.h > > index 660d4cb1e..22351323d 100644 > > --- a/include/linux/base64.h > > +++ b/include/linux/base64.h > > @@ -10,7 +10,7 @@ > > > > #define BASE64_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3) > > > > -int base64_encode(const u8 *src, int len, char *dst); > > -int base64_decode(const char *src, int len, u8 *dst); > > +int base64_encode(const u8 *src, int len, char *dst, bool padding, const char *table); > > +int base64_decode(const char *src, int len, u8 *dst, bool padding, const char *table); > > > > #endif /* _LINUX_BASE64_H */ > > diff --git a/lib/base64.c b/lib/base64.c > > index 9416bded2..b2bd5dab5 100644 > > --- a/lib/base64.c > > +++ b/lib/base64.c > > @@ -15,104 +15,236 @@ > > #include <linux/string.h> > > #include <linux/base64.h> > > > > -static const char base64_table[65] = > > - "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > +#define BASE64_6BIT_MASK 0x3f /* Mask to extract lowest 6 bits */ > > +#define BASE64_BITS_PER_BYTE 8 > > +#define BASE64_CHUNK_BITS 6 > > + > > +/* Output-char-indexed shifts: for output chars 0,1,2,3 respectively */ > > +#define BASE64_SHIFT_OUT0 (BASE64_CHUNK_BITS * 3) /* 18 */ > > +#define BASE64_SHIFT_OUT1 (BASE64_CHUNK_BITS * 2) /* 12 */ > > +#define BASE64_SHIFT_OUT2 (BASE64_CHUNK_BITS * 1) /* 6 */ > > +/* OUT3 uses 0 shift and just masks with BASE64_6BIT_MASK */ > > + > > +/* For extracting bytes from the 24-bit value (decode main loop) */ > > +#define BASE64_SHIFT_BYTE0 (BASE64_BITS_PER_BYTE * 2) /* 16 */ > > +#define BASE64_SHIFT_BYTE1 (BASE64_BITS_PER_BYTE * 1) /* 8 */ > > + > > +/* Tail (no padding) shifts to extract bytes */ > > +#define BASE64_TAIL2_BYTE0_SHIFT ((BASE64_CHUNK_BITS * 2) - BASE64_BITS_PER_BYTE) /* 4 */ > > +#define BASE64_TAIL3_BYTE0_SHIFT 
((BASE64_CHUNK_BITS * 3) - BASE64_BITS_PER_BYTE) /* 10 */ > > +#define BASE64_TAIL3_BYTE1_SHIFT ((BASE64_CHUNK_BITS * 3) - (BASE64_BITS_PER_BYTE * 2)) /* 2 */ > > + > > +/* Extra: masks for leftover validation (no padding) */ > > +#define BASE64_MASK(n) ({ \ > > + unsigned int __n = (n); \ > > + __n ? ((1U << __n) - 1U) : 0U; \ > > +}) > > +#define BASE64_TAIL2_UNUSED_BITS (BASE64_CHUNK_BITS * 2 - BASE64_BITS_PER_BYTE) /* 4 */ > > +#define BASE64_TAIL3_UNUSED_BITS (BASE64_CHUNK_BITS * 3 - BASE64_BITS_PER_BYTE * 2) /* 2 */ > > > > static inline const char *find_chr(const char *base64_table, char ch) > > { > > if ('A' <= ch && ch <= 'Z') > > - return base64_table + ch - 'A'; > > + return base64_table + (ch - 'A'); > > if ('a' <= ch && ch <= 'z') > > - return base64_table + 26 + ch - 'a'; > > + return base64_table + 26 + (ch - 'a'); > > if ('0' <= ch && ch <= '9') > > - return base64_table + 26 * 2 + ch - '0'; > > - if (ch == base64_table[26 * 2 + 10]) > > - return base64_table + 26 * 2 + 10; > > - if (ch == base64_table[26 * 2 + 10 + 1]) > > - return base64_table + 26 * 2 + 10 + 1; > > + return base64_table + 52 + (ch - '0'); > > + if (ch == base64_table[62]) > > + return &base64_table[62]; > > + if (ch == base64_table[63]) > > + return &base64_table[63]; > > All the changes in this function look cosmetic. Could you fold them > into the patch that introduced the function to avoid touching the > lines multiple times? > > Best, > Caleb > You're right, these are just cosmetic changes. I'll fold them into the original patch. Best regards, Guan-chun > > return NULL; > > } > > > > /** > > - * base64_encode() - base64-encode some binary data > > + * base64_encode() - base64-encode with custom table and optional padding > > * @src: the binary data to encode > > * @srclen: the length of @src in bytes > > - * @dst: (output) the base64-encoded string. Not NUL-terminated. > > + * @dst: (output) the base64-encoded string. Not NUL-terminated. > > + * @padding: whether to append '=' characters so output length is a multiple of 4 > > + * @table: 64-character encoding table to use (e.g. standard or URL-safe variant) > > * > > - * Encodes data using base64 encoding, i.e. the "Base 64 Encoding" specified > > - * by RFC 4648, including the '='-padding. > > + * Encodes data using the given 64-character @table. If @padding is true, > > + * the output is padded with '=' as described in RFC 4648; otherwise padding > > + * is omitted. This allows generation of both standard and non-standard > > + * Base64 variants (e.g. URL-safe encoding). > > * > > * Return: the length of the resulting base64-encoded string in bytes. 
> > */ > > -int base64_encode(const u8 *src, int srclen, char *dst) > > +int base64_encode(const u8 *src, int srclen, char *dst, bool padding, const char *table) > > { > > u32 ac = 0; > > - int bits = 0; > > - int i; > > char *cp = dst; > > > > - for (i = 0; i < srclen; i++) { > > - ac = (ac << 8) | src[i]; > > - bits += 8; > > - do { > > - bits -= 6; > > - *cp++ = base64_table[(ac >> bits) & 0x3f]; > > - } while (bits >= 6); > > - } > > - if (bits) { > > - *cp++ = base64_table[(ac << (6 - bits)) & 0x3f]; > > - bits -= 6; > > + while (srclen >= 3) { > > + ac = ((u32)src[0] << (BASE64_BITS_PER_BYTE * 2)) | > > + ((u32)src[1] << (BASE64_BITS_PER_BYTE)) | > > + (u32)src[2]; > > + > > + *cp++ = table[ac >> BASE64_SHIFT_OUT0]; > > + *cp++ = table[(ac >> BASE64_SHIFT_OUT1) & BASE64_6BIT_MASK]; > > + *cp++ = table[(ac >> BASE64_SHIFT_OUT2) & BASE64_6BIT_MASK]; > > + *cp++ = table[ac & BASE64_6BIT_MASK]; > > + > > + src += 3; > > + srclen -= 3; > > } > > - while (bits < 0) { > > - *cp++ = '='; > > - bits += 2; > > + > > + switch (srclen) { > > + case 2: > > + ac = ((u32)src[0] << (BASE64_BITS_PER_BYTE * 2)) | > > + ((u32)src[1] << (BASE64_BITS_PER_BYTE)); > > + > > + *cp++ = table[ac >> BASE64_SHIFT_OUT0]; > > + *cp++ = table[(ac >> BASE64_SHIFT_OUT1) & BASE64_6BIT_MASK]; > > + *cp++ = table[(ac >> BASE64_SHIFT_OUT2) & BASE64_6BIT_MASK]; > > + if (padding) > > + *cp++ = '='; > > + break; > > + case 1: > > + ac = ((u32)src[0] << (BASE64_BITS_PER_BYTE * 2)); > > + *cp++ = table[ac >> BASE64_SHIFT_OUT0]; > > + *cp++ = table[(ac >> BASE64_SHIFT_OUT1) & BASE64_6BIT_MASK]; > > + if (padding) { > > + *cp++ = '='; > > + *cp++ = '='; > > + } > > + break; > > } > > return cp - dst; > > } > > EXPORT_SYMBOL_GPL(base64_encode); > > > > /** > > - * base64_decode() - base64-decode a string > > + * base64_decode() - base64-decode with custom table and optional padding > > * @src: the string to decode. Doesn't need to be NUL-terminated. > > * @srclen: the length of @src in bytes > > * @dst: (output) the decoded binary data > > + * @padding: when true, accept and handle '=' padding as per RFC 4648; > > + * when false, '=' is treated as invalid > > + * @table: 64-character encoding table to use (e.g. standard or URL-safe variant) > > * > > - * Decodes a string using base64 encoding, i.e. the "Base 64 Encoding" > > - * specified by RFC 4648, including the '='-padding. > > + * Decodes a string using the given 64-character @table. If @padding is true, > > + * '=' padding is accepted as described in RFC 4648; otherwise '=' is > > + * treated as an error. This allows decoding of both standard and > > + * non-standard Base64 variants (e.g. URL-safe decoding). > > * > > * This implementation hasn't been optimized for performance. > > * > > * Return: the length of the resulting decoded binary data in bytes, > > * or -1 if the string isn't a valid base64 string. > > */ > > -int base64_decode(const char *src, int srclen, u8 *dst) > > +static inline int base64_decode_table(char ch, const char *table) > > +{ > > + if (ch == '\0') > > + return -1; > > + const char *p = find_chr(table, ch); > > + > > + return p ? 
(p - table) : -1; > > +} > > + > > +static inline int decode_base64_block(const char *src, const char *table, > > + int *input1, int *input2, > > + int *input3, int *input4, > > + bool padding) > > +{ > > + *input1 = base64_decode_table(src[0], table); > > + *input2 = base64_decode_table(src[1], table); > > + *input3 = base64_decode_table(src[2], table); > > + *input4 = base64_decode_table(src[3], table); > > + > > + /* Return error if any base64 character is invalid */ > > + if (*input1 < 0 || *input2 < 0 || (!padding && (*input3 < 0 || *input4 < 0))) > > + return -1; > > + > > + /* Handle padding */ > > + if (padding) { > > + if (*input3 < 0 && *input4 >= 0) > > + return -1; > > + if (*input3 < 0 && src[2] != '=') > > + return -1; > > + if (*input4 < 0 && src[3] != '=') > > + return -1; > > + } > > + return 0; > > +} > > + > > +int base64_decode(const char *src, int srclen, u8 *dst, bool padding, const char *table) > > { > > - u32 ac = 0; > > - int bits = 0; > > - int i; > > u8 *bp = dst; > > + int input1, input2, input3, input4; > > + u32 val; > > > > - for (i = 0; i < srclen; i++) { > > - const char *p = find_chr(base64_table, src[i]); > > + if (srclen == 0) > > + return 0; > > > > - if (src[i] == '=') { > > - ac = (ac << 6); > > - bits += 6; > > - if (bits >= 8) > > - bits -= 8; > > - continue; > > + /* Validate the input length for padding */ > > + if (padding && (srclen & 0x03) != 0) > > + return -1; > > + > > + while (srclen >= 4) { > > + /* Decode the next 4 characters */ > > + if (decode_base64_block(src, table, &input1, &input2, &input3, > > + &input4, padding) < 0) > > + return -1; > > + if (padding && srclen > 4) { > > + if (input3 < 0 || input4 < 0) > > + return -1; > > } > > - if (p == NULL || src[i] == 0) > > + val = ((u32)input1 << BASE64_SHIFT_OUT0) | > > + ((u32)input2 << BASE64_SHIFT_OUT1) | > > + ((u32)((input3 < 0) ? 0 : input3) << BASE64_SHIFT_OUT2) | > > + (u32)((input4 < 0) ? 
0 : input4); > > + > > + *bp++ = (u8)(val >> BASE64_SHIFT_BYTE0); > > + > > + if (input3 >= 0) > > + *bp++ = (u8)(val >> BASE64_SHIFT_BYTE1); > > + if (input4 >= 0) > > + *bp++ = (u8)val; > > + > > + src += 4; > > + srclen -= 4; > > + } > > + > > + /* Handle leftover characters when padding is not used */ > > + if (!padding && srclen > 0) { > > + switch (srclen) { > > + case 2: > > + input1 = base64_decode_table(src[0], table); > > + input2 = base64_decode_table(src[1], table); > > + if (input1 < 0 || input2 < 0) > > + return -1; > > + > > + val = ((u32)input1 << BASE64_CHUNK_BITS) | (u32)input2; /* 12 bits */ > > + if (val & BASE64_MASK(BASE64_TAIL2_UNUSED_BITS)) > > + return -1; /* low 4 bits must be zero */ > > + > > + *bp++ = (u8)(val >> BASE64_TAIL2_BYTE0_SHIFT); > > + break; > > + case 3: > > + input1 = base64_decode_table(src[0], table); > > + input2 = base64_decode_table(src[1], table); > > + input3 = base64_decode_table(src[2], table); > > + if (input1 < 0 || input2 < 0 || input3 < 0) > > + return -1; > > + > > + val = ((u32)input1 << (BASE64_CHUNK_BITS * 2)) | > > + ((u32)input2 << BASE64_CHUNK_BITS) | > > + (u32)input3; /* 18 bits */ > > + > > + if (val & BASE64_MASK(BASE64_TAIL3_UNUSED_BITS)) > > + return -1; /* low 2 bits must be zero */ > > + > > + *bp++ = (u8)(val >> BASE64_TAIL3_BYTE0_SHIFT); > > + *bp++ = (u8)((val >> BASE64_TAIL3_BYTE1_SHIFT) & 0xFF); > > + break; > > + default: > > return -1; > > - ac = (ac << 6) | (p - base64_table); > > - bits += 6; > > - if (bits >= 8) { > > - bits -= 8; > > - *bp++ = (u8)(ac >> bits); > > } > > } > > - if (ac & ((1 << bits) - 1)) > > - return -1; > > + > > return bp - dst; > > } > > EXPORT_SYMBOL_GPL(base64_decode); > > -- > > 2.34.1 > > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 2/5] lib/base64: rework encoder/decoder with customizable support and update nvme-auth 2025-09-11 7:41 ` [PATCH v2 2/5] lib/base64: rework encoder/decoder with customizable support and update nvme-auth Guan-Chun Wu 2025-09-11 15:59 ` Caleb Sander Mateos @ 2025-09-11 18:27 ` Eric Biggers 2025-09-12 6:37 ` FIRST_NAME LAST_NAME 2025-09-12 7:15 ` Guan-Chun Wu 1 sibling, 2 replies; 28+ messages in thread From: Eric Biggers @ 2025-09-11 18:27 UTC (permalink / raw) To: Guan-Chun Wu Cc: akpm, axboe, ceph-devel, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli On Thu, Sep 11, 2025 at 03:41:59PM +0800, Guan-Chun Wu wrote: > Rework base64_encode() and base64_decode() with extended interfaces > that support custom 64-character tables and optional '=' padding. > This makes them flexible enough to cover both standard RFC4648 Base64 > and non-standard variants such as base64url. RFC4648 specifies both base64 and base64url. > The encoder is redesigned to process input in 3-byte blocks, each > mapped directly into 4 output symbols. Base64 naturally encodes > 24 bits of input as four 6-bit values, so operating on aligned > 3-byte chunks matches the algorithm's structure. This block-based > approach eliminates the need for bit-by-bit streaming, reduces shifts, > masks, and loop iterations, and removes data-dependent branches from > the main loop. There already weren't any data-dependent branches in the encoder. > The decoder replaces strchr()-based lookups with direct table-indexed > mapping. It processes input in 4-character groups and supports both > padded and non-padded forms. Validation has been strengthened: illegal > characters and misplaced '=' padding now cause errors, preventing > silent data corruption. The decoder already detected invalid inputs. > While this is a mechanical update following the lib/base64 rework, > nvme-auth also benefits from the performance improvements in the new > encoder/decoder, achieving faster encode/decode without altering the > output format. > > The reworked encoder and decoder unify Base64 handling across the kernel > with higher performance, stricter correctness, and flexibility to support > subsystem-specific variants. Which part is more strictly correct? 
> diff --git a/lib/base64.c b/lib/base64.c > index 9416bded2..b2bd5dab5 100644 > --- a/lib/base64.c > +++ b/lib/base64.c > @@ -15,104 +15,236 @@ > #include <linux/string.h> > #include <linux/base64.h> > > -static const char base64_table[65] = > - "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > +#define BASE64_6BIT_MASK 0x3f /* Mask to extract lowest 6 bits */ > +#define BASE64_BITS_PER_BYTE 8 > +#define BASE64_CHUNK_BITS 6 > + > +/* Output-char-indexed shifts: for output chars 0,1,2,3 respectively */ > +#define BASE64_SHIFT_OUT0 (BASE64_CHUNK_BITS * 3) /* 18 */ > +#define BASE64_SHIFT_OUT1 (BASE64_CHUNK_BITS * 2) /* 12 */ > +#define BASE64_SHIFT_OUT2 (BASE64_CHUNK_BITS * 1) /* 6 */ > +/* OUT3 uses 0 shift and just masks with BASE64_6BIT_MASK */ > + > +/* For extracting bytes from the 24-bit value (decode main loop) */ > +#define BASE64_SHIFT_BYTE0 (BASE64_BITS_PER_BYTE * 2) /* 16 */ > +#define BASE64_SHIFT_BYTE1 (BASE64_BITS_PER_BYTE * 1) /* 8 */ > + > +/* Tail (no padding) shifts to extract bytes */ > +#define BASE64_TAIL2_BYTE0_SHIFT ((BASE64_CHUNK_BITS * 2) - BASE64_BITS_PER_BYTE) /* 4 */ > +#define BASE64_TAIL3_BYTE0_SHIFT ((BASE64_CHUNK_BITS * 3) - BASE64_BITS_PER_BYTE) /* 10 */ > +#define BASE64_TAIL3_BYTE1_SHIFT ((BASE64_CHUNK_BITS * 3) - (BASE64_BITS_PER_BYTE * 2)) /* 2 */ > + > +/* Extra: masks for leftover validation (no padding) */ > +#define BASE64_MASK(n) ({ \ > + unsigned int __n = (n); \ > + __n ? ((1U << __n) - 1U) : 0U; \ > +}) > +#define BASE64_TAIL2_UNUSED_BITS (BASE64_CHUNK_BITS * 2 - BASE64_BITS_PER_BYTE) /* 4 */ > +#define BASE64_TAIL3_UNUSED_BITS (BASE64_CHUNK_BITS * 3 - BASE64_BITS_PER_BYTE * 2) /* 2 */ These #defines make the code unnecessarily hard to read. Most of them should just be replaced with the integer literals. > * This implementation hasn't been optimized for performance. But the commit message claims performance improvements. > * > * Return: the length of the resulting decoded binary data in bytes, > * or -1 if the string isn't a valid base64 string. base64 => Base64, since multiple variants are supported now. Refer to the terminology used by RFC4686. Base64 is the general term, and "base64" and "base64url" specific variants of Base64. - Eric ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 2/5] lib/base64: rework encoder/decoder with customizable support and update nvme-auth 2025-09-11 18:27 ` Eric Biggers @ 2025-09-12 6:37 ` FIRST_NAME LAST_NAME 2025-09-12 6:52 ` Guan-Chun Wu 2025-09-12 7:15 ` Guan-Chun Wu 1 sibling, 1 reply; 28+ messages in thread From: FIRST_NAME LAST_NAME @ 2025-09-12 6:37 UTC (permalink / raw) To: Eric Biggers Cc: akpm, axboe, ceph-devel, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli On Thu, Sep 11, 2025 at 11:27:42AM -0700, Eric Biggers wrote: > On Thu, Sep 11, 2025 at 03:41:59PM +0800, Guan-Chun Wu wrote: > > Rework base64_encode() and base64_decode() with extended interfaces > > that support custom 64-character tables and optional '=' padding. > > This makes them flexible enough to cover both standard RFC4648 Base64 > > and non-standard variants such as base64url. > > RFC4648 specifies both base64 and base64url. > Got it, I'll update the commit message in the next version. > > The encoder is redesigned to process input in 3-byte blocks, each > > mapped directly into 4 output symbols. Base64 naturally encodes > > 24 bits of input as four 6-bit values, so operating on aligned > > 3-byte chunks matches the algorithm's structure. This block-based > > approach eliminates the need for bit-by-bit streaming, reduces shifts, > > masks, and loop iterations, and removes data-dependent branches from > > the main loop. > > There already weren't any data-dependent branches in the encoder. > Got it, I'll update the commit message in the next version. > > The decoder replaces strchr()-based lookups with direct table-indexed > > mapping. It processes input in 4-character groups and supports both > > padded and non-padded forms. Validation has been strengthened: illegal > > characters and misplaced '=' padding now cause errors, preventing > > silent data corruption. > > The decoder already detected invalid inputs. > You're right, the decoder already rejected invalid inputs. What has been strengthened in the new version is the padding handling (length must be a multiple of 4, and = only allowed in the last two positions). > > While this is a mechanical update following the lib/base64 rework, > > nvme-auth also benefits from the performance improvements in the new > > encoder/decoder, achieving faster encode/decode without altering the > > output format. > > > > The reworked encoder and decoder unify Base64 handling across the kernel > > with higher performance, stricter correctness, and flexibility to support > > subsystem-specific variants. > > Which part is more strictly correct? > The stricter correctness here refers to the decoder — specifically the padding checks (length must be a multiple of 4, and = only allowed in the last two positions). 
> > diff --git a/lib/base64.c b/lib/base64.c > > index 9416bded2..b2bd5dab5 100644 > > --- a/lib/base64.c > > +++ b/lib/base64.c > > @@ -15,104 +15,236 @@ > > #include <linux/string.h> > > #include <linux/base64.h> > > > > -static const char base64_table[65] = > > - "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > +#define BASE64_6BIT_MASK 0x3f /* Mask to extract lowest 6 bits */ > > +#define BASE64_BITS_PER_BYTE 8 > > +#define BASE64_CHUNK_BITS 6 > > + > > +/* Output-char-indexed shifts: for output chars 0,1,2,3 respectively */ > > +#define BASE64_SHIFT_OUT0 (BASE64_CHUNK_BITS * 3) /* 18 */ > > +#define BASE64_SHIFT_OUT1 (BASE64_CHUNK_BITS * 2) /* 12 */ > > +#define BASE64_SHIFT_OUT2 (BASE64_CHUNK_BITS * 1) /* 6 */ > > +/* OUT3 uses 0 shift and just masks with BASE64_6BIT_MASK */ > > + > > +/* For extracting bytes from the 24-bit value (decode main loop) */ > > +#define BASE64_SHIFT_BYTE0 (BASE64_BITS_PER_BYTE * 2) /* 16 */ > > +#define BASE64_SHIFT_BYTE1 (BASE64_BITS_PER_BYTE * 1) /* 8 */ > > + > > +/* Tail (no padding) shifts to extract bytes */ > > +#define BASE64_TAIL2_BYTE0_SHIFT ((BASE64_CHUNK_BITS * 2) - BASE64_BITS_PER_BYTE) /* 4 */ > > +#define BASE64_TAIL3_BYTE0_SHIFT ((BASE64_CHUNK_BITS * 3) - BASE64_BITS_PER_BYTE) /* 10 */ > > +#define BASE64_TAIL3_BYTE1_SHIFT ((BASE64_CHUNK_BITS * 3) - (BASE64_BITS_PER_BYTE * 2)) /* 2 */ > > + > > +/* Extra: masks for leftover validation (no padding) */ > > +#define BASE64_MASK(n) ({ \ > > + unsigned int __n = (n); \ > > + __n ? ((1U << __n) - 1U) : 0U; \ > > +}) > > +#define BASE64_TAIL2_UNUSED_BITS (BASE64_CHUNK_BITS * 2 - BASE64_BITS_PER_BYTE) /* 4 */ > > +#define BASE64_TAIL3_UNUSED_BITS (BASE64_CHUNK_BITS * 3 - BASE64_BITS_PER_BYTE * 2) /* 2 */ > > These #defines make the code unnecessarily hard to read. Most of them > should just be replaced with the integer literals. > Got it, thanks for the feedback. I'll simplify this in the next version. > > * This implementation hasn't been optimized for performance. > > But the commit message claims performance improvements. > That was my mistake — I forgot to update this part of the comment. I’ll fix it in the next version. > > * > > * Return: the length of the resulting decoded binary data in bytes, > > * or -1 if the string isn't a valid base64 string. > > base64 => Base64, since multiple variants are supported now. Refer to > the terminology used by RFC4686. Base64 is the general term, and > "base64" and "base64url" specific variants of Base64. > > - Eric Ok, I'll update the comments to use Base64. Best regards, Guan-chun ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 2/5] lib/base64: rework encoder/decoder with customizable support and update nvme-auth 2025-09-12 6:37 ` FIRST_NAME LAST_NAME @ 2025-09-12 6:52 ` Guan-Chun Wu 0 siblings, 0 replies; 28+ messages in thread From: Guan-Chun Wu @ 2025-09-12 6:52 UTC (permalink / raw) To: 409411716 Cc: akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli Sorry, please ignore my previous email. My email client was not configured correctly. Best regards, Guan-chun ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 2/5] lib/base64: rework encoder/decoder with customizable support and update nvme-auth 2025-09-11 18:27 ` Eric Biggers 2025-09-12 6:37 ` FIRST_NAME LAST_NAME @ 2025-09-12 7:15 ` Guan-Chun Wu 1 sibling, 0 replies; 28+ messages in thread From: Guan-Chun Wu @ 2025-09-12 7:15 UTC (permalink / raw) To: Eric Biggers Cc: akpm, axboe, ceph-devel, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli On Thu, Sep 11, 2025 at 11:27:42AM -0700, Eric Biggers wrote: > On Thu, Sep 11, 2025 at 03:41:59PM +0800, Guan-Chun Wu wrote: > > Rework base64_encode() and base64_decode() with extended interfaces > > that support custom 64-character tables and optional '=' padding. > > This makes them flexible enough to cover both standard RFC4648 Base64 > > and non-standard variants such as base64url. > > RFC4648 specifies both base64 and base64url. > Got it, I'll update the commit message in the next version. > > The encoder is redesigned to process input in 3-byte blocks, each > > mapped directly into 4 output symbols. Base64 naturally encodes > > 24 bits of input as four 6-bit values, so operating on aligned > > 3-byte chunks matches the algorithm's structure. This block-based > > approach eliminates the need for bit-by-bit streaming, reduces shifts, > > masks, and loop iterations, and removes data-dependent branches from > > the main loop. > > There already weren't any data-dependent branches in the encoder. > Got it, I'll update the commit message in the next version. > > The decoder replaces strchr()-based lookups with direct table-indexed > > mapping. It processes input in 4-character groups and supports both > > padded and non-padded forms. Validation has been strengthened: illegal > > characters and misplaced '=' padding now cause errors, preventing > > silent data corruption. > > The decoder already detected invalid inputs. > You're right, the decoder already rejected invalid inputs. What has been strengthened in the new version is the padding handling (length must be a multiple of 4, and = only allowed in the last two positions). > > While this is a mechanical update following the lib/base64 rework, > > nvme-auth also benefits from the performance improvements in the new > > encoder/decoder, achieving faster encode/decode without altering the > > output format. > > > > The reworked encoder and decoder unify Base64 handling across the kernel > > with higher performance, stricter correctness, and flexibility to support > > subsystem-specific variants. > > Which part is more strictly correct? > The stricter correctness here refers to the decoder, specifically the padding checks (length must be a multiple of 4, and = only allowed in the last two positions). 
> > diff --git a/lib/base64.c b/lib/base64.c > > index 9416bded2..b2bd5dab5 100644 > > --- a/lib/base64.c > > +++ b/lib/base64.c > > @@ -15,104 +15,236 @@ > > #include <linux/string.h> > > #include <linux/base64.h> > > > > -static const char base64_table[65] = > > - "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; > > +#define BASE64_6BIT_MASK 0x3f /* Mask to extract lowest 6 bits */ > > +#define BASE64_BITS_PER_BYTE 8 > > +#define BASE64_CHUNK_BITS 6 > > + > > +/* Output-char-indexed shifts: for output chars 0,1,2,3 respectively */ > > +#define BASE64_SHIFT_OUT0 (BASE64_CHUNK_BITS * 3) /* 18 */ > > +#define BASE64_SHIFT_OUT1 (BASE64_CHUNK_BITS * 2) /* 12 */ > > +#define BASE64_SHIFT_OUT2 (BASE64_CHUNK_BITS * 1) /* 6 */ > > +/* OUT3 uses 0 shift and just masks with BASE64_6BIT_MASK */ > > + > > +/* For extracting bytes from the 24-bit value (decode main loop) */ > > +#define BASE64_SHIFT_BYTE0 (BASE64_BITS_PER_BYTE * 2) /* 16 */ > > +#define BASE64_SHIFT_BYTE1 (BASE64_BITS_PER_BYTE * 1) /* 8 */ > > + > > +/* Tail (no padding) shifts to extract bytes */ > > +#define BASE64_TAIL2_BYTE0_SHIFT ((BASE64_CHUNK_BITS * 2) - BASE64_BITS_PER_BYTE) /* 4 */ > > +#define BASE64_TAIL3_BYTE0_SHIFT ((BASE64_CHUNK_BITS * 3) - BASE64_BITS_PER_BYTE) /* 10 */ > > +#define BASE64_TAIL3_BYTE1_SHIFT ((BASE64_CHUNK_BITS * 3) - (BASE64_BITS_PER_BYTE * 2)) /* 2 */ > > + > > +/* Extra: masks for leftover validation (no padding) */ > > +#define BASE64_MASK(n) ({ \ > > + unsigned int __n = (n); \ > > + __n ? ((1U << __n) - 1U) : 0U; \ > > +}) > > +#define BASE64_TAIL2_UNUSED_BITS (BASE64_CHUNK_BITS * 2 - BASE64_BITS_PER_BYTE) /* 4 */ > > +#define BASE64_TAIL3_UNUSED_BITS (BASE64_CHUNK_BITS * 3 - BASE64_BITS_PER_BYTE * 2) /* 2 */ > > These #defines make the code unnecessarily hard to read. Most of them > should just be replaced with the integer literals. > Got it, thanks for the feedback. I'll simplify this in the next version. > > * This implementation hasn't been optimized for performance. > > But the commit message claims performance improvements. > That was my mistake. I forgot to update this part of the comment. I’ll fix it in the next version. > > * > > * Return: the length of the resulting decoded binary data in bytes, > > * or -1 if the string isn't a valid base64 string. > > base64 => Base64, since multiple variants are supported now. Refer to > the terminology used by RFC4686. Base64 is the general term, and > "base64" and "base64url" specific variants of Base64. > > - Eric Ok, I'll update the comments to use Base64. Best regards, Guan-chun ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2 3/5] lib: add KUnit tests for base64 encoding/decoding 2025-09-11 7:29 [PATCH v2 0/5] lib/base64: add generic encoder/decoder, migrate users Guan-Chun Wu 2025-09-11 7:32 ` [PATCH v2 1/5] lib/base64: Replace strchr() for better performance Guan-Chun Wu 2025-09-11 7:41 ` [PATCH v2 2/5] lib/base64: rework encoder/decoder with customizable support and update nvme-auth Guan-Chun Wu @ 2025-09-11 7:45 ` Guan-Chun Wu 2025-09-11 7:45 ` [PATCH v2 4/5] fscrypt: replace local base64url helpers with generic lib/base64 helpers Guan-Chun Wu 2025-09-11 7:46 ` [PATCH v2 5/5] ceph: replace local base64 encode/decode " Guan-Chun Wu 4 siblings, 0 replies; 28+ messages in thread From: Guan-Chun Wu @ 2025-09-11 7:45 UTC (permalink / raw) To: 409411716 Cc: akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli Add a KUnit test suite for the base64 helpers. The tests exercise both encoding and decoding, cover padded and unpadded forms per RFC 4648, and include negative cases for malformed inputs and padding errors. The suite also validates behavior with a caller-supplied encoding table. In addition to functional checks, the suite includes simple microbenchmarks which report average encode/decode latency for small (64B) and larger (1KB) inputs. These numbers are informational only and do not gate the tests. Kconfig (BASE64_KUNIT) and lib/tests/Makefile are updated accordingly. Sample KUnit output: KTAP version 1 # Subtest: base64 # module: base64_kunit 1..3 # base64_performance_tests: [64B] encode run : 32ns # base64_performance_tests: [64B] decode run : 122ns # base64_performance_tests: [1KB] encode run : 507ns # base64_performance_tests: [1KB] decode run : 1870ns ok 1 base64_performance_tests ok 2 base64_encode_tests ok 3 base64_decode_tests # base64: pass:3 fail:0 skip:0 total:3 # Totals: pass:3 fail:0 skip:0 total:3 Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> --- lib/Kconfig.debug | 19 +++- lib/tests/Makefile | 1 + lib/tests/base64_kunit.c | 230 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 249 insertions(+), 1 deletion(-) create mode 100644 lib/tests/base64_kunit.c diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index dc0e0c6ed..1cfb12d02 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2794,8 +2794,25 @@ config CMDLINE_KUNIT_TEST If unsure, say N. +config BASE64_KUNIT + tristate "KUnit test for base64 decoding and encoding" if !KUNIT_ALL_TESTS + depends on KUNIT + default KUNIT_ALL_TESTS + help + This builds the base64 unit tests. + + The tests cover the encoding and decoding logic of Base64 functions + in the kernel. + In addition to correctness checks, simple performance benchmarks + for both encoding and decoding are also included. + + For more information on KUnit and unit tests in general please refer + to the KUnit documentation in Documentation/dev-tools/kunit/. + + If unsure, say N. 
+ config BITS_TEST - tristate "KUnit test for bits.h" if !KUNIT_ALL_TESTS + tristate "KUnit test for bit functions and macros" if !KUNIT_ALL_TESTS depends on KUNIT default KUNIT_ALL_TESTS help diff --git a/lib/tests/Makefile b/lib/tests/Makefile index fa6d728a8..6593a2873 100644 --- a/lib/tests/Makefile +++ b/lib/tests/Makefile @@ -4,6 +4,7 @@ # KUnit tests CFLAGS_bitfield_kunit.o := $(DISABLE_STRUCTLEAK_PLUGIN) +obj-$(CONFIG_BASE64_KUNIT) += base64_kunit.o obj-$(CONFIG_BITFIELD_KUNIT) += bitfield_kunit.o obj-$(CONFIG_BITS_TEST) += test_bits.o obj-$(CONFIG_BLACKHOLE_DEV_KUNIT_TEST) += blackhole_dev_kunit.o diff --git a/lib/tests/base64_kunit.c b/lib/tests/base64_kunit.c new file mode 100644 index 000000000..1941ac504 --- /dev/null +++ b/lib/tests/base64_kunit.c @@ -0,0 +1,230 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * base64_kunit_test.c - KUnit tests for base64 encoding and decoding functions + * + * Copyright (c) 2025, Guan-Chun Wu <409411716@gms.tku.edu.tw> + */ + +#include <kunit/test.h> +#include <linux/base64.h> + +static const char base64_table[65] = + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; + +/* ---------- Benchmark helpers ---------- */ +static u64 bench_encode_ns(const u8 *data, int len, char *dst, int reps) +{ + u64 t0, t1; + + t0 = ktime_get_ns(); + for (int i = 0; i < reps; i++) + base64_encode(data, len, dst, true, base64_table); + t1 = ktime_get_ns(); + + return div64_u64(t1 - t0, (u64)reps); +} + +static u64 bench_decode_ns(const char *data, int len, u8 *dst, int reps) +{ + u64 t0, t1; + + t0 = ktime_get_ns(); + for (int i = 0; i < reps; i++) + base64_decode(data, len, dst, true, base64_table); + t1 = ktime_get_ns(); + + return div64_u64(t1 - t0, (u64)reps); +} + +static void run_perf_and_check(struct kunit *test, const char *label, int size) +{ + const int reps = 1000; + size_t outlen = DIV_ROUND_UP(size, 3) * 4; + u8 *in = kmalloc(size, GFP_KERNEL); + char *enc = kmalloc(outlen, GFP_KERNEL); + u8 *decoded = kmalloc(size, GFP_KERNEL); + + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, in); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, enc); + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, decoded); + + get_random_bytes(in, size); + int enc_len = base64_encode(in, size, enc, true, base64_table); + int dec_len = base64_decode(enc, enc_len, decoded, true, base64_table); + + /* correctness sanity check */ + KUNIT_EXPECT_EQ(test, dec_len, size); + KUNIT_EXPECT_MEMEQ(test, decoded, in, size); + + /* benchmark encode */ + + u64 t1 = bench_encode_ns(in, size, enc, reps); + + kunit_info(test, "[%s] encode run : %lluns", label, t1); + + u64 t2 = bench_decode_ns(enc, enc_len, decoded, reps); + + kunit_info(test, "[%s] decode run : %lluns", label, t2); + + kfree(in); + kfree(enc); + kfree(decoded); +} + +static void base64_performance_tests(struct kunit *test) +{ + run_perf_and_check(test, "64B", 64); + run_perf_and_check(test, "1KB", 1024); +} + +/* ---------- Helpers for encode ---------- */ +static void expect_encode_ok(struct kunit *test, const u8 *src, int srclen, + const char *expected, bool padding) +{ + char buf[128]; + int encoded_len = base64_encode(src, srclen, buf, padding, base64_table); + + buf[encoded_len] = '\0'; + + KUNIT_EXPECT_EQ(test, encoded_len, strlen(expected)); + KUNIT_EXPECT_STREQ(test, buf, expected); +} + +/* ---------- Helpers for decode ---------- */ +static void expect_decode_ok(struct kunit *test, const char *src, + const u8 *expected, int expected_len, bool padding) +{ + u8 buf[128]; + int decoded_len = base64_decode(src, strlen(src), buf, padding, 
base64_table); + + KUNIT_EXPECT_EQ(test, decoded_len, expected_len); + KUNIT_EXPECT_MEMEQ(test, buf, expected, expected_len); +} + +static void expect_decode_err(struct kunit *test, const char *src, + int srclen, bool padding) +{ + u8 buf[64]; + int decoded_len = base64_decode(src, srclen, buf, padding, base64_table); + + KUNIT_EXPECT_EQ(test, decoded_len, -1); +} + +/* ---------- Encode Tests ---------- */ +static void base64_encode_tests(struct kunit *test) +{ + /* With padding */ + expect_encode_ok(test, (const u8 *)"", 0, "", true); + expect_encode_ok(test, (const u8 *)"f", 1, "Zg==", true); + expect_encode_ok(test, (const u8 *)"fo", 2, "Zm8=", true); + expect_encode_ok(test, (const u8 *)"foo", 3, "Zm9v", true); + expect_encode_ok(test, (const u8 *)"foob", 4, "Zm9vYg==", true); + expect_encode_ok(test, (const u8 *)"fooba", 5, "Zm9vYmE=", true); + expect_encode_ok(test, (const u8 *)"foobar", 6, "Zm9vYmFy", true); + + /* Extra cases with padding */ + expect_encode_ok(test, (const u8 *)"Hello, world!", 13, "SGVsbG8sIHdvcmxkIQ==", true); + expect_encode_ok(test, (const u8 *)"ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26, + "QUJDREVGR0hJSktMTU5PUFFSU1RVVldYWVo=", true); + expect_encode_ok(test, (const u8 *)"abcdefghijklmnopqrstuvwxyz", 26, + "YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXo=", true); + expect_encode_ok(test, (const u8 *)"0123456789+/", 12, "MDEyMzQ1Njc4OSsv", true); + + /* Without padding */ + expect_encode_ok(test, (const u8 *)"", 0, "", false); + expect_encode_ok(test, (const u8 *)"f", 1, "Zg", false); + expect_encode_ok(test, (const u8 *)"fo", 2, "Zm8", false); + expect_encode_ok(test, (const u8 *)"foo", 3, "Zm9v", false); + expect_encode_ok(test, (const u8 *)"foob", 4, "Zm9vYg", false); + expect_encode_ok(test, (const u8 *)"fooba", 5, "Zm9vYmE", false); + expect_encode_ok(test, (const u8 *)"foobar", 6, "Zm9vYmFy", false); + + /* Extra cases without padding */ + expect_encode_ok(test, (const u8 *)"Hello, world!", 13, "SGVsbG8sIHdvcmxkIQ", false); + expect_encode_ok(test, (const u8 *)"ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26, + "QUJDREVGR0hJSktMTU5PUFFSU1RVVldYWVo", false); + expect_encode_ok(test, (const u8 *)"abcdefghijklmnopqrstuvwxyz", 26, + "YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXo", false); + expect_encode_ok(test, (const u8 *)"0123456789+/", 12, "MDEyMzQ1Njc4OSsv", false); +} + +/* ---------- Decode Tests ---------- */ +static void base64_decode_tests(struct kunit *test) +{ + /* -------- With padding --------*/ + expect_decode_ok(test, "", (const u8 *)"", 0, true); + expect_decode_ok(test, "Zg==", (const u8 *)"f", 1, true); + expect_decode_ok(test, "Zm8=", (const u8 *)"fo", 2, true); + expect_decode_ok(test, "Zm9v", (const u8 *)"foo", 3, true); + expect_decode_ok(test, "Zm9vYg==", (const u8 *)"foob", 4, true); + expect_decode_ok(test, "Zm9vYmE=", (const u8 *)"fooba", 5, true); + expect_decode_ok(test, "Zm9vYmFy", (const u8 *)"foobar", 6, true); + expect_decode_ok(test, "SGVsbG8sIHdvcmxkIQ==", (const u8 *)"Hello, world!", 13, true); + expect_decode_ok(test, "QUJDREVGR0hJSktMTU5PUFFSU1RVVldYWVo=", + (const u8 *)"ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26, true); + expect_decode_ok(test, "YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXo=", + (const u8 *)"abcdefghijklmnopqrstuvwxyz", 26, true); + + /* Error cases */ + expect_decode_err(test, "Zg=!", 4, true); + expect_decode_err(test, "Zm$=", 4, true); + expect_decode_err(test, "Z===", 4, true); + expect_decode_err(test, "Zg", 2, true); + expect_decode_err(test, "Zm9v====", 8, true); + expect_decode_err(test, "Zm==A", 5, true); + + { + char with_nul[4] = { 'Z', 'g', '\0', '=' }; + + 
expect_decode_err(test, with_nul, 4, true); + } + + /* -------- Without padding --------*/ + expect_decode_ok(test, "", (const u8 *)"", 0, false); + expect_decode_ok(test, "Zg", (const u8 *)"f", 1, false); + expect_decode_ok(test, "Zm8", (const u8 *)"fo", 2, false); + expect_decode_ok(test, "Zm9v", (const u8 *)"foo", 3, false); + expect_decode_ok(test, "Zm9vYg", (const u8 *)"foob", 4, false); + expect_decode_ok(test, "Zm9vYmE", (const u8 *)"fooba", 5, false); + expect_decode_ok(test, "Zm9vYmFy", (const u8 *)"foobar", 6, false); + expect_decode_ok(test, "TWFu", (const u8 *)"Man", 3, false); + expect_decode_ok(test, "SGVsbG8sIHdvcmxkIQ", (const u8 *)"Hello, world!", 13, false); + expect_decode_ok(test, "QUJDREVGR0hJSktMTU5PUFFSU1RVVldYWVo", + (const u8 *)"ABCDEFGHIJKLMNOPQRSTUVWXYZ", 26, false); + expect_decode_ok(test, "YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXo", + (const u8 *)"abcdefghijklmnopqrstuvwxyz", 26, false); + expect_decode_ok(test, "MDEyMzQ1Njc4OSsv", (const u8 *)"0123456789+/", 12, false); + + /* Error cases */ + expect_decode_err(test, "Zg=!", 4, false); + expect_decode_err(test, "Zm$=", 4, false); + expect_decode_err(test, "Z===", 4, false); + expect_decode_err(test, "Zg=", 3, false); + expect_decode_err(test, "Zm9v====", 8, false); + expect_decode_err(test, "Zm==v", 4, false); + + { + char with_nul[4] = { 'Z', 'g', '\0', '=' }; + + expect_decode_err(test, with_nul, 4, false); + } +} + +/* ---------- Test registration ---------- */ +static struct kunit_case base64_test_cases[] = { + KUNIT_CASE(base64_performance_tests), + KUNIT_CASE(base64_encode_tests), + KUNIT_CASE(base64_decode_tests), + {} +}; + +static struct kunit_suite base64_test_suite = { + .name = "base64", + .test_cases = base64_test_cases, +}; + +kunit_test_suite(base64_test_suite); + +MODULE_AUTHOR("Guan-Chun Wu <409411716@gms.tku.edu.tw>"); +MODULE_DESCRIPTION("KUnit tests for Base64 encoding/decoding, including performance checks"); +MODULE_LICENSE("GPL"); -- 2.34.1 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v2 4/5] fscrypt: replace local base64url helpers with generic lib/base64 helpers 2025-09-11 7:29 [PATCH v2 0/5] lib/base64: add generic encoder/decoder, migrate users Guan-Chun Wu ` (2 preceding siblings ...) 2025-09-11 7:45 ` [PATCH v2 3/5] lib: add KUnit tests for base64 encoding/decoding Guan-Chun Wu @ 2025-09-11 7:45 ` Guan-Chun Wu 2025-09-11 18:47 ` Eric Biggers 2025-09-11 7:46 ` [PATCH v2 5/5] ceph: replace local base64 encode/decode " Guan-Chun Wu 4 siblings, 1 reply; 28+ messages in thread From: Guan-Chun Wu @ 2025-09-11 7:45 UTC (permalink / raw) To: 409411716 Cc: akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli Replace the existing local base64url encoding and decoding functions in fscrypt with the generic base64_encode_custom and base64_decode_custom helpers from the kernel's lib/base64 library. This removes custom implementations in fscrypt, reduces code duplication, and leverages the well-maintained, standard base64 code within the kernel. The new helpers preserve RFC 4648-compliant URL-safe Base64 encoding without padding behavior, ensuring no functional changes. At the same time, they also deliver significant performance gains: with the optimized encoder and decoder, encoding runs about 2.7x faster and decoding achieves 12-15x improvements over the previous implementation. This improves maintainability and aligns fscrypt with other kernel components using the generic base64 helpers. Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> --- fs/crypto/fname.c | 86 ++++------------------------------------------- 1 file changed, 6 insertions(+), 80 deletions(-) diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c index f9f6713e1..38be85cd5 100644 --- a/fs/crypto/fname.c +++ b/fs/crypto/fname.c @@ -17,6 +17,7 @@ #include <linux/export.h> #include <linux/namei.h> #include <linux/scatterlist.h> +#include <linux/base64.h> #include "fscrypt_private.h" @@ -72,7 +73,7 @@ struct fscrypt_nokey_name { /* Encoded size of max-size no-key name */ #define FSCRYPT_NOKEY_NAME_MAX_ENCODED \ - FSCRYPT_BASE64URL_CHARS(FSCRYPT_NOKEY_NAME_MAX) + BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX) static inline bool fscrypt_is_dot_dotdot(const struct qstr *str) { @@ -166,81 +167,6 @@ static int fname_decrypt(const struct inode *inode, static const char base64url_table[65] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"; -#define FSCRYPT_BASE64URL_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3) - -/** - * fscrypt_base64url_encode() - base64url-encode some binary data - * @src: the binary data to encode - * @srclen: the length of @src in bytes - * @dst: (output) the base64url-encoded string. Not NUL-terminated. - * - * Encodes data using base64url encoding, i.e. the "Base 64 Encoding with URL - * and Filename Safe Alphabet" specified by RFC 4648. '='-padding isn't used, - * as it's unneeded and not required by the RFC. base64url is used instead of - * base64 to avoid the '/' character, which isn't allowed in filenames. - * - * Return: the length of the resulting base64url-encoded string in bytes. - * This will be equal to FSCRYPT_BASE64URL_CHARS(srclen). 
- */ -static int fscrypt_base64url_encode(const u8 *src, int srclen, char *dst) -{ - u32 ac = 0; - int bits = 0; - int i; - char *cp = dst; - - for (i = 0; i < srclen; i++) { - ac = (ac << 8) | src[i]; - bits += 8; - do { - bits -= 6; - *cp++ = base64url_table[(ac >> bits) & 0x3f]; - } while (bits >= 6); - } - if (bits) - *cp++ = base64url_table[(ac << (6 - bits)) & 0x3f]; - return cp - dst; -} - -/** - * fscrypt_base64url_decode() - base64url-decode a string - * @src: the string to decode. Doesn't need to be NUL-terminated. - * @srclen: the length of @src in bytes - * @dst: (output) the decoded binary data - * - * Decodes a string using base64url encoding, i.e. the "Base 64 Encoding with - * URL and Filename Safe Alphabet" specified by RFC 4648. '='-padding isn't - * accepted, nor are non-encoding characters such as whitespace. - * - * This implementation hasn't been optimized for performance. - * - * Return: the length of the resulting decoded binary data in bytes, - * or -1 if the string isn't a valid base64url string. - */ -static int fscrypt_base64url_decode(const char *src, int srclen, u8 *dst) -{ - u32 ac = 0; - int bits = 0; - int i; - u8 *bp = dst; - - for (i = 0; i < srclen; i++) { - const char *p = strchr(base64url_table, src[i]); - - if (p == NULL || src[i] == 0) - return -1; - ac = (ac << 6) | (p - base64url_table); - bits += 6; - if (bits >= 8) { - bits -= 8; - *bp++ = (u8)(ac >> bits); - } - } - if (ac & ((1 << bits) - 1)) - return -1; - return bp - dst; -} - bool __fscrypt_fname_encrypted_size(const union fscrypt_policy *policy, u32 orig_len, u32 max_len, u32 *encrypted_len_ret) @@ -387,8 +313,8 @@ int fscrypt_fname_disk_to_usr(const struct inode *inode, nokey_name.sha256); size = FSCRYPT_NOKEY_NAME_MAX; } - oname->len = fscrypt_base64url_encode((const u8 *)&nokey_name, size, - oname->name); + oname->len = base64_encode((const u8 *)&nokey_name, size, + oname->name, false, base64url_table); return 0; } EXPORT_SYMBOL(fscrypt_fname_disk_to_usr); @@ -467,8 +393,8 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname, if (fname->crypto_buf.name == NULL) return -ENOMEM; - ret = fscrypt_base64url_decode(iname->name, iname->len, - fname->crypto_buf.name); + ret = base64_decode(iname->name, iname->len, + fname->crypto_buf.name, false, base64url_table); if (ret < (int)offsetof(struct fscrypt_nokey_name, bytes[1]) || (ret > offsetof(struct fscrypt_nokey_name, sha256) && ret != FSCRYPT_NOKEY_NAME_MAX)) { -- 2.34.1 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v2 4/5] fscrypt: replace local base64url helpers with generic lib/base64 helpers 2025-09-11 7:45 ` [PATCH v2 4/5] fscrypt: replace local base64url helpers with generic lib/base64 helpers Guan-Chun Wu @ 2025-09-11 18:47 ` Eric Biggers 2025-09-12 7:51 ` Guan-Chun Wu 0 siblings, 1 reply; 28+ messages in thread From: Eric Biggers @ 2025-09-11 18:47 UTC (permalink / raw) To: Guan-Chun Wu Cc: akpm, axboe, ceph-devel, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli On Thu, Sep 11, 2025 at 03:45:56PM +0800, Guan-Chun Wu wrote: > Replace the existing local base64url encoding and decoding functions in > fscrypt with the generic base64_encode_custom and base64_decode_custom > helpers from the kernel's lib/base64 library. But those aren't the functions that are actually used. > This removes custom implementations in fscrypt, reduces code duplication, > and leverages the well-maintained, Who is maintaining lib/base64.c? I guess Andrew? > standard base64 code within the kernel. fscrypt uses "base64url", not "base64". > /* Encoded size of max-size no-key name */ > #define FSCRYPT_NOKEY_NAME_MAX_ENCODED \ > - FSCRYPT_BASE64URL_CHARS(FSCRYPT_NOKEY_NAME_MAX) > + BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX) Does BASE64_CHARS() include '=' padding or not? - Eric ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 4/5] fscrypt: replace local base64url helpers with generic lib/base64 helpers 2025-09-11 18:47 ` Eric Biggers @ 2025-09-12 7:51 ` Guan-Chun Wu 0 siblings, 0 replies; 28+ messages in thread From: Guan-Chun Wu @ 2025-09-12 7:51 UTC (permalink / raw) To: Eric Biggers Cc: akpm, axboe, ceph-devel, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli Hi Eric, On Thu, Sep 11, 2025 at 11:47:05AM -0700, Eric Biggers wrote: > On Thu, Sep 11, 2025 at 03:45:56PM +0800, Guan-Chun Wu wrote: > > Replace the existing local base64url encoding and decoding functions in > > fscrypt with the generic base64_encode_custom and base64_decode_custom > > helpers from the kernel's lib/base64 library. > > But those aren't the functions that are actually used. > I'll update the commit message. I meant base64_encode and base64_decode. > > This removes custom implementations in fscrypt, reduces code duplication, > > and leverages the well-maintained, > > Who is maintaining lib/base64.c? I guess Andrew? > Yes, Andrew is maintaining lib/base64.c. > > standard base64 code within the kernel. > > fscrypt uses "base64url", not "base64". > You're right, I'll update the commit message in the next version. > > /* Encoded size of max-size no-key name */ > > #define FSCRYPT_NOKEY_NAME_MAX_ENCODED \ > > - FSCRYPT_BASE64URL_CHARS(FSCRYPT_NOKEY_NAME_MAX) > > + BASE64_CHARS(FSCRYPT_NOKEY_NAME_MAX) > > Does BASE64_CHARS() include '=' padding or not? > > - Eric No, BASE64_CHARS() doesn't include the '=' padding; it's defined as #define BASE64_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3). Best regards, Guan-chun ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2 5/5] ceph: replace local base64 encode/decode with generic lib/base64 helpers 2025-09-11 7:29 [PATCH v2 0/5] lib/base64: add generic encoder/decoder, migrate users Guan-Chun Wu ` (3 preceding siblings ...) 2025-09-11 7:45 ` [PATCH v2 4/5] fscrypt: replace local base64url helpers with generic lib/base64 helpers Guan-Chun Wu @ 2025-09-11 7:46 ` Guan-Chun Wu 4 siblings, 0 replies; 28+ messages in thread From: Guan-Chun Wu @ 2025-09-11 7:46 UTC (permalink / raw) To: 409411716 Cc: akpm, axboe, ceph-devel, ebiggers, hch, home7438072, idryomov, jaegeuk, kbusch, linux-fscrypt, linux-kernel, linux-nvme, sagi, tytso, visitorckw, xiubli Remove the local ceph_base64_encode and ceph_base64_decode functions and replace their usage with the generic base64_encode and base64_decode helpers from the kernel's lib/base64 library. This eliminates redundant implementations in Ceph, reduces code duplication, and leverages the optimized and well-maintained standard base64 code within the kernel. The change keeps the existing encoding table and disables padding, ensuring no functional or format changes. At the same time, Ceph also benefits from the optimized encoder/decoder: encoding performance improves by ~2.7x and decoding by ~12-15x compared to the previous local implementation. Overall, this improves maintainability, consistency with other kernel components, and runtime performance. Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> --- fs/ceph/crypto.c | 53 +++++------------------------------------------- fs/ceph/crypto.h | 6 ++---- fs/ceph/dir.c | 5 +++-- fs/ceph/inode.c | 2 +- 4 files changed, 11 insertions(+), 55 deletions(-) diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c index cab722619..a3cb4ad99 100644 --- a/fs/ceph/crypto.c +++ b/fs/ceph/crypto.c @@ -21,53 +21,9 @@ * used the base64 encoding defined for IMAP mailbox names (RFC 3501) instead, * which replaces '-' and '_' by '+' and ','. 
*/ -static const char base64_table[65] = +const char ceph_base64_table[65] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,"; -int ceph_base64_encode(const u8 *src, int srclen, char *dst) -{ - u32 ac = 0; - int bits = 0; - int i; - char *cp = dst; - - for (i = 0; i < srclen; i++) { - ac = (ac << 8) | src[i]; - bits += 8; - do { - bits -= 6; - *cp++ = base64_table[(ac >> bits) & 0x3f]; - } while (bits >= 6); - } - if (bits) - *cp++ = base64_table[(ac << (6 - bits)) & 0x3f]; - return cp - dst; -} - -int ceph_base64_decode(const char *src, int srclen, u8 *dst) -{ - u32 ac = 0; - int bits = 0; - int i; - u8 *bp = dst; - - for (i = 0; i < srclen; i++) { - const char *p = strchr(base64_table, src[i]); - - if (p == NULL || src[i] == 0) - return -1; - ac = (ac << 6) | (p - base64_table); - bits += 6; - if (bits >= 8) { - bits -= 8; - *bp++ = (u8)(ac >> bits); - } - } - if (ac & ((1 << bits) - 1)) - return -1; - return bp - dst; -} - static int ceph_crypt_get_context(struct inode *inode, void *ctx, size_t len) { struct ceph_inode_info *ci = ceph_inode(inode); @@ -316,7 +272,7 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen) } /* base64 encode the encrypted name */ - elen = ceph_base64_encode(cryptbuf, len, p); + elen = base64_encode(cryptbuf, len, p, false, ceph_base64_table); doutc(cl, "base64-encoded ciphertext name = %.*s\n", elen, p); /* To understand the 240 limit, see CEPH_NOHASH_NAME_MAX comments */ @@ -410,7 +366,8 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, tname = &_tname; } - declen = ceph_base64_decode(name, name_len, tname->name); + declen = base64_decode(name, name_len, + tname->name, false, ceph_base64_table); if (declen <= 0) { ret = -EIO; goto out; @@ -424,7 +381,7 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, oname); if (!ret && (dir != fname->dir)) { - char tmp_buf[CEPH_BASE64_CHARS(NAME_MAX)]; + char tmp_buf[BASE64_CHARS(NAME_MAX)]; name_len = snprintf(tmp_buf, sizeof(tmp_buf), "_%.*s_%ld", oname->len, oname->name, dir->i_ino); diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h index 23612b2e9..c94da3818 100644 --- a/fs/ceph/crypto.h +++ b/fs/ceph/crypto.h @@ -8,6 +8,7 @@ #include <crypto/sha2.h> #include <linux/fscrypt.h> +#include <linux/base64.h> #define CEPH_FSCRYPT_BLOCK_SHIFT 12 #define CEPH_FSCRYPT_BLOCK_SIZE (_AC(1, UL) << CEPH_FSCRYPT_BLOCK_SHIFT) @@ -89,10 +90,7 @@ static inline u32 ceph_fscrypt_auth_len(struct ceph_fscrypt_auth *fa) */ #define CEPH_NOHASH_NAME_MAX (180 - SHA256_DIGEST_SIZE) -#define CEPH_BASE64_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3) - -int ceph_base64_encode(const u8 *src, int srclen, char *dst); -int ceph_base64_decode(const char *src, int srclen, u8 *dst); +extern const char ceph_base64_table[65]; void ceph_fscrypt_set_ops(struct super_block *sb); diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index 8478e7e75..830715988 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -998,13 +998,14 @@ static int prep_encrypted_symlink_target(struct ceph_mds_request *req, if (err) goto out; - req->r_path2 = kmalloc(CEPH_BASE64_CHARS(osd_link.len) + 1, GFP_KERNEL); + req->r_path2 = kmalloc(BASE64_CHARS(osd_link.len) + 1, GFP_KERNEL); if (!req->r_path2) { err = -ENOMEM; goto out; } - len = ceph_base64_encode(osd_link.name, osd_link.len, req->r_path2); + len = base64_encode(osd_link.name, osd_link.len, + req->r_path2, false, ceph_base64_table); req->r_path2[len] = '\0'; out: 
fscrypt_fname_free_buffer(&osd_link); diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index fc543075b..94b729ccc 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -911,7 +911,7 @@ static int decode_encrypted_symlink(struct ceph_mds_client *mdsc, if (!sym) return -ENOMEM; - declen = ceph_base64_decode(encsym, enclen, sym); + declen = base64_decode(encsym, enclen, sym, false, ceph_base64_table); if (declen < 0) { pr_err_client(cl, "can't decode symlink (%d). Content: %.*s\n", -- 2.34.1 ^ permalink raw reply related [flat|nested] 28+ messages in thread
end of thread, other threads:[~2025-09-16 7:23 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-09-11 7:29 [PATCH v2 0/5] lib/base64: add generic encoder/decoder, migrate users Guan-Chun Wu 2025-09-11 7:32 ` [PATCH v2 1/5] lib/base64: Replace strchr() for better performance Guan-Chun Wu 2025-09-11 15:50 ` Caleb Sander Mateos 2025-09-11 16:02 ` Caleb Sander Mateos 2025-09-11 16:25 ` Kuan-Wei Chiu 2025-09-11 16:38 ` Kuan-Wei Chiu 2025-09-14 20:12 ` David Laight 2025-09-15 7:50 ` Kuan-Wei Chiu 2025-09-15 11:02 ` David Laight 2025-09-16 7:22 ` Kuan-Wei Chiu 2025-09-11 18:14 ` Eric Biggers 2025-09-11 18:44 ` Kuan-Wei Chiu 2025-09-11 18:49 ` Eric Biggers 2025-09-11 19:00 ` Kuan-Wei Chiu 2025-09-13 21:27 ` David Laight 2025-09-12 22:54 ` David Laight 2025-09-11 7:41 ` [PATCH v2 2/5] lib/base64: rework encoder/decoder with customizable support and update nvme-auth Guan-Chun Wu 2025-09-11 15:59 ` Caleb Sander Mateos 2025-09-12 7:21 ` Guan-Chun Wu 2025-09-11 18:27 ` Eric Biggers 2025-09-12 6:37 ` FIRST_NAME LAST_NAME 2025-09-12 6:52 ` Guan-Chun Wu 2025-09-12 7:15 ` Guan-Chun Wu 2025-09-11 7:45 ` [PATCH v2 3/5] lib: add KUnit tests for base64 encoding/decoding Guan-Chun Wu 2025-09-11 7:45 ` [PATCH v2 4/5] fscrypt: replace local base64url helpers with generic lib/base64 helpers Guan-Chun Wu 2025-09-11 18:47 ` Eric Biggers 2025-09-12 7:51 ` Guan-Chun Wu 2025-09-11 7:46 ` [PATCH v2 5/5] ceph: replace local base64 encode/decode " Guan-Chun Wu