From: Guan-Chun Wu <409411716@gms.tku.edu.tw>
To: David Laight <david.laight.linux@gmail.com>
Cc: Caleb Sander Mateos <csander@purestorage.com>,
akpm@linux-foundation.org, axboe@kernel.dk,
ceph-devel@vger.kernel.org, ebiggers@kernel.org, hch@lst.de,
home7438072@gmail.com, idryomov@gmail.com, jaegeuk@kernel.org,
kbusch@kernel.org, linux-fscrypt@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
sagi@grimberg.me, tytso@mit.edu, visitorckw@gmail.com,
xiubli@redhat.com
Subject: Re: [PATCH v3 2/6] lib/base64: Optimize base64_decode() with reverse lookup tables
Date: Mon, 27 Oct 2025 21:12:00 +0800 [thread overview]
Message-ID: <aP9voK9lE/MlanGl@wu-Pro-E500-G6-WS720T> (raw)
In-Reply-To: <20251014091420.173dfc9c@pumpkin>
On Tue, Oct 14, 2025 at 09:14:20AM +0100, David Laight wrote:
> On Mon, 13 Oct 2025 17:49:55 +0800
> Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote:
>
> > On Fri, Oct 10, 2025 at 10:51:38AM +0100, David Laight wrote:
> > > On Thu, 9 Oct 2025 20:25:17 +0800
> > > Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote:
> > >
> > > ...
> > > > As Eric mentioned, the decoder in fs/crypto/ needs to reject invalid input.
> > >
> > > (to avoid two different input buffers giving the same output)
> > >
> > > Which is annoyingly reasonable.
> > >
> > > > One possible solution I came up with is to first create a shared
> > > > base64_rev_common lookup table as the base for all Base64 variants.
> > > > Then, depending on the variant (e.g., BASE64_STD, BASE64_URLSAFE, etc.), we
> > > > can dynamically adjust the character mappings for position 62 and position 63
> > > > at runtime, based on the variant.
> > > >
> > > > Here are the changes to the code:
> > > >
> > > > static const s8 base64_rev_common[256] = {
> > > > [0 ... 255] = -1,
> > > > ['A'] = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
> > > > 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
> > > > ['a'] = 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
> > > > 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
> > > > ['0'] = 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
> > > > };
> > > >
> > > > static const struct {
> > > > char char62, char63;
> > > > } base64_symbols[] = {
> > > > [BASE64_STD] = { '+', '/' },
> > > > [BASE64_URLSAFE] = { '-', '_' },
> > > > [BASE64_IMAP] = { '+', ',' },
> > > > };
> > > >
> > > > int base64_decode(const char *src, int srclen, u8 *dst, bool padding, enum base64_variant variant)
> > > > {
> > > > u8 *bp = dst;
> > > > u8 pad_cnt = 0;
> > > > s8 input1, input2, input3, input4;
> > > > u32 val;
> > > > s8 base64_rev_tables[256];
> > > >
> > > > /* Validate the input length for padding */
> > > > if (unlikely(padding && (srclen & 0x03) != 0))
> > > > return -1;
> > >
> > > There is no need for an early check.
> > > Pick it up after the loop when 'srclen != 0'.
> > >
> >
> > I think the early check is still needed, since I'm removing the
> > padding '=' first.
> > This makes the handling logic consistent for both padded and unpadded
> > inputs, and avoids extra if conditions for padding inside the hot loop.
>
> The 'invalid input' check will detect the padding.
> Then you don't get an extra check if there is no padding (probably normal).
> I realised I didn't get it quite right - updated below.
>
> >
> > > >
> > > > memcpy(base64_rev_tables, base64_rev_common, sizeof(base64_rev_common));
> > >
> > > Ugg - having a memcpy() here is not a good idea.
> > > It really is better to have 3 arrays, but use a 'mostly common' initialiser.
> > > Perhaps:
> > > #define BASE64_REV_INIT(ch_62, ch_63) = { \
> > > [0 ... 255] = -1, \
> > > ['A'] = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, \
> > > 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, \
> > > ['a'] = 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, \
> > > 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, \
> > > ['0'] = 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, \
> > > [ch_62] = 62, [ch_63] = 63, \
> > > }
> > >
> > > static const s8 base64_rev_maps[][256] = {
> > > [BASE64_STD] = BASE64_REV_INIT('+', '/'),
> > > [BASE64_URLSAFE] = BASE64_REV_INIT('-', '_'),
> > > [BASE64_IMAP] = BASE64_REV_INIT('+', ',')
> > > };
> > >
> > > Then (after validating variant):
> > > const s8 *map = base64_rev_maps[variant];
> > >
> >
> > Got it. I'll switch to using three static tables with a common initializer
> > as you suggested.
> >
> > > >
> > > > if (variant < BASE64_STD || variant > BASE64_IMAP)
> > > > return -1;
> > > >
> > > > base64_rev_tables[base64_symbols[variant].char62] = 62;
> > > > base64_rev_tables[base64_symbols[variant].char63] = 63;
> > > >
> > > > while (padding && srclen > 0 && src[srclen - 1] == '=') {
> > > > pad_cnt++;
> > > > srclen--;
> > > > if (pad_cnt > 2)
> > > > return -1;
> > > > }
> > >
> > > I'm not sure I'd to that there.
> > > You are (in some sense) optimising for padding.
> > > From what I remember, "abcd" gives 24 bits, "abc=" 16 and "ab==" 8.
> > >
> > > >
> > > > while (srclen >= 4) {
> > > > /* Decode the next 4 characters */
> > > > input1 = base64_rev_tables[(u8)src[0]];
> > > > input2 = base64_rev_tables[(u8)src[1]];
> > > > input3 = base64_rev_tables[(u8)src[2]];
> > > > input4 = base64_rev_tables[(u8)src[3]];
> > >
> > > I'd be tempted to make src[] unsigned - probably be assigning the parameter
> > > to a local at the top of the function.
> > >
> > > Also you have input3 = ... src[2]...
> > > Perhaps they should be input[0..3] instead.
> > >
> >
> > OK, I'll make the changes.
> >
> > > >
> > > > val = (input1 << 18) |
> > > > (input2 << 12) |
> > > > (input3 << 6) |
> > > > input4;
> > >
> > > Four lines is excessive, C doesn't require the () and I'm not sure the
> > > compilers complain about << and |.
> > >
> >
> > OK, I'll make the changes.
> >
> > > >
> > > > if (unlikely((s32)val < 0))
> > > > return -1;
> > >
> > > Make 'val' signed - then you don't need the cast.
> ...
> > > Or, if you really want to use the code below the loop:
> > > if (!padding || src[3] != '=')
> > > return -1;
> > > padding = 0;
> > > srclen -= 1 + (src[2] == '=');
> > > break;
>
> That is missing a test...
> Change to:
> if (!padding || srclen != 4 || src[3] != '=')
> return -1;
> padding = 0;
> srclen = src[2] == '=' ? 2 : 3;
> break;
>
> The compiler will then optimise away the first checks after the
> loop because it knows they can't happen.
Hi David,
I noticed your suggested approach:
val_24 = t[b[0]] | t[b[1]] << 6 | t[b[2]] << 12 | t[b[3]] << 18;
Per the C11 draft, this can lead to undefined behavior.
"If E1 has a signed type and nonnegative value, and E1 × 2^E2 is
representable in the result type, then that is the resulting value;
otherwise, the behavior is undefined."
Therefore, left-shifting a negative signed value is undefined behavior.
Perhaps we could change the code as shown below. What do you think?
int base64_decode(const char *src, int srclen, u8 *dst, bool padding, enum base64_variant variant)
{
u8 *bp = dst;
s8 input[4];
u32 val;
const u8 *s = (const u8 *)src;
const s8 *base64_rev_table = base64_rev_maps[variant];
while (srclen >= 4) {
input[0] = base64_rev_table[s[0]];
input[1] = base64_rev_table[s[1]];
input[2] = base64_rev_table[s[2]];
input[3] = base64_rev_table[s[3]];
if (unlikely((input[0] | input[1] | input[2] | input[3]) < 0)) {
if (!padding || srclen != 4 || s[3] != '=')
return -1;
padding = 0;
srclen = s[2] == '=' ? 2 : 3;
break;
}
val = (u32)input[0] << 18 | (u32)input[1] << 12 |
(u32)input[2] << 6 | (u32)input[3];
*bp++ = val >> 16;
*bp++ = val >> 8;
*bp++ = val;
s += 4;
srclen -= 4;
}
if (likely(!srclen))
return bp - dst;
if (padding || srclen == 1)
return -1;
input[0] = base64_rev_table[s[0]];
input[1] = base64_rev_table[s[1]];
if (unlikely(input[0] < 0 || input[1] < 0))
return -1;
val = (u32)input[0] << 18 | (u32)input[1] << 12;
if (srclen == 2) {
if (unlikely(input[1] & 0x0f))
return -1;
*bp++ = val >> 16;
} else {
input[2] = base64_rev_table[s[2]];
if (unlikely(input[2] < 0 || (input[2] & 0x03)))
return -1;
val |= (u32)input[2] << 6;
*bp++ = val >> 16;
*bp++ = val >> 8;
}
return bp - dst;
}
Best regards,
Guan-Chun
> > >
> > >
> > > >
> > > > *bp++ = (u8)(val >> 16);
> > > > *bp++ = (u8)(val >> 8);
> > > > *bp++ = (u8)val;
> > >
> > > You don't need those casts.
> > >
> >
> > OK, I'll make the changes.
> >
> > > >
> > > > src += 4;
> > > > srclen -= 4;
> > > > }
> > > >
> > > > /* Handle leftover characters when padding is not used */
> > >
> > > You are coming here with padding.
> > > I'm not sure what should happen without padding.
> > > For a multi-line file decode I suspect the characters need adding to
> > > the start of the next line (ie lines aren't required to contain
> > > multiples of 4 characters - even though they almost always will).
> > >
> >
> > Ah, my mistake. I forgot to remove that comment.
> > Based on my observation, base64_decode() should process the entire input
> > buffer in a single call, so I believe it does not need to handle
> > multi-line input.
>
> I was thinking of the the case where it is processing the output of
> something like base64encode.
> The caller will have separated out the lines, but I don't know whether
> every line has to contain a multiple of 4 characters - or whether the
> lines can be arbitrarily split after being encoded (I know that won't
> normally happen - but you never know).
>
> >
> > Best regards,
> > Guan-Chun
> >
> > > > if (srclen > 0) {
> > > > switch (srclen) {
> > >
> > > You don't need an 'if' and a 'switch'.
> > > srclen is likely to be zero, but perhaps write as:
> > > if (likely(!srclen))
> > > return bp - dst;
> > > if (padding || srclen == 1)
> > > return -1;
> > >
> > > val = base64_rev_tables[(u8)src[0]] << 12 | base64_rev_tables[(u8)src[1]] << 6;
> > > *bp++ = val >> 10;
> > > if (srclen == 1) {
> Obviously should be (srclen == 2)
> > > if (val & 0x800003ff)
> > > return -1;
> > > } else {
> > > val |= base64_rev_tables[(u8)src[2]];
> > > if (val & 0x80000003)
> > > return -1;
> > > *bp++ = val >> 2;
> > > }
> > > return bp - dst;
> > > }
> > >
> > > David
>
> David
>
> > >
> > > > case 2:
> > > > input1 = base64_rev_tables[(u8)src[0]];
> > > > input2 = base64_rev_tables[(u8)src[1]];
> > > > val = (input1 << 6) | input2; /* 12 bits */
> > > > if (unlikely((s32)val < 0 || val & 0x0F))
> > > > return -1;
> > > >
> > > > *bp++ = (u8)(val >> 4);
> > > > break;
> > > > case 3:
> > > > input1 = base64_rev_tables[(u8)src[0]];
> > > > input2 = base64_rev_tables[(u8)src[1]];
> > > > input3 = base64_rev_tables[(u8)src[2]];
> > > >
> > > > val = (input1 << 12) |
> > > > (input2 << 6) |
> > > > input3; /* 18 bits */
> > > > if (unlikely((s32)val < 0 || val & 0x03))
> > > > return -1;
> > > >
> > > > *bp++ = (u8)(val >> 10);
> > > > *bp++ = (u8)(val >> 2);
> > > > break;
> > > > default:
> > > > return -1;
> > > > }
> > > > }
> > > >
> > > > return bp - dst;
> > > > }
> > > > Based on KUnit testing, the performance results are as follows:
> > > > base64_performance_tests: [64B] decode run : 40ns
> > > > base64_performance_tests: [1KB] decode run : 463ns
> > > >
> > > > However, this approach introduces an issue. It uses 256 bytes of memory
> > > > on the stack for base64_rev_tables, which might not be ideal. Does anyone
> > > > have any thoughts or alternative suggestions to solve this issue, or is it
> > > > not really a concern?
> > > >
> > > > Best regards,
> > > > Guan-Chun
> > > >
> > > > > >
> > > > > > Best,
> > > > > > Caleb
> > > > >
> > >
>
next prev parent reply other threads:[~2025-10-27 13:12 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-26 6:52 [PATCH v3 0/6] lib/base64: add generic encoder/decoder, migrate users Guan-Chun Wu
2025-09-26 6:55 ` [PATCH v3 1/6] lib/base64: Add support for multiple variants Guan-Chun Wu
2025-09-30 23:56 ` Caleb Sander Mateos
2025-10-01 14:09 ` Guan-Chun Wu
2025-09-26 6:55 ` [PATCH v3 2/6] lib/base64: Optimize base64_decode() with reverse lookup tables Guan-Chun Wu
2025-09-26 23:33 ` Caleb Sander Mateos
2025-09-28 6:37 ` Kuan-Wei Chiu
2025-10-01 10:18 ` Guan-Chun Wu
2025-10-01 16:20 ` Caleb Sander Mateos
2025-10-05 17:18 ` David Laight
2025-10-07 8:28 ` Guan-Chun Wu
2025-10-07 14:57 ` Caleb Sander Mateos
2025-10-07 17:11 ` Eric Biggers
2025-10-07 18:23 ` David Laight
2025-10-09 12:25 ` Guan-Chun Wu
2025-10-10 9:51 ` David Laight
2025-10-13 9:49 ` Guan-Chun Wu
2025-10-14 8:14 ` David Laight
2025-10-16 10:07 ` Guan-Chun Wu
2025-10-27 13:12 ` Guan-Chun Wu [this message]
2025-10-27 14:18 ` David Laight
2025-10-28 6:58 ` Guan-Chun Wu
2025-09-28 18:57 ` David Laight
2025-09-26 6:56 ` [PATCH v3 3/6] lib/base64: rework encode/decode for speed and stricter validation Guan-Chun Wu
2025-10-01 0:11 ` Caleb Sander Mateos
2025-10-01 9:39 ` Guan-Chun Wu
2025-10-06 20:52 ` David Laight
2025-10-07 8:34 ` Guan-Chun Wu
2025-09-26 6:56 ` [PATCH v3 4/6] lib: add KUnit tests for base64 encoding/decoding Guan-Chun Wu
2025-09-26 6:56 ` [PATCH v3 5/6] fscrypt: replace local base64url helpers with lib/base64 Guan-Chun Wu
2025-09-26 6:57 ` [PATCH v3 6/6] ceph: replace local base64 " Guan-Chun Wu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aP9voK9lE/MlanGl@wu-Pro-E500-G6-WS720T \
--to=409411716@gms.tku.edu.tw \
--cc=akpm@linux-foundation.org \
--cc=axboe@kernel.dk \
--cc=ceph-devel@vger.kernel.org \
--cc=csander@purestorage.com \
--cc=david.laight.linux@gmail.com \
--cc=ebiggers@kernel.org \
--cc=hch@lst.de \
--cc=home7438072@gmail.com \
--cc=idryomov@gmail.com \
--cc=jaegeuk@kernel.org \
--cc=kbusch@kernel.org \
--cc=linux-fscrypt@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
--cc=tytso@mit.edu \
--cc=visitorckw@gmail.com \
--cc=xiubli@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox