All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Laight <david.laight.linux@gmail.com>
To: Guan-Chun Wu <409411716@gms.tku.edu.tw>
Cc: Caleb Sander Mateos <csander@purestorage.com>,
	akpm@linux-foundation.org, axboe@kernel.dk,
	ceph-devel@vger.kernel.org, ebiggers@kernel.org, hch@lst.de,
	home7438072@gmail.com, idryomov@gmail.com, jaegeuk@kernel.org,
	kbusch@kernel.org, linux-fscrypt@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	sagi@grimberg.me, tytso@mit.edu, visitorckw@gmail.com,
	xiubli@redhat.com
Subject: Re: [PATCH v3 3/6] lib/base64: rework encode/decode for speed and stricter validation
Date: Mon, 6 Oct 2025 21:52:12 +0100	[thread overview]
Message-ID: <20251006215212.2920d571@pumpkin> (raw)
In-Reply-To: <aNz21InCM4Pa93TL@wu-Pro-E500-G6-WS720T>

On Wed, 1 Oct 2025 17:39:32 +0800
Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote:

> On Tue, Sep 30, 2025 at 05:11:12PM -0700, Caleb Sander Mateos wrote:
> > On Fri, Sep 26, 2025 at 12:01 AM Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote:  
> > >
> > > The old base64 implementation relied on a bit-accumulator loop, which was
> > > slow for larger inputs and too permissive in validation. It would accept
> > > extra '=', missing '=', or even '=' appearing in the middle of the input,
> > > allowing malformed strings to pass. This patch reworks the internals to
> > > improve performance and enforce stricter validation.
> > >
> > > Changes:
> > >  - Encoder:
> > >    * Process input in 3-byte blocks, mapping 24 bits into four 6-bit
> > >      symbols, avoiding bit-by-bit shifting and reducing loop iterations.
> > >    * Handle the final 1-2 leftover bytes explicitly and emit '=' only when
> > >      requested.
> > >  - Decoder:
> > >    * Based on the reverse lookup tables from the previous patch, decode
> > >      input in 4-character groups.
> > >    * Each group is looked up directly, converted into numeric values, and
> > >      combined into 3 output bytes.
> > >    * Explicitly handle padded and unpadded forms:
> > >       - With padding: input length must be a multiple of 4, and '=' is
> > >         allowed only in the last two positions. Reject stray or early '='.
> > >       - Without padding: validate tail lengths (2 or 3 chars) and require
> > >         unused low bits to be zero.
> > >    * Removed the bit-accumulator style loop to reduce loop iterations.
> > >
> > > Performance (x86_64, Intel Core i7-10700 @ 2.90GHz, avg over 1000 runs,
> > > KUnit):
> > >
> > > Encode:
> > >   64B   ~90ns   -> ~32ns   (~2.8x)
> > >   1KB  ~1332ns  -> ~510ns  (~2.6x)
> > >
> > > Decode:
> > >   64B  ~1530ns  -> ~64ns   (~23.9x)
> > >   1KB ~27726ns  -> ~982ns  (~28.3x)
> > >
> > > Co-developed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > > Co-developed-by: Yu-Sheng Huang <home7438072@gmail.com>
> > > Signed-off-by: Yu-Sheng Huang <home7438072@gmail.com>
> > > Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw>
> > > ---
> > >  lib/base64.c | 150 +++++++++++++++++++++++++++++++++++++--------------
> > >  1 file changed, 110 insertions(+), 40 deletions(-)
> > >
> > > diff --git a/lib/base64.c b/lib/base64.c
> > > index b20fdf168..fd1db4611 100644
> > > --- a/lib/base64.c
> > > +++ b/lib/base64.c
> > > @@ -93,26 +93,43 @@ static const s8 base64_rev_tables[][256] = {
> > >  int base64_encode(const u8 *src, int srclen, char *dst, bool padding, enum base64_variant variant)
> > >  {
> > >         u32 ac = 0;
> > > -       int bits = 0;
> > > -       int i;
> > >         char *cp = dst;
> > >         const char *base64_table = base64_tables[variant];
> > >
> > > -       for (i = 0; i < srclen; i++) {
> > > -               ac = (ac << 8) | src[i];
> > > -               bits += 8;
> > > -               do {
> > > -                       bits -= 6;
> > > -                       *cp++ = base64_table[(ac >> bits) & 0x3f];
> > > -               } while (bits >= 6);
> > > -       }
> > > -       if (bits) {
> > > -               *cp++ = base64_table[(ac << (6 - bits)) & 0x3f];
> > > -               bits -= 6;
> > > +       while (srclen >= 3) {
> > > +               ac = ((u32)src[0] << 16) |
> > > +                        ((u32)src[1] << 8) |
> > > +                        (u32)src[2];
> > > +
> > > +               *cp++ = base64_table[ac >> 18];
> > > +               *cp++ = base64_table[(ac >> 12) & 0x3f];
> > > +               *cp++ = base64_table[(ac >> 6) & 0x3f];
> > > +               *cp++ = base64_table[ac & 0x3f];
> > > +
> > > +               src += 3;
> > > +               srclen -= 3;
> > >         }
> > > -       while (bits < 0) {
> > > -               *cp++ = '=';
> > > -               bits += 2;
> > > +
> > > +       switch (srclen) {
> > > +       case 2:
> > > +               ac = ((u32)src[0] << 16) |
> > > +                    ((u32)src[1] << 8);
> > > +
> > > +               *cp++ = base64_table[ac >> 18];
> > > +               *cp++ = base64_table[(ac >> 12) & 0x3f];
> > > +               *cp++ = base64_table[(ac >> 6) & 0x3f];
> > > +               if (padding)
> > > +                       *cp++ = '=';
> > > +               break;
> > > +       case 1:
> > > +               ac = ((u32)src[0] << 16);
> > > +               *cp++ = base64_table[ac >> 18];
> > > +               *cp++ = base64_table[(ac >> 12) & 0x3f];
> > > +               if (padding) {
> > > +                       *cp++ = '=';
> > > +                       *cp++ = '=';
> > > +               }
> > > +               break;
> > >         }
> > >         return cp - dst;
> > >  }
> > > @@ -128,39 +145,92 @@ EXPORT_SYMBOL_GPL(base64_encode);
> > >   *
> > >   * Decodes a string using the selected Base64 variant.
> > >   *
> > > - * This implementation hasn't been optimized for performance.
> > > - *
> > >   * Return: the length of the resulting decoded binary data in bytes,
> > >   *        or -1 if the string isn't a valid Base64 string.
> > >   */
> > >  int base64_decode(const char *src, int srclen, u8 *dst, bool padding, enum base64_variant variant)
> > >  {
> > > -       u32 ac = 0;
> > > -       int bits = 0;
> > > -       int i;
> > >         u8 *bp = dst;
> > > -       s8 ch;
> > > -
> > > -       for (i = 0; i < srclen; i++) {
> > > -               if (src[i] == '=') {
> > > -                       ac = (ac << 6);
> > > -                       bits += 6;
> > > -                       if (bits >= 8)
> > > -                               bits -= 8;
> > > -                       continue;
> > > -               }
> > > -               ch = base64_rev_tables[variant][(u8)src[i]];
> > > -               if (ch == -1)
> > > +       s8 input1, input2, input3, input4;
> > > +       u32 val;
> > > +
> > > +       if (srclen == 0)
> > > +               return 0;  
> > 
> > Doesn't look like this special case is necessary; all the if and while
> > conditions below are false if srclen == 0, so the function will just
> > end up returning 0 in that case anyways. It would be nice to avoid
> > this branch, especially as it seems like an uncommon case.
> >  
> 
> You're right. I'll remove it. Thanks.
> 
> > > +
> > > +       /* Validate the input length for padding */
> > > +       if (unlikely(padding && (srclen & 0x03) != 0))
> > > +               return -1;
> > > +
> > > +       while (srclen >= 4) {
> > > +               /* Decode the next 4 characters */
> > > +               input1 = base64_rev_tables[variant][(u8)src[0]];
> > > +               input2 = base64_rev_tables[variant][(u8)src[1]];
> > > +               input3 = base64_rev_tables[variant][(u8)src[2]];
> > > +               input4 = base64_rev_tables[variant][(u8)src[3]];
> > > +
> > > +               /* Return error if any Base64 character is invalid */
> > > +               if (unlikely(input1 < 0 || input2 < 0 || (!padding && (input3 < 0 || input4 < 0))))
> > > +                       return -1;
> > > +
> > > +               /* Handle padding */
> > > +               if (unlikely(padding && ((input3 < 0 && input4 >= 0) ||
> > > +                                        (input3 < 0 && src[2] != '=') ||
> > > +                                        (input4 < 0 && src[3] != '=') ||
> > > +                                        (srclen > 4 && (input3 < 0 || input4 < 0)))))  
> > 
> > Would be preferable to check and strip the padding (i.e. decrease
> > srclen) before this main loop. That way we could avoid several
> > branches in this hot loop that are only necessary to handle the
> > padding chars.
> >   
> 
> You're right. As long as we check and strip the padding first, the
> behavior with or without padding can be the same, and it could also
> reduce some unnecessary branches. I'll make the change.

As I said earlier.
Calculate 'val' first using signed arithmetic.
If it is non-negative there are three bytes to write.
If negative then check for src[2] and src[3] being '=' (etc) before erroring out.

That way there is only one check in the normal path.

	David

> 
> Best regards,
> Guan-Chun
> 
> > > +                       return -1;
> > > +               val = ((u32)input1 << 18) |
> > > +                     ((u32)input2 << 12) |
> > > +                     ((u32)((input3 < 0) ? 0 : input3) << 6) |
> > > +                     (u32)((input4 < 0) ? 0 : input4);
> > > +
> > > +               *bp++ = (u8)(val >> 16);
> > > +
> > > +               if (input3 >= 0)
> > > +                       *bp++ = (u8)(val >> 8);
> > > +               if (input4 >= 0)
> > > +                       *bp++ = (u8)val;
> > > +
> > > +               src += 4;
> > > +               srclen -= 4;
> > > +       }
> > > +
> > > +       /* Handle leftover characters when padding is not used */
> > > +       if (!padding && srclen > 0) {
> > > +               switch (srclen) {
> > > +               case 2:
> > > +                       input1 = base64_rev_tables[variant][(u8)src[0]];
> > > +                       input2 = base64_rev_tables[variant][(u8)src[1]];
> > > +                       if (unlikely(input1 < 0 || input2 < 0))
> > > +                               return -1;
> > > +
> > > +                       val = ((u32)input1 << 6) | (u32)input2; /* 12 bits */
> > > +                       if (unlikely(val & 0x0F))
> > > +                               return -1; /* low 4 bits must be zero */
> > > +
> > > +                       *bp++ = (u8)(val >> 4);
> > > +                       break;
> > > +               case 3:
> > > +                       input1 = base64_rev_tables[variant][(u8)src[0]];
> > > +                       input2 = base64_rev_tables[variant][(u8)src[1]];
> > > +                       input3 = base64_rev_tables[variant][(u8)src[2]];
> > > +                       if (unlikely(input1 < 0 || input2 < 0 || input3 < 0))
> > > +                               return -1;
> > > +
> > > +                       val = ((u32)input1 << 12) |
> > > +                             ((u32)input2 << 6) |
> > > +                             (u32)input3; /* 18 bits */
> > > +
> > > +                       if (unlikely(val & 0x03))
> > > +                               return -1; /* low 2 bits must be zero */
> > > +
> > > +                       *bp++ = (u8)(val >> 10);
> > > +                       *bp++ = (u8)((val >> 2) & 0xFF);  
> > 
> > "& 0xFF" is redundant with the cast to u8.
> > 
> > Best,
> > Caleb
> >   
> > > +                       break;
> > > +               default:
> > >                         return -1;
> > > -               ac = (ac << 6) | ch;
> > > -               bits += 6;
> > > -               if (bits >= 8) {
> > > -                       bits -= 8;
> > > -                       *bp++ = (u8)(ac >> bits);
> > >                 }
> > >         }
> > > -       if (ac & ((1 << bits) - 1))
> > > -               return -1;
> > > +
> > >         return bp - dst;
> > >  }
> > >  EXPORT_SYMBOL_GPL(base64_decode);
> > > --
> > > 2.34.1
> > >
> > >  
> 


  reply	other threads:[~2025-10-06 20:52 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-26  6:52 [PATCH v3 0/6] lib/base64: add generic encoder/decoder, migrate users Guan-Chun Wu
2025-09-26  6:55 ` [PATCH v3 1/6] lib/base64: Add support for multiple variants Guan-Chun Wu
2025-09-30 23:56   ` Caleb Sander Mateos
2025-10-01 14:09     ` Guan-Chun Wu
2025-09-26  6:55 ` [PATCH v3 2/6] lib/base64: Optimize base64_decode() with reverse lookup tables Guan-Chun Wu
2025-09-26 23:33   ` Caleb Sander Mateos
2025-09-28  6:37     ` Kuan-Wei Chiu
2025-10-01 10:18     ` Guan-Chun Wu
2025-10-01 16:20       ` Caleb Sander Mateos
2025-10-05 17:18         ` David Laight
2025-10-07  8:28           ` Guan-Chun Wu
2025-10-07 14:57             ` Caleb Sander Mateos
2025-10-07 17:11               ` Eric Biggers
2025-10-07 18:23               ` David Laight
2025-10-09 12:25                 ` Guan-Chun Wu
2025-10-10  9:51                   ` David Laight
2025-10-13  9:49                     ` Guan-Chun Wu
2025-10-14  8:14                       ` David Laight
2025-10-16 10:07                         ` Guan-Chun Wu
2025-10-27 13:12                         ` Guan-Chun Wu
2025-10-27 14:18                           ` David Laight
2025-10-28  6:58                             ` Guan-Chun Wu
2025-09-28 18:57   ` David Laight
2025-09-26  6:56 ` [PATCH v3 3/6] lib/base64: rework encode/decode for speed and stricter validation Guan-Chun Wu
2025-10-01  0:11   ` Caleb Sander Mateos
2025-10-01  9:39     ` Guan-Chun Wu
2025-10-06 20:52       ` David Laight [this message]
2025-10-07  8:34         ` Guan-Chun Wu
2025-09-26  6:56 ` [PATCH v3 4/6] lib: add KUnit tests for base64 encoding/decoding Guan-Chun Wu
2025-09-26  6:56 ` [PATCH v3 5/6] fscrypt: replace local base64url helpers with lib/base64 Guan-Chun Wu
2025-09-26  6:57 ` [PATCH v3 6/6] ceph: replace local base64 " Guan-Chun Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251006215212.2920d571@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=409411716@gms.tku.edu.tw \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=ceph-devel@vger.kernel.org \
    --cc=csander@purestorage.com \
    --cc=ebiggers@kernel.org \
    --cc=hch@lst.de \
    --cc=home7438072@gmail.com \
    --cc=idryomov@gmail.com \
    --cc=jaegeuk@kernel.org \
    --cc=kbusch@kernel.org \
    --cc=linux-fscrypt@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    --cc=tytso@mit.edu \
    --cc=visitorckw@gmail.com \
    --cc=xiubli@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.