From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-wm1-f50.google.com (mail-wm1-f50.google.com [209.85.128.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 858B92C159D for ; Fri, 10 Oct 2025 09:51:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.50 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760089909; cv=none; b=GcYCPi5PWfQzFxBl1R5Dxajk4Bo1Fji+i5rq5dB/d86bYscyb7CSpt+OyS0H+s4YoYGxUoJWc6NUw3Z5nBrKGWTvUweOXKwTS28jgGzIcckoA+yuzDJd0mYcVdIg0bKEOrzs/DWhyFkDZ80OvJh+KLX+0wX8S8DN2N0mCu/QUFM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760089909; c=relaxed/simple; bh=0gpFLDbwtlyVkGYGGMoUWd2mlJVVmwWM/n+HgLvMV+U=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=IDv+Yt+BCMBRSJWMsX+dvZptQS1e3pSoCONZ13PGO5vf53/qfizH8AXnGSszc2Hp0jhyrQ5ew6gIyROPvxO7PNT2NCKyr1hFpRKugDzGvlKv2LfMOCAmU8xNoBr0fBwQwfteoTXCmhwkZ4p5awLI3UgQ+4c5p75bMEdRatNKWDM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=YU0Ny0HD; arc=none smtp.client-ip=209.85.128.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="YU0Ny0HD" Received: by mail-wm1-f50.google.com with SMTP id 5b1f17b1804b1-46e6ba26c50so10908115e9.2 for ; Fri, 10 Oct 2025 02:51:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1760089904; x=1760694704; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=eV0VkyvZwjNec7z9vbdRZ7vdRavcOqLYA4ozWnex1xw=; b=YU0Ny0HDH31cOZUgDkKteUlHTCWyIiusJtOAfkapl1o2AwMGd1reGGslLpSjwVa3Zh j4JZggt41K3ZroaZ4+zdSaKgRlRGqQnE6gXFsiqlhHj208ffPHToYRUlBaOiOHkizkfT T2HJTTdrn3LQvCPXk1A26MlM5L/SPjNrA6ivBzI0YwfFUEyQuHpb9s8KDUV8NBfOOZIc ZjUKpKra7YNc5TFmhwmtDHJsUiMBp9rYviraibsv1fjdMW29cDqavKIdbBwQzI+RKEPe Y/udO4bQCK4mTj23ZrGO9lix2gHkr6sEr4sNKW73j/+l8IW+C/Tc8TGvJzI+7EA7r2Ez 8GWQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1760089904; x=1760694704; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=eV0VkyvZwjNec7z9vbdRZ7vdRavcOqLYA4ozWnex1xw=; b=geKB1M4NDtin8Zw3cC+JljdJ8oqxGy0PkLwxwzs7BGAmEP9VPrayXVLY3S7lhlWEeG 1lwnc05tJY8A5Gwg/8OHy/0UYGwVlo5DkANEt59H3so4yw5JxuTSSnikHUNH+toKNpbp 95j1usZDxfHz77edVP0H+hnl+0GYzT5aAcJegJ1D+IArm/q3fr7nlnecRZh4NMfQ7ar+ Tm8Pp1ks+pbSfzBkh9iZGqIHCpI2goZ19yg0oh/av2cn9LPOE+7pFel3fMIUTZbHORyf NlrijXiqhaMXuISvyTfrF3brp6Yk4wei3/Q0nRWyaV+4UTFwF4YZMWODn5JLrtx10NSg 2+Vw== X-Forwarded-Encrypted: i=1; AJvYcCUQWvrMGZ0R3W01bAvV8xxspslrgQnUTr8rTAk2QdtUIdlPzv+lEYx3XcPkOQIzcwwKVx8ik7/PiQdjB5w=@vger.kernel.org X-Gm-Message-State: AOJu0YzHio8+2LXHyeH6AwXgXXP2WYxdQZJit9QPzhoCg0F3GYrsQ6bu eEBehkDXv3Vl10B4oJ5bbKnV6jO2p5PpMiPINgtTLnYFR5kZcL3M1htovL1yyA== X-Gm-Gg: ASbGnctbwpkRf1iUyi57vJuDqIyKtH/s4hOhsrJi0eZww+8usTTkedpiA90anEdsrDj YTllXFQ4j7tCo5+2Tuey7buxc8qJ2f05JfSrU4T173mxCJmY+WtGHy6WfDmbOb0uHgpWVue7gQ8 98ejGoatoS/1HODNUiQIeikyfytQJzEfdGRiYI8TYZaq7AVUhRqUtT6fzFYn0D3/KrTB+pCXt5C d+Kiy16/A+NW59y1XaQCslA889n9mclSn2K8y8/kHdPdjLiPlzYunZN6AxIIpIpwMP29V3LmDgi nIj9qK9LXPeV7RfxjoJVeLrVoicVzXy/vYed2Z/8xqs9qpL42QvvyAQXBU0y0GzeNNw58Iu4yxc 8KDFvYDQoCS4cJi4SF/3cn+tvd7qopw7fvbI+NZ1BpbvhgJCx3zJrJoIKEKoYMGV1ZmyuLvQNQO GG0GejuC0= X-Google-Smtp-Source: AGHT+IEA2a9V9moY9Q0X8l4fa8n2FoyS2Xm8feyo+zRRNYARtth7YTtorp1JP7E6mAtA18UXC+S7bw== X-Received: by 2002:a05:6000:603:b0:401:41a9:524f with SMTP id ffacd0b85a97d-4266e7d452bmr7539955f8f.29.1760089903522; Fri, 10 Oct 2025 02:51:43 -0700 (PDT) Received: from pumpkin (82-69-66-36.dsl.in-addr.zen.co.uk. [82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-426ce5e83e1sm3392578f8f.51.2025.10.10.02.51.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 10 Oct 2025 02:51:43 -0700 (PDT) Date: Fri, 10 Oct 2025 10:51:38 +0100 From: David Laight To: Guan-Chun Wu <409411716@gms.tku.edu.tw> Cc: Caleb Sander Mateos , akpm@linux-foundation.org, axboe@kernel.dk, ceph-devel@vger.kernel.org, ebiggers@kernel.org, hch@lst.de, home7438072@gmail.com, idryomov@gmail.com, jaegeuk@kernel.org, kbusch@kernel.org, linux-fscrypt@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, sagi@grimberg.me, tytso@mit.edu, visitorckw@gmail.com, xiubli@redhat.com Subject: Re: [PATCH v3 2/6] lib/base64: Optimize base64_decode() with reverse lookup tables Message-ID: <20251010105138.0356ad75@pumpkin> In-Reply-To: References: <20250926065235.13623-1-409411716@gms.tku.edu.tw> <20250926065556.14250-1-409411716@gms.tku.edu.tw> <20251005181803.0ba6aee4@pumpkin> <20251007192327.57f00588@pumpkin> X-Mailer: Claws Mail 4.1.1 (GTK 3.24.38; arm-unknown-linux-gnueabihf) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable On Thu, 9 Oct 2025 20:25:17 +0800 Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote: ... > As Eric mentioned, the decoder in fs/crypto/ needs to reject invalid inpu= t. (to avoid two different input buffers giving the same output) Which is annoyingly reasonable. > One possible solution I came up with is to first create a shared > base64_rev_common lookup table as the base for all Base64 variants. > Then, depending on the variant (e.g., BASE64_STD, BASE64_URLSAFE, etc.), = we > can dynamically adjust the character mappings for position 62 and positio= n 63 > at runtime, based on the variant. >=20 > Here are the changes to the code: >=20 > static const s8 base64_rev_common[256] =3D { > [0 ... 255] =3D -1, > ['A'] =3D 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, > 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, > ['a'] =3D 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, > 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, > ['0'] =3D 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, > }; >=20 > static const struct { > char char62, char63; > } base64_symbols[] =3D { > [BASE64_STD] =3D { '+', '/' }, > [BASE64_URLSAFE] =3D { '-', '_' }, > [BASE64_IMAP] =3D { '+', ',' }, > }; >=20 > int base64_decode(const char *src, int srclen, u8 *dst, bool padding, enu= m base64_variant variant) > { > u8 *bp =3D dst; > u8 pad_cnt =3D 0; > s8 input1, input2, input3, input4; > u32 val; > s8 base64_rev_tables[256]; >=20 > /* Validate the input length for padding */ > if (unlikely(padding && (srclen & 0x03) !=3D 0)) > return -1; There is no need for an early check. Pick it up after the loop when 'srclen !=3D 0'. >=20 > memcpy(base64_rev_tables, base64_rev_common, sizeof(base64_rev_common)); Ugg - having a memcpy() here is not a good idea. It really is better to have 3 arrays, but use a 'mostly common' initialiser. Perhaps: #define BASE64_REV_INIT(ch_62, ch_63) =3D { \ [0 ... 255] =3D -1, \ ['A'] =3D 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, \ 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, \ ['a'] =3D 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, \ 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, \ ['0'] =3D 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, \ [ch_62] =3D 62, [ch_63] =3D 63, \ } static const s8 base64_rev_maps[][256] =3D { [BASE64_STD] =3D BASE64_REV_INIT('+', '/'), [BASE64_URLSAFE] =3D BASE64_REV_INIT('-', '_'), [BASE64_IMAP] =3D BASE64_REV_INIT('+', ',') }; Then (after validating variant): const s8 *map =3D base64_rev_maps[variant]; >=20 > if (variant < BASE64_STD || variant > BASE64_IMAP) > return -1; >=20 > base64_rev_tables[base64_symbols[variant].char62] =3D 62; > base64_rev_tables[base64_symbols[variant].char63] =3D 63; >=20 > while (padding && srclen > 0 && src[srclen - 1] =3D=3D '=3D') { > pad_cnt++; > srclen--; > if (pad_cnt > 2) > return -1; > } I'm not sure I'd to that there. You are (in some sense) optimising for padding. =46rom what I remember, "abcd" gives 24 bits, "abc=3D" 16 and "ab=3D=3D" 8. >=20 > while (srclen >=3D 4) { > /* Decode the next 4 characters */ > input1 =3D base64_rev_tables[(u8)src[0]]; > input2 =3D base64_rev_tables[(u8)src[1]]; > input3 =3D base64_rev_tables[(u8)src[2]]; > input4 =3D base64_rev_tables[(u8)src[3]]; I'd be tempted to make src[] unsigned - probably be assigning the parameter to a local at the top of the function. Also you have input3 =3D ... src[2]... Perhaps they should be input[0..3] instead. >=20 > val =3D (input1 << 18) | > (input2 << 12) | > (input3 << 6) | > input4; Four lines is excessive, C doesn't require the () and I'm not sure the compilers complain about << and |. >=20 > if (unlikely((s32)val < 0)) > return -1; Make 'val' signed - then you don't need the cast. You can pick up the padding check here, something like: val =3D input1 << 18 | input2 << 12; if (!padding || val < 0 || src[3] !=3D '=3D') return -1; *bp++ =3D val >> 16; if (src[2] =3D=3D '=3D') return bp - dst; if (input3 < 0) return -1; val |=3D input3 << 6; *bp++ =3D val >> 8; return bp - dst; Or, if you really want to use the code below the loop: if (!padding || src[3] !=3D '=3D') return -1; padding =3D 0; srclen -=3D 1 + (src[2] =3D=3D '=3D'); break; >=20 > *bp++ =3D (u8)(val >> 16); > *bp++ =3D (u8)(val >> 8); > *bp++ =3D (u8)val; You don't need those casts. >=20 > src +=3D 4; > srclen -=3D 4; > } >=20 > /* Handle leftover characters when padding is not used */ You are coming here with padding. I'm not sure what should happen without padding. For a multi-line file decode I suspect the characters need adding to the start of the next line (ie lines aren't required to contain multiples of 4 characters - even though they almost always will). > if (srclen > 0) { > switch (srclen) { You don't need an 'if' and a 'switch'. srclen is likely to be zero, but perhaps write as: if (likely(!srclen)) return bp - dst; if (padding || srclen =3D=3D 1) return -1; val =3D base64_rev_tables[(u8)src[0]] << 12 | base64_rev_tables[(u8)src[1]= ] << 6; *bp++ =3D val >> 10; if (srclen =3D=3D 1) { if (val & 0x800003ff) return -1; } else { val |=3D base64_rev_tables[(u8)src[2]]; if (val & 0x80000003) return -1; *bp++ =3D val >> 2; } return bp - dst; } David > case 2: > input1 =3D base64_rev_tables[(u8)src[0]]; > input2 =3D base64_rev_tables[(u8)src[1]]; > val =3D (input1 << 6) | input2; /* 12 bits */ > if (unlikely((s32)val < 0 || val & 0x0F)) > return -1; >=20 > *bp++ =3D (u8)(val >> 4); > break; > case 3: > input1 =3D base64_rev_tables[(u8)src[0]]; > input2 =3D base64_rev_tables[(u8)src[1]]; > input3 =3D base64_rev_tables[(u8)src[2]]; >=20 > val =3D (input1 << 12) | > (input2 << 6) | > input3; /* 18 bits */ > if (unlikely((s32)val < 0 || val & 0x03)) > return -1; >=20 > *bp++ =3D (u8)(val >> 10); > *bp++ =3D (u8)(val >> 2); > break; > default: > return -1; > } > } >=20 > return bp - dst; > } > Based on KUnit testing, the performance results are as follows: > base64_performance_tests: [64B] decode run : 40ns > base64_performance_tests: [1KB] decode run : 463ns >=20 > However, this approach introduces an issue. It uses 256 bytes of memory > on the stack for base64_rev_tables, which might not be ideal. Does anyone > have any thoughts or alternative suggestions to solve this issue, or is it > not really a concern? >=20 > Best regards, > Guan-Chun >=20 > > >=20 > > > Best, > > > Caleb =20 > > =20