Date: Sun, 16 Nov 2025 19:35:13 +0000
From: David Laight
To: Guan-Chun Wu <409411716@gms.tku.edu.tw>
Cc: Theodore Ts'o, Andreas Dilger, linux-ext4@vger.kernel.org,
    linux-kernel@vger.kernel.org, visitorckw@gmail.com
Subject: Re: [PATCH] ext4: improve str2hashbuf by processing 4-byte chunks
Message-ID: <20251116193513.0f90712a@pumpkin>
In-Reply-To: <20251116130105.1988020-1-409411716@gms.tku.edu.tw>
References: <20251116130105.1988020-1-409411716@gms.tku.edu.tw>

On Sun, 16 Nov 2025 21:01:05 +0800
Guan-Chun Wu <409411716@gms.tku.edu.tw> wrote:

> The original byte-by-byte implementation with modulo checks is less
> efficient. Refactor str2hashbuf_unsigned() and str2hashbuf_signed()
> to process input in explicit 4-byte chunks instead of using a
> modulus-based loop to emit words byte by byte.

There are much bigger gains to be made - the current code is horrid.
Not least due to the costs of the indirect calls.
It is better to use conditionals than indirect calls.

>
> This change removes per-byte modulo checks and reduces loop iterations,
> improving efficiency.
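
To illustrate the indirect-call point made above, here is a rough userspace
sketch. All names in it (pack_signed, pack_unsigned, pack_indirect,
pack_conditional) are made up for illustration and it is not the ext4 code;
it only contrasts calling a variant through a function pointer with
selecting it by a plain conditional.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the signed and unsigned packing helpers. */
static uint32_t pack_signed(const char *msg, int len)
{
	const signed char *scp = (const signed char *)msg;
	uint32_t val = 0;

	for (int i = 0; i < len; i++)
		val = scp[i] + (val << 8);	/* byte is sign-extended */
	return val;
}

static uint32_t pack_unsigned(const char *msg, int len)
{
	const unsigned char *ucp = (const unsigned char *)msg;
	uint32_t val = 0;

	for (int i = 0; i < len; i++)
		val = ucp[i] + (val << 8);	/* byte is zero-extended */
	return val;
}

/* Indirect form: the variant is chosen once and called through a pointer. */
typedef uint32_t (*pack_fn)(const char *msg, int len);

static uint32_t pack_indirect(pack_fn fn, const char *msg, int len)
{
	return fn(msg, len);
}

/* Conditional form: a flag picks between two direct calls. */
static uint32_t pack_conditional(int is_unsigned, const char *msg, int len)
{
	if (is_unsigned)
		return pack_unsigned(msg, len);
	return pack_signed(msg, len);
}
```

Both forms return the same value. The conditional form leaves the compiler
two direct call sites it can inline, and a well-predicted branch is usually
cheaper than an indirect call, especially on kernels built with
indirect-branch mitigations.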
>
> Performance test (x86_64, Intel Core i7-10700 @ 2.90GHz, average over 10000
> runs, using kernel module for testing):
>
> len | orig_s | new_s | orig_u | new_u
> ----+--------+-------+--------+-------
>   1 |     70 |    71 |     63 |    63
>   8 |     68 |    64 |     64 |    62
>  32 |     75 |    70 |     75 |    63
>  64 |     96 |    71 |    100 |    68
> 255 |    192 |   108 |    187 |    84
>
> Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw>
> ---
>  fs/ext4/hash.c | 48 ++++++++++++++++++++++++++++++++----------------
>  1 file changed, 32 insertions(+), 16 deletions(-)
>
> diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
> index 33cd5b6b02d5..75105828e8b4 100644
> --- a/fs/ext4/hash.c
> +++ b/fs/ext4/hash.c
> @@ -141,21 +141,29 @@ static void str2hashbuf_signed(const char *msg, int len, __u32 *buf, int num)
>  	pad = (__u32)len | ((__u32)len << 8);
>  	pad |= pad << 16;
>
> -	val = pad;
>  	if (len > num*4)
>  		len = num * 4;
> -	for (i = 0; i < len; i++) {
> -		val = ((int) scp[i]) + (val << 8);
> -		if ((i % 4) == 3) {
> -			*buf++ = val;
> -			val = pad;
> -			num--;
> -		}
> +
> +	while (len >= 4) {
> +		val = ((int)scp[0] << 24) + ((int)scp[1] << 16) +
> +			((int)scp[2] << 8) + (int)scp[3];

The (int) casts are unnecessary (throughout), 'char' is always
promoted to 'signed int' before any arithmetic.
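
The promotion rule can be checked at compile time. The following is a small
userspace sketch, assuming a C11 compiler and a platform where int is wider
than char (true for every Linux target). IS_INT and casts_are_noops are
hypothetical names introduced here, not anything in the kernel tree.

```c
#include <assert.h>

/* Evaluates to 1 if the expression has type int, 0 otherwise (C11). */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* Integer promotion already makes these operands int, with no cast. */
static_assert(IS_INT((signed char)1 << 8), "signed char promotes to int");
static_assert(IS_INT((unsigned char)1 << 8), "unsigned char promotes to int");
static_assert(IS_INT((char)1 + 0), "plain char promotes to int");

/* Returns 1 if an explicit (int) cast never changes the result. */
static int casts_are_noops(void)
{
	for (int v = -128; v <= 127; v++) {
		signed char s = (signed char)v;

		if (((int)s + 256) != (s + 256))
			return 0;
		if (((int)s * 256) != (s * 256))
			return 0;
	}
	return 1;
}
```

Since the operand is already an int by the time the shift or add happens,
dropping the casts changes neither the type nor the value of the expression.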
> +		*buf++ = val;
> +		scp += 4;
> +		len -= 4;
> +		num--;
>  	}
> +
> +	val = pad;
> +
> +	for (i = 0; i < len; i++)
> +		val = (int)scp[i] + (val << 8);
> +
>  	if (--num >= 0)
>  		*buf++ = val;
> +
>  	while (--num >= 0)
>  		*buf++ = pad;
> +
>  }
>
>  static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
> @@ -167,21 +175,29 @@ static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
>  	pad = (__u32)len | ((__u32)len << 8);
>  	pad |= pad << 16;
>
> -	val = pad;
>  	if (len > num*4)
>  		len = num * 4;
> -	for (i = 0; i < len; i++) {
> -		val = ((int) ucp[i]) + (val << 8);
> -		if ((i % 4) == 3) {
> -			*buf++ = val;
> -			val = pad;
> -			num--;
> -		}
> +
> +	while (len >= 4) {
> +		val = ((int)ucp[0] << 24) | ((int)ucp[1] << 16) |
> +			((int)ucp[2] << 8) | (int)ucp[3];

Isn't that get_unaligned_be32()?

	David

> +		*buf++ = val;
> +		ucp += 4;
> +		len -= 4;
> +		num--;
>  	}
> +
> +	val = pad;
> +
> +	for (i = 0; i < len; i++)
> +		val = (int)ucp[i] + (val << 8);
> +
>  	if (--num >= 0)
>  		*buf++ = val;
> +
>  	while (--num >= 0)
>  		*buf++ = pad;
> +
>  }
>
>  /*
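
For reference, the kernel helper for a big-endian 32-bit load from an
unaligned address is get_unaligned_be32(). The sketch below is a userspace
stand-in, not the kernel implementation: load_be32 and bytewise_word are
hypothetical names, and the second function mirrors the removed
byte-at-a-time loop for one full word so the two can be compared. This only
applies to the unsigned variant; the signed variant sign-extends each byte,
so it differs for bytes of 0x80 and above.

```c
#include <stdint.h>

/* Userspace stand-in for a big-endian 32-bit load from an unaligned address. */
static uint32_t load_be32(const void *p)
{
	const uint8_t *b = p;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* The removed byte-at-a-time loop, restricted to one full 4-byte word. */
static uint32_t bytewise_word(const unsigned char *ucp, uint32_t pad)
{
	uint32_t val = pad;

	for (int i = 0; i < 4; i++)
		val = ucp[i] + (val << 8);	/* pad is shifted out after 4 bytes */
	return val;
}
```

For any four input bytes the two functions agree regardless of the pad
value, which is why the full-word case reduces to a single big-endian load.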