From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.4 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT, USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3AC8C433E0 for ; Wed, 8 Jul 2020 03:06:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A606B2078A for ; Wed, 8 Jul 2020 03:06:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="cfqa4ZI0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729407AbgGHDGB (ORCPT ); Tue, 7 Jul 2020 23:06:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56732 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729302AbgGHDF7 (ORCPT ); Tue, 7 Jul 2020 23:05:59 -0400 Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6A1DCC08C5DC for ; Tue, 7 Jul 2020 20:05:59 -0700 (PDT) Received: by mail-pg1-x54a.google.com with SMTP id s8so16278966pgs.9 for ; Tue, 07 Jul 2020 20:05:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=wV4Dlp1PtfRWNCEDy3OUVmwFuHR/znQviVm4udIzuj8=; b=cfqa4ZI0uBxj1JWd8dAGqnZFXA438OEfTLCcfqHSddxLMr8U1C1o3sty7pkLZivxV9 k2SLkAZmTm86qL76MMu0rWCsjw2KGoVfUpcu2Skz+1RJ2HRxepy9PS+oJHBH6EAIiKT0 SriweU9dx43A69vye8MbQ/7x6celwyfbxtDCje7B/pL2YOYpnBZAXmG4R6XlmFBlpzHU zIIB+JCbamdhEvMVnfZqSc8WEq8Ob08h3jKHTybWQtf0DWwoHE/q1x+7PQ4gsYCnSPgg nGLeVC26zVJq9dt0SaFj68BsveNgcD72gE+9VWRDJhJ5LkwFf2660LZxMJDrcdwQGjSi 7aVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=wV4Dlp1PtfRWNCEDy3OUVmwFuHR/znQviVm4udIzuj8=; b=ebHn+FMTyMSrjtOjJ/PIjc3H/UDfIVP2X3/pvG75lbX7ppxyDwlNKFVXbMYNngmtHC r68fLhtBRikZTEwaV2w0oBGqvmiPbfhQYMrlwSYRcCy8s0nTGnLnje1nvW0wiYA5MJrt jyg6W1Wl1U8++cGjpQVNj0FZWMBQAn7YsSidgfAS++hdU7vDHxWu74Yro+edvDFNhmwa NSA3JkzN3/Q+2q0RIPAi+uyM2HB0eN9ZW2u6hGHB6tz8elT5+oWJl5HJo6e/XnZTMbCm NiadIVBQcCdkGzDjG8t/qS8nKVp7tZpgggK5nqozpIBqhOz3HMagsYFT3IyDOy+CPqzA ULTQ== X-Gm-Message-State: AOAM530G28IdIF7CSWtJW0rWrCXQ1/w2cbjgVkESi7SRUX8OGu68tNsn 54Y/oJ4dXhPnG5lgYL/r5d2vbsVwr64= X-Google-Smtp-Source: ABdhPJzqov5JL3MTCYCUXsgN1WMwA/rTbUQMcSdZ1sJWb3HhqSUyDy2izboLJgJ3fC9ih0rA4dvnY2hb6wM= X-Received: by 2002:a17:902:744c:: with SMTP id e12mr39192696plt.337.1594177558861; Tue, 07 Jul 2020 20:05:58 -0700 (PDT) Date: Tue, 7 Jul 2020 20:05:49 -0700 In-Reply-To: <20200708030552.3829094-1-drosen@google.com> Message-Id: <20200708030552.3829094-2-drosen@google.com> Mime-Version: 1.0 References: <20200708030552.3829094-1-drosen@google.com> X-Mailer: git-send-email 2.27.0.383.g050319c2ae-goog Subject: [PATCH v11 1/4] unicode: Add utf8_casefold_hash From: Daniel Rosenberg To: "Theodore Ts'o" , linux-ext4@vger.kernel.org, Jaegeuk Kim , Chao Yu , linux-f2fs-devel@lists.sourceforge.net, Eric Biggers , linux-fscrypt@vger.kernel.org, Alexander Viro Cc: Andreas Dilger , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Gabriel Krisman Bertazi , kernel-team@android.com, Daniel Rosenberg , Eric Biggers Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This adds a case insensitive hash function to allow taking the hash without needing to allocate a casefolded copy of the string. The existing d_hash implementations for casefolding allocate memory within rcu-walk, by avoiding it we can be more efficient and avoid worrying about a failed allocation. Signed-off-by: Daniel Rosenberg Reviewed-by: Gabriel Krisman Bertazi Reviewed-by: Eric Biggers --- fs/unicode/utf8-core.c | 23 ++++++++++++++++++++++- include/linux/unicode.h | 3 +++ 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c index 2a878b739115..dc25823bfed9 100644 --- a/fs/unicode/utf8-core.c +++ b/fs/unicode/utf8-core.c @@ -6,6 +6,7 @@ #include #include #include +#include #include "utf8n.h" @@ -122,9 +123,29 @@ int utf8_casefold(const struct unicode_map *um, const struct qstr *str, } return -EINVAL; } - EXPORT_SYMBOL(utf8_casefold); +int utf8_casefold_hash(const struct unicode_map *um, const void *salt, + struct qstr *str) +{ + const struct utf8data *data = utf8nfdicf(um->version); + struct utf8cursor cur; + int c; + unsigned long hash = init_name_hash(salt); + + if (utf8ncursor(&cur, data, str->name, str->len) < 0) + return -EINVAL; + + while ((c = utf8byte(&cur))) { + if (c < 0) + return -EINVAL; + hash = partial_name_hash((unsigned char)c, hash); + } + str->hash = end_name_hash(hash); + return 0; +} +EXPORT_SYMBOL(utf8_casefold_hash); + int utf8_normalize(const struct unicode_map *um, const struct qstr *str, unsigned char *dest, size_t dlen) { diff --git a/include/linux/unicode.h b/include/linux/unicode.h index 990aa97d8049..74484d44c755 100644 --- a/include/linux/unicode.h +++ b/include/linux/unicode.h @@ -27,6 +27,9 @@ int utf8_normalize(const struct unicode_map *um, const struct qstr *str, int utf8_casefold(const struct unicode_map *um, const struct qstr *str, unsigned char *dest, size_t dlen); +int utf8_casefold_hash(const struct unicode_map *um, const void *salt, + struct qstr *str); + struct unicode_map *utf8_load(const char *version); void utf8_unload(struct unicode_map *um); -- 2.27.0.383.g050319c2ae-goog