Re: [PATCH v7 4/4] fs: unicode: Add utf8 module and a unicode layer

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Gabriel Krisman Bertazi <krisman@collabora.com>
To: Shreeya Patel <shreeya.patel@collabora.com>
Cc: tytso@mit.edu, adilger.kernel@dilger.ca, jaegeuk@kernel.org,
	chao@kernel.org, ebiggers@google.com, drosen@google.com,
	ebiggers@kernel.org, yuchao0@huawei.com,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-fsdevel@vger.kernel.org, kernel@collabora.com,
	andre.almeida@collabora.com
Subject: Re: [PATCH v7 4/4] fs: unicode: Add utf8 module and a unicode layer
Date: Thu, 08 Apr 2021 15:10:16 -0400	[thread overview]
Message-ID: <875z0wvbhj.fsf@collabora.com> (raw)
In-Reply-To: <20210407144845.53266-5-shreeya.patel@collabora.com> (Shreeya Patel's message of "Wed, 7 Apr 2021 20:18:45 +0530")

Shreeya Patel <shreeya.patel@collabora.com> writes:

> utf8data.h_shipped has a large database table which is an auto-generated
> decodification trie for the unicode normalization functions.
> It is not necessary to load this large table in the kernel if no
> filesystem is using it, hence make UTF-8 encoding loadable by converting
> it into a module.
>
> Modify the file called unicode-core which will act as a layer for
> unicode subsystem. It will load the UTF-8 module and access it's functions
> whenever any filesystem that needs unicode is mounted.
> Currently, only UTF-8 encoding is supported but if any other encodings
> are supported in future then the layer file would be responsible for
> loading the desired encoding module.
>
> Also, indirect calls using function pointers are slow, use static calls to
> avoid overhead caused in case of repeated indirect calls. Static calls
> improves the performance by directly calling the functions as opposed to
> indirect calls.
>
> Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
> ---
> Changes in v7
>   - Update the help text in Kconfig
>   - Handle the unicode_load_static_call function failure by decrementing
>     the reference.
>   - Correct the code for handling built-in utf8 option as well.
>   - Correct the synchronization for accessing utf8mod.
>   - Make changes to unicode_unload() for handling the situation where
>     utf8mod != NULL and um == NULL.
>
> Changes in v6
>   - Add spinlock to protect utf8mod and avoid NULL pointer
>     dereference.
>   - Change the static call function names for being consistent with
>     kernel coding style.
>   - Merge the unicode_load_module function with unicode_load as it is
>     not really needed to have a separate function.
>   - Use try_then_module_get instead of module_get to avoid loading the
>     module even when it is already loaded.
>   - Improve the commit message.
>
> Changes in v5
>   - Rename global variables and default static call functions for better
>     understanding
>   - Make only config UNICODE_UTF8 visible and config UNICODE to be always
>     enabled provided UNICODE_UTF8 is enabled.  
>   - Improve the documentation for Kconfig
>   - Improve the commit message.
>  
> Changes in v4
>   - Return error from the static calls instead of doing nothing and
>     succeeding even without loading the module.
>   - Remove the complete usage of utf8_ops and use static calls at all
>     places.
>   - Restore the static calls to default values when module is unloaded.
>   - Decrement the reference of module after calling the unload function.
>   - Remove spinlock as there will be no race conditions after removing
>     utf8_ops.
>
> Changes in v3
>   - Add a patch which checks if utf8 is loaded before calling utf8_unload()
>     in ext4 and f2fs filesystems
>   - Return error if strscpy() returns value < 0
>   - Correct the conditions to prevent NULL pointer dereference while
>     accessing functions via utf8_ops variable.
>   - Add spinlock to avoid race conditions.
>   - Use static_call() for preventing speculative execution attacks.
>
> Changes in v2
>   - Remove the duplicate file from the last patch.
>   - Make the wrapper functions inline.
>   - Remove msleep and use try_module_get() and module_put()
>     for ensuring that module is loaded correctly and also
>     doesn't get unloaded while in use.
>   - Resolve the warning reported by kernel test robot.
>   - Resolve all the checkpatch.pl warnings.
>
>  fs/unicode/Kconfig        |  26 +++-
>  fs/unicode/Makefile       |   5 +-
>  fs/unicode/unicode-core.c | 297 ++++++++++++++------------------------
>  fs/unicode/unicode-utf8.c | 264 +++++++++++++++++++++++++++++++++
>  include/linux/unicode.h   |  96 ++++++++++--
>  5 files changed, 483 insertions(+), 205 deletions(-)
>  create mode 100644 fs/unicode/unicode-utf8.c
>
> diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
> index 2c27b9a5cd6c..0c69800a2a37 100644
> --- a/fs/unicode/Kconfig
> +++ b/fs/unicode/Kconfig
> @@ -2,13 +2,31 @@
>  #
>  # UTF-8 normalization
>  #
> +# CONFIG_UNICODE will be automatically enabled if CONFIG_UNICODE_UTF8
> +# is enabled. This config option adds the unicode subsystem layer which loads
> +# the UTF-8 module whenever any filesystem needs it.
>  config UNICODE
> -	bool "UTF-8 normalization and casefolding support"
> +	bool
> +
> +config UNICODE_UTF8
> +	tristate "UTF-8 module"

"UTF-8 module" is the text that will appear in menuconfig and other
configuration utilities.  This string not very helpful to describe what
this code is about or why it is different from NLS_utf8.  People come to
this option looking for the case-insensitive feature in ext4, so I'd
prefer to keep the mention to 'casefolding'. or even improve the
original a bit to say:

tristate: "UTF-8 support for native Case-Insensitive filesystems"

Other than these and what Eric mentioned, the code looks good to me.  I
gave this series a try and it seems to work fine.

It does raise a new warning, though

/home/krisman/src/linux/fs/unicode/unicode-core.c: In function ‘unicode_load’:
/home/krisman/src/linux/include/linux/kmod.h:28:8: warning: the omitted middle operand in ‘?:’ will always be ‘true’, suggest explicit middle operand [-Wparentheses]
   28 |  ((x) ?: (__request_module(true, mod), (x)))
      |        ^
/home/krisman/src/linux/fs/unicode/unicode-core.c:123:7: note: in expansion of macro ‘try_then_request_module’
  123 |  if (!try_then_request_module(utf8mod_get(), "utf8")) {

But in this specific case, i think gcc is just being silly. What would
be the right way to avoid it?

-- 
Gabriel Krisman Bertazi

WARNING: multiple messages have this Message-ID (diff)

From: Gabriel Krisman Bertazi <krisman@collabora.com>
To: Shreeya Patel <shreeya.patel@collabora.com>
Cc: tytso@mit.edu, drosen@google.com, ebiggers@google.com,
	linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net, ebiggers@kernel.org,
	kernel@collabora.com, adilger.kernel@dilger.ca,
	linux-fsdevel@vger.kernel.org, jaegeuk@kernel.org,
	andre.almeida@collabora.com, linux-ext4@vger.kernel.org
Subject: Re: [f2fs-dev] [PATCH v7 4/4] fs: unicode: Add utf8 module and a unicode layer
Date: Thu, 08 Apr 2021 15:10:16 -0400	[thread overview]
Message-ID: <875z0wvbhj.fsf@collabora.com> (raw)
In-Reply-To: <20210407144845.53266-5-shreeya.patel@collabora.com> (Shreeya Patel's message of "Wed, 7 Apr 2021 20:18:45 +0530")

Shreeya Patel <shreeya.patel@collabora.com> writes:

> utf8data.h_shipped has a large database table which is an auto-generated
> decodification trie for the unicode normalization functions.
> It is not necessary to load this large table in the kernel if no
> filesystem is using it, hence make UTF-8 encoding loadable by converting
> it into a module.
>
> Modify the file called unicode-core which will act as a layer for
> unicode subsystem. It will load the UTF-8 module and access it's functions
> whenever any filesystem that needs unicode is mounted.
> Currently, only UTF-8 encoding is supported but if any other encodings
> are supported in future then the layer file would be responsible for
> loading the desired encoding module.
>
> Also, indirect calls using function pointers are slow, use static calls to
> avoid overhead caused in case of repeated indirect calls. Static calls
> improves the performance by directly calling the functions as opposed to
> indirect calls.
>
> Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
> ---
> Changes in v7
>   - Update the help text in Kconfig
>   - Handle the unicode_load_static_call function failure by decrementing
>     the reference.
>   - Correct the code for handling built-in utf8 option as well.
>   - Correct the synchronization for accessing utf8mod.
>   - Make changes to unicode_unload() for handling the situation where
>     utf8mod != NULL and um == NULL.
>
> Changes in v6
>   - Add spinlock to protect utf8mod and avoid NULL pointer
>     dereference.
>   - Change the static call function names for being consistent with
>     kernel coding style.
>   - Merge the unicode_load_module function with unicode_load as it is
>     not really needed to have a separate function.
>   - Use try_then_module_get instead of module_get to avoid loading the
>     module even when it is already loaded.
>   - Improve the commit message.
>
> Changes in v5
>   - Rename global variables and default static call functions for better
>     understanding
>   - Make only config UNICODE_UTF8 visible and config UNICODE to be always
>     enabled provided UNICODE_UTF8 is enabled.  
>   - Improve the documentation for Kconfig
>   - Improve the commit message.
>  
> Changes in v4
>   - Return error from the static calls instead of doing nothing and
>     succeeding even without loading the module.
>   - Remove the complete usage of utf8_ops and use static calls at all
>     places.
>   - Restore the static calls to default values when module is unloaded.
>   - Decrement the reference of module after calling the unload function.
>   - Remove spinlock as there will be no race conditions after removing
>     utf8_ops.
>
> Changes in v3
>   - Add a patch which checks if utf8 is loaded before calling utf8_unload()
>     in ext4 and f2fs filesystems
>   - Return error if strscpy() returns value < 0
>   - Correct the conditions to prevent NULL pointer dereference while
>     accessing functions via utf8_ops variable.
>   - Add spinlock to avoid race conditions.
>   - Use static_call() for preventing speculative execution attacks.
>
> Changes in v2
>   - Remove the duplicate file from the last patch.
>   - Make the wrapper functions inline.
>   - Remove msleep and use try_module_get() and module_put()
>     for ensuring that module is loaded correctly and also
>     doesn't get unloaded while in use.
>   - Resolve the warning reported by kernel test robot.
>   - Resolve all the checkpatch.pl warnings.
>
>  fs/unicode/Kconfig        |  26 +++-
>  fs/unicode/Makefile       |   5 +-
>  fs/unicode/unicode-core.c | 297 ++++++++++++++------------------------
>  fs/unicode/unicode-utf8.c | 264 +++++++++++++++++++++++++++++++++
>  include/linux/unicode.h   |  96 ++++++++++--
>  5 files changed, 483 insertions(+), 205 deletions(-)
>  create mode 100644 fs/unicode/unicode-utf8.c
>
> diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
> index 2c27b9a5cd6c..0c69800a2a37 100644
> --- a/fs/unicode/Kconfig
> +++ b/fs/unicode/Kconfig
> @@ -2,13 +2,31 @@
>  #
>  # UTF-8 normalization
>  #
> +# CONFIG_UNICODE will be automatically enabled if CONFIG_UNICODE_UTF8
> +# is enabled. This config option adds the unicode subsystem layer which loads
> +# the UTF-8 module whenever any filesystem needs it.
>  config UNICODE
> -	bool "UTF-8 normalization and casefolding support"
> +	bool
> +
> +config UNICODE_UTF8
> +	tristate "UTF-8 module"

"UTF-8 module" is the text that will appear in menuconfig and other
configuration utilities.  This string not very helpful to describe what
this code is about or why it is different from NLS_utf8.  People come to
this option looking for the case-insensitive feature in ext4, so I'd
prefer to keep the mention to 'casefolding'. or even improve the
original a bit to say:

tristate: "UTF-8 support for native Case-Insensitive filesystems"

Other than these and what Eric mentioned, the code looks good to me.  I
gave this series a try and it seems to work fine.

It does raise a new warning, though

/home/krisman/src/linux/fs/unicode/unicode-core.c: In function ‘unicode_load’:
/home/krisman/src/linux/include/linux/kmod.h:28:8: warning: the omitted middle operand in ‘?:’ will always be ‘true’, suggest explicit middle operand [-Wparentheses]
   28 |  ((x) ?: (__request_module(true, mod), (x)))
      |        ^
/home/krisman/src/linux/fs/unicode/unicode-core.c:123:7: note: in expansion of macro ‘try_then_request_module’
  123 |  if (!try_then_request_module(utf8mod_get(), "utf8")) {

But in this specific case, i think gcc is just being silly. What would
be the right way to avoid it?

-- 
Gabriel Krisman Bertazi


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

next prev parent reply	other threads:[~2021-04-08 19:10 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-07 14:48 [PATCH v7 0/4] Make UTF-8 encoding loadable Shreeya Patel
2021-04-07 14:48 ` [f2fs-dev] " Shreeya Patel
2021-04-07 14:48 ` [PATCH v7 1/4] fs: unicode: Use strscpy() instead of strncpy() Shreeya Patel
2021-04-07 14:48   ` [f2fs-dev] " Shreeya Patel
2021-04-08  0:52   ` Eric Biggers
2021-04-08  0:52     ` [f2fs-dev] " Eric Biggers
2021-04-07 14:48 ` [PATCH v7 2/4] fs: unicode: Rename function names from utf8 to unicode Shreeya Patel
2021-04-07 14:48   ` [f2fs-dev] " Shreeya Patel
2021-04-07 14:48 ` [PATCH v7 3/4] fs: unicode: Rename utf8-core file to unicode-core Shreeya Patel
2021-04-07 14:48   ` [f2fs-dev] " Shreeya Patel
2021-04-07 14:48 ` [PATCH v7 4/4] fs: unicode: Add utf8 module and a unicode layer Shreeya Patel
2021-04-07 14:48   ` [f2fs-dev] " Shreeya Patel
2021-04-08  1:01   ` Eric Biggers
2021-04-08  1:01     ` [f2fs-dev] " Eric Biggers
2021-04-08 19:10   ` Gabriel Krisman Bertazi [this message]
2021-04-08 19:10     ` Gabriel Krisman Bertazi
2021-04-14 11:56     ` Shreeya Patel
2021-04-14 11:56       ` [f2fs-dev] " Shreeya Patel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875z0wvbhj.fsf@collabora.com \
    --to=krisman@collabora.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=andre.almeida@collabora.com \
    --cc=chao@kernel.org \
    --cc=drosen@google.com \
    --cc=ebiggers@google.com \
    --cc=ebiggers@kernel.org \
    --cc=jaegeuk@kernel.org \
    --cc=kernel@collabora.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=shreeya.patel@collabora.com \
    --cc=tytso@mit.edu \
    --cc=yuchao0@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.