* [PATCH v3 0/9] tmpfs: Add case-insensitive support for tmpfs
@ 2024-09-05 19:02 André Almeida
2024-09-05 19:02 ` [PATCH v3 1/9] libfs: Create the helper function generic_ci_validate_strict_name() André Almeida
` (8 more replies)
0 siblings, 9 replies; 15+ messages in thread
From: André Almeida @ 2024-09-05 19:02 UTC
To: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman
Cc: linux-mm, linux-kernel, linux-fsdevel, kernel-dev,
Daniel Rosenberg, smcv, Christoph Hellwig, Theodore Ts'o,
André Almeida
Hi,
This series is based on [0].
This patchset adds support for case-insensitive file name lookups in
tmpfs. The main difference from other casefold filesystems is that tmpfs
keeps no information on disk, only in RAM, so we can't use mkfs to create a
case-insensitive tmpfs. For this implementation, I opted to have a mount
option for casefolding. The rest of the patchset follows a similar approach
as ext4 and f2fs.
* Use case (from the original cover letter)
The use case for this feature is similar to the use case for ext4, to
better support compatibility layers (like Wine), particularly in
combination with sandboxing/container tools (like Flatpak). Those
containerization tools can share a subset of the host filesystem with an
application. In the container, the root directory and any parent
directories required for a shared directory are on tmpfs, with the
shared directories bind-mounted into the container's view of the
filesystem.
If the host filesystem is using case-insensitive directories, then the
application can do lookups inside those directories in a
case-insensitive way, without this needing to be implemented in
user-space. However, if the host is only sharing a subset of a
case-insensitive directory with the application, then the parent
directories of the mount point will be part of the container's root
tmpfs. When the application tries to do case-insensitive lookups of
those parent directories on a case-sensitive tmpfs, the lookup will
fail.
For example, if /srv/games is a case-insensitive directory on the host,
then applications will expect /srv/games/Steam/Half-Life and
/srv/games/steam/half-life to be interchangeable; but if the
container framework is only sharing /srv/games/Steam/Half-Life and
/srv/games/Steam/Portal (and not the rest of /srv/games) with the
container, with /srv, /srv/games and /srv/games/Steam as part of the
container's tmpfs root, then making /srv/games a case-insensitive
directory inside the container would be necessary to meet that
expectation.
* Testing
I sent a patch for xfstests to enable the casefold test (generic/556) for
tmpfs.[1] The test succeeds.
You can test this patchset using:
sudo mount -t tmpfs -o casefold tmpfs mnt/
And making a dir case-insensitive:
mkdir mnt/dir
chattr +F mnt/dir
[0] https://lore.kernel.org/linux-fsdevel/20210323195941.69720-1-andrealmeid@collabora.com/
[1] https://lore.kernel.org/fstests/20240823173008.280917-1-andrealmeid@igalia.com/
Changes in v3:
- Renamed utf8_check_strict_name() to generic_ci_validate_strict_name(), and
reworked the big if(...) to be more clear
- Expose the latest UTF-8 version in include/linux/unicode.h
- shmem_lookup() now sets d_ops
- reworked shmem_parse_opt_casefold()
- if `mount -o casefold` has no param, load latest UTF-8 version
- using (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir)) when possible
- Fixed bug when adding a non-casefold flag in a non-empty dir
v2: https://lore.kernel.org/lkml/20240902225511.757831-1-andrealmeid@igalia.com/
Changes in v2:
- Found and fixed a bug in utf8_load()
- Created a helper for checking strict file names (Krisman)
- Merged patch 1/ and 3/ together (Krisman)
- Reworded the explanation about d_compare (Krisman)
- Removed bool casefold from shmem_sb_info (Krisman)
- Reworked d_add(dentry, NULL) to be called as d_add(dentry, inode) (Krisman)
- Moved utf8_parse_version to common unicode code
- Fixed some smatch/sparse warnings (kernel test bot/Dan Carpenter)
v1: https://lore.kernel.org/linux-fsdevel/20240823173332.281211-1-andrealmeid@igalia.com/
André Almeida (9):
libfs: Create the helper function generic_ci_validate_strict_name()
ext4: Use generic_ci_validate_strict_name helper
unicode: Recreate utf8_parse_version()
unicode: Export latest available UTF-8 version number
libfs: Create the helper struct generic_ci_always_del_dentry_ops
tmpfs: Add casefold lookup support
tmpfs: Add flag FS_CASEFOLD_FL support for tmpfs dirs
tmpfs: Expose filesystem features via sysfs
docs: tmpfs: Add casefold options
Documentation/filesystems/tmpfs.rst | 23 +++
fs/ext4/namei.c | 3 +-
fs/libfs.c | 53 ++++++
fs/unicode/utf8-core.c | 29 ++++
fs/unicode/utf8-selftest.c | 3 -
include/linux/fs.h | 2 +
include/linux/shmem_fs.h | 6 +-
include/linux/unicode.h | 5 +
mm/shmem.c | 249 ++++++++++++++++++++++++++--
9 files changed, 355 insertions(+), 18 deletions(-)
--
2.46.0
* [PATCH v3 1/9] libfs: Create the helper function generic_ci_validate_strict_name()
From: André Almeida @ 2024-09-05 19:02 UTC
To: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman
Cc: linux-mm, linux-kernel, linux-fsdevel, kernel-dev,
Daniel Rosenberg, smcv, Christoph Hellwig, Theodore Ts'o,
André Almeida, Gabriel Krisman Bertazi
Create a helper function for filesystems to do the checks required for
casefold directories and strict encoding.
Suggested-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
Changes from v2:
- Moved function to libfs and adapted its name
- Wrapped at 72 chars column
- Decomposed the big if (...) to be more clear
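
To make the intended contract concrete, here is a small userspace model of
the decision the helper documents: any name is accepted unless the
directory is casefolded and the filesystem is strict. Everything in it
(struct dir_model, utf8_well_formed(), validate_strict_name_model()) is
hypothetical and only checks UTF-8 well-formedness, whereas the kernel's
utf8_validate() also consults the Unicode tables for the loaded version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode/superblock state the helper reads. */
struct dir_model {
	bool casefolded;      /* models IS_CASEFOLDED(dir) */
	bool strict_encoding; /* models sb_has_strict_encoding(dir->i_sb) */
};

/* Well-formedness only: rejects bad lead/continuation bytes, overlong
 * forms, surrogates and code points above U+10FFFF. */
static bool utf8_well_formed(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		unsigned int cp;
		size_t need;

		if (c < 0x80) {
			i++;
			continue;
		} else if ((c & 0xe0) == 0xc0) {
			need = 1; cp = c & 0x1f;
		} else if ((c & 0xf0) == 0xe0) {
			need = 2; cp = c & 0x0f;
		} else if ((c & 0xf8) == 0xf0) {
			need = 3; cp = c & 0x07;
		} else {
			return false;
		}

		if (len - i - 1 < need)
			return false;
		for (size_t k = 1; k <= need; k++) {
			if ((s[i + k] & 0xc0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i + k] & 0x3f);
		}
		if ((need == 1 && cp < 0x80) || (need == 2 && cp < 0x800) ||
		    (need == 3 && cp < 0x10000) || cp > 0x10ffff ||
		    (cp >= 0xd800 && cp <= 0xdfff))
			return false;
		i += need + 1;
	}
	return true;
}

/* True if the name is acceptable for this directory. */
bool validate_strict_name_model(const struct dir_model *dir,
				const unsigned char *name, size_t len)
{
	assert(dir != NULL && name != NULL);

	if (!dir->casefolded || !dir->strict_encoding)
		return true;

	return utf8_well_formed(name, len);
}
```

A caller would reject the create with -EINVAL when the model returns
false, mirroring how the helper is meant to be used.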
---
fs/libfs.c | 38 ++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 1 +
2 files changed, 39 insertions(+)
diff --git a/fs/libfs.c b/fs/libfs.c
index 8aa34870449f..99fb36b48708 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1928,6 +1928,44 @@ int generic_ci_match(const struct inode *parent,
return !res;
}
EXPORT_SYMBOL(generic_ci_match);
+
+/**
+ * generic_ci_validate_strict_name - Check if a given name is suitable
+ * for a directory
+ *
+ * This function checks if the proposed filename is valid for the
+ * parent directory. That means that only valid UTF-8 filenames will be
+ * accepted for casefold directories from filesystems created with the
+ * strict encoding flag. That also means that any name will be
+ * accepted for directories that don't have casefold enabled, or
+ * aren't being strict with the encoding.
+ *
+ * @dir: inode of the directory where the new file will be created
+ * @name: name of the new file
+ *
+ * Return:
+ * * True if the filename is suitable for this directory. It can be
+ * true if a given name is not suitable for a strict encoding
+ * directory, but the directory being used isn't strict
+ * * False if the filename isn't suitable for this directory. This only
+ * happens when a directory is casefolded and the filesystem is strict
+ * about its encoding.
+ */
+bool generic_ci_validate_strict_name(struct inode *dir, struct qstr *name)
+{
+ if (!IS_CASEFOLDED(dir) || !sb_has_strict_encoding(dir->i_sb))
+ return true;
+
+ /*
+ * A casefold dir must have an encoding set, unless the filesystem
+ * is corrupted
+ */
+ if (WARN_ON_ONCE(!dir->i_sb->s_encoding))
+ return true;
+
+ return !utf8_validate(dir->i_sb->s_encoding, name);
+}
+EXPORT_SYMBOL(generic_ci_validate_strict_name);
#endif
#ifdef CONFIG_FS_ENCRYPTION
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd34b5755c0b..937142950dfe 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3385,6 +3385,7 @@ extern int generic_ci_match(const struct inode *parent,
const struct qstr *name,
const struct qstr *folded_name,
const u8 *de_name, u32 de_name_len);
+bool generic_ci_validate_strict_name(struct inode *dir, struct qstr *name);
static inline bool sb_has_encoding(const struct super_block *sb)
{
--
2.46.0
* [PATCH v3 2/9] ext4: Use generic_ci_validate_strict_name helper
From: André Almeida @ 2024-09-05 19:02 UTC
To: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman
Cc: linux-mm, linux-kernel, linux-fsdevel, kernel-dev,
Daniel Rosenberg, smcv, Christoph Hellwig, Theodore Ts'o,
André Almeida, Gabriel Krisman Bertazi
Use the helper function to check the requirements for casefold
directories using strict encoding.
Suggested-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
---
fs/ext4/namei.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 6a95713f9193..beca80e70b0c 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2394,8 +2394,7 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
return -ENOKEY;
#if IS_ENABLED(CONFIG_UNICODE)
- if (sb_has_strict_encoding(sb) && IS_CASEFOLDED(dir) &&
- utf8_validate(sb->s_encoding, &dentry->d_name))
+ if (!generic_ci_validate_strict_name(dir, &dentry->d_name))
return -EINVAL;
#endif
--
2.46.0
* [PATCH v3 3/9] unicode: Recreate utf8_parse_version()
From: André Almeida @ 2024-09-05 19:02 UTC
To: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman
Cc: linux-mm, linux-kernel, linux-fsdevel, kernel-dev,
Daniel Rosenberg, smcv, Christoph Hellwig, Theodore Ts'o,
André Almeida
All filesystems that currently support UTF-8 casefold can fetch the
UTF-8 version from the filesystem metadata stored on disk. They can get
the data stored and directly match it to an integer, so they can skip the
string parsing step, which motivated the removal of this function in the
first place.
However, for tmpfs, the only way to tell the kernel which UTF-8 version
we are about to use is via mount options, using a string. Re-introduce
utf8_parse_version() to be used by tmpfs.
This version differs from the original by skipping the intermediate step
of copying the version string to an auxiliary string before calling
match_token(). This version calls match_token() directly on the argument
string.
utf8_parse_version() was created by 9d53690f0d4 ("unicode: implement
higher level API for string handling") and later removed by 49bd03cc7e9
("unicode: pass a UNICODE_AGE() tripple to utf8_load").
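
As a rough userspace illustration of what the parsing step produces, the
sketch below turns a "maj.min.rev" string into three integers and packs
them the way UNICODE_AGE() does. It uses sscanf() instead of the kernel's
match_token()/match_int(), so edge cases differ; parse_version_model() is
a hypothetical name, and the shift values are assumed to match
include/linux/unicode.h.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed to mirror the packing in include/linux/unicode.h. */
#define UNICODE_MAJ_SHIFT 16
#define UNICODE_MIN_SHIFT 8
#define UNICODE_AGE(MAJ, MIN, REV)                      \
	(((unsigned int)(MAJ) << UNICODE_MAJ_SHIFT) |   \
	 ((unsigned int)(MIN) << UNICODE_MIN_SHIFT) |   \
	 ((unsigned int)(REV)))

/*
 * Hypothetical userspace model of the parse: returns 0 on success and
 * -1 if the string is not exactly "<maj>.<min>.<rev>".
 */
int parse_version_model(const char *version, unsigned int *maj,
			unsigned int *min, unsigned int *rev)
{
	int consumed = 0;

	assert(version && maj && min && rev);

	/* %n records how many characters were consumed so far. */
	if (sscanf(version, "%u.%u.%u%n", maj, min, rev, &consumed) != 3)
		return -1;

	/* Reject trailing garbage after the revision number. */
	if (version[consumed] != '\0')
		return -1;

	return 0;
}
```

The packed value is what a caller would then hand to utf8_load().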
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
---
fs/unicode/utf8-core.c | 29 +++++++++++++++++++++++++++++
include/linux/unicode.h | 3 +++
2 files changed, 32 insertions(+)
diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c
index 0400824ef493..2e852075c6d8 100644
--- a/fs/unicode/utf8-core.c
+++ b/fs/unicode/utf8-core.c
@@ -214,3 +214,32 @@ void utf8_unload(struct unicode_map *um)
}
EXPORT_SYMBOL(utf8_unload);
+/**
+ * utf8_parse_version - Parse a UTF-8 version number from a string
+ *
+ * @version: input string
+ * @maj: output major version number
+ * @min: output minor version number
+ * @rev: output minor revision number
+ *
+ * Returns 0 on success, negative code on error
+ */
+int utf8_parse_version(char *version, unsigned int *maj,
+ unsigned int *min, unsigned int *rev)
+{
+ substring_t args[3];
+ static const struct match_token token[] = {
+ {1, "%d.%d.%d"},
+ {0, NULL}
+ };
+
+ if (match_token(version, token, args) != 1)
+ return -EINVAL;
+
+ if (match_int(&args[0], maj) || match_int(&args[1], min) ||
+ match_int(&args[2], rev))
+ return -EINVAL;
+
+ return 0;
+}
+EXPORT_SYMBOL(utf8_parse_version);
diff --git a/include/linux/unicode.h b/include/linux/unicode.h
index 4d39e6e11a95..f73a78655588 100644
--- a/include/linux/unicode.h
+++ b/include/linux/unicode.h
@@ -76,4 +76,7 @@ int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
struct unicode_map *utf8_load(unsigned int version);
void utf8_unload(struct unicode_map *um);
+int utf8_parse_version(char *version, unsigned int *maj, unsigned int *min,
+ unsigned int *rev);
+
#endif /* _LINUX_UNICODE_H */
--
2.46.0
* [PATCH v3 4/9] unicode: Export latest available UTF-8 version number
From: André Almeida @ 2024-09-05 19:02 UTC
To: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman
Cc: linux-mm, linux-kernel, linux-fsdevel, kernel-dev,
Daniel Rosenberg, smcv, Christoph Hellwig, Theodore Ts'o,
André Almeida
Export latest available UTF-8 version number so filesystems can easily
load the newest one.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
If this is the accepted way of doing that, I will also add something to
checkpatch to warn that modifications at fs/unicode/utf8data.c likely
need to change this define.
Other ways to implement this:
1) Having a new arg for utf8_load()
struct unicode_map *utf8_load(unsigned int version, bool latest)
{
um->tables = symbol_request(utf8_data_table);
if (latest) {
int i = um->tables->utf8agetab_size - 1;
version = um->tables->utf8agetab[i];
}
}
2) Expose utf8agetab[]
Having utf8agetab[] at include/linux/unicode.h will make it easier to
programmatically find out the latest version without the need to do a
symbol_request/symbol_put of the whole utf8 table.
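
For comparison, the table-based alternatives boil down to "take the last
entry of a sorted age table". A userspace sketch under that assumption is
below; age_table_model[] is a made-up, abbreviated table (the real one
lives in the generated utf8 data), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the packing in include/linux/unicode.h. */
#define UNICODE_MAJ_SHIFT 16
#define UNICODE_MIN_SHIFT 8
#define UNICODE_AGE(MAJ, MIN, REV)                      \
	(((unsigned int)(MAJ) << UNICODE_MAJ_SHIFT) |   \
	 ((unsigned int)(MIN) << UNICODE_MIN_SHIFT) |   \
	 ((unsigned int)(REV)))

/* Hypothetical, abbreviated age table, sorted in ascending order. */
static const unsigned int age_table_model[] = {
	UNICODE_AGE(11, 0, 0),
	UNICODE_AGE(12, 0, 0),
	UNICODE_AGE(12, 1, 0),
};

/* The newest version is the last entry of the sorted table. */
unsigned int latest_age_model(const unsigned int *tab, size_t n)
{
	assert(tab != NULL && n > 0);
	return tab[n - 1];
}

/* Unpack a packed age back into its components. */
unsigned int age_major(unsigned int age)
{
	return (age >> UNICODE_MAJ_SHIFT) & 0xff;
}

unsigned int age_minor(unsigned int age)
{
	return (age >> UNICODE_MIN_SHIFT) & 0xff;
}

unsigned int age_rev(unsigned int age)
{
	return age & 0xff;
}
```

The define-based approach in this patch avoids the table walk entirely,
at the cost of having to keep the define in sync by hand.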
---
fs/unicode/utf8-selftest.c | 3 ---
include/linux/unicode.h | 2 ++
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/unicode/utf8-selftest.c b/fs/unicode/utf8-selftest.c
index 600e15efe9ed..5ddaf27b21a6 100644
--- a/fs/unicode/utf8-selftest.c
+++ b/fs/unicode/utf8-selftest.c
@@ -17,9 +17,6 @@
static unsigned int failed_tests;
static unsigned int total_tests;
-/* Tests will be based on this version. */
-#define UTF8_LATEST UNICODE_AGE(12, 1, 0)
-
#define _test(cond, func, line, fmt, ...) do { \
total_tests++; \
if (!cond) { \
diff --git a/include/linux/unicode.h b/include/linux/unicode.h
index f73a78655588..db043ea914fd 100644
--- a/include/linux/unicode.h
+++ b/include/linux/unicode.h
@@ -16,6 +16,8 @@ struct utf8data_table;
((unsigned int)(MIN) << UNICODE_MIN_SHIFT) | \
((unsigned int)(REV)))
+#define UTF8_LATEST UNICODE_AGE(12, 1, 0)
+
static inline u8 unicode_major(unsigned int age)
{
return (age >> UNICODE_MAJ_SHIFT) & 0xff;
--
2.46.0
* [PATCH v3 5/9] libfs: Create the helper struct generic_ci_always_del_dentry_ops
From: André Almeida @ 2024-09-05 19:02 UTC
To: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman
Cc: linux-mm, linux-kernel, linux-fsdevel, kernel-dev,
Daniel Rosenberg, smcv, Christoph Hellwig, Theodore Ts'o,
André Almeida
Create a helper to assign dentry_operations with the generic
case-insensitive functions plus setting .d_delete as always_delete_dentry.
This is useful for in-memory casefold filesystems like tmpfs.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
v3: New patch
---
fs/libfs.c | 15 +++++++++++++++
include/linux/fs.h | 1 +
2 files changed, 16 insertions(+)
diff --git a/fs/libfs.c b/fs/libfs.c
index 99fb36b48708..58b39640b686 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1855,6 +1855,21 @@ static const struct dentry_operations generic_ci_dentry_ops = {
#endif
};
+/*
+ * Same as generic_ci_dentry_ops, but also set d_delete. Useful for in-memory
+ * casefold filesystems.
+ */
+const struct dentry_operations generic_ci_always_del_dentry_ops = {
+ .d_hash = generic_ci_d_hash,
+ .d_compare = generic_ci_d_compare,
+#ifdef CONFIG_FS_ENCRYPTION
+ .d_revalidate = fscrypt_d_revalidate,
+#endif
+ .d_delete = always_delete_dentry,
+};
+EXPORT_SYMBOL(generic_ci_always_del_dentry_ops);
+
+
/**
* generic_ci_match() - Match a name (case-insensitively) with a dirent.
* This is a filesystem helper for comparison with directory entries.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 937142950dfe..254a1dcf987b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3337,6 +3337,7 @@ extern int always_delete_dentry(const struct dentry *);
extern struct inode *alloc_anon_inode(struct super_block *);
extern int simple_nosetlease(struct file *, int, struct file_lease **, void **);
extern const struct dentry_operations simple_dentry_operations;
+extern const struct dentry_operations generic_ci_always_del_dentry_ops;
extern struct dentry *simple_lookup(struct inode *, struct dentry *, unsigned int flags);
extern ssize_t generic_read_dir(struct file *, char __user *, size_t, loff_t *);
--
2.46.0
* [PATCH v3 6/9] tmpfs: Add casefold lookup support
From: André Almeida @ 2024-09-05 19:02 UTC
To: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman
Cc: linux-mm, linux-kernel, linux-fsdevel, kernel-dev,
Daniel Rosenberg, smcv, Christoph Hellwig, Theodore Ts'o,
André Almeida
Enable casefold lookup in tmpfs, based on the encoding defined by
userspace. That means that instead of comparing file names byte by
byte, tmpfs compares the case-insensitive equivalents of the Unicode
strings.
* Dcache handling
Case-insensitive dentries need special handling. First of all, we
currently invalidate every negative casefold dentry. That happens
because the VFS code currently has no proper support for them, given
that it could incorrectly reuse a previous filename for a new file that
has a casefold match. For instance, this could happen:
$ mkdir DIR
$ rm -r DIR
$ mkdir dir
$ ls
DIR/
This would be perceived as an inconsistency from the userspace point of
view because, even though we match files in a case-insensitive manner,
we still honor the initial filename.
Along with that, tmpfs stores only the first equivalent name dentry used
in the dcache, preventing duplications of dentries in the dcache. The
d_compare() version for casefold files uses a normalized string, so the
filename under lookup will be compared to another normalized string for
the existing file, achieving a casefolded lookup.
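
The intended semantics (match case-insensitively, but keep and report the
spelling used at creation time) can be modelled in userspace as below.
This is only a sketch: it folds ASCII with tolower() where the kernel
normalizes and casefolds UTF-8, and struct ci_dir_model and the ci_*()
helpers are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_ENTRIES 8
#define MODEL_NAME_MAX 32

/* Hypothetical in-memory directory: stores names exactly as created. */
struct ci_dir_model {
	char names[MODEL_MAX_ENTRIES][MODEL_NAME_MAX];
	size_t count;
};

/* ASCII-only stand-in for comparing normalized, casefolded names. */
static bool ci_name_equal(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return false;
	return *a == *b;
}

/* Returns the stored spelling of a matching entry, or NULL. */
const char *ci_lookup(const struct ci_dir_model *dir, const char *name)
{
	for (size_t i = 0; i < dir->count; i++)
		if (ci_name_equal(dir->names[i], name))
			return dir->names[i];
	return NULL;
}

/* Fails if a casefold-equivalent entry exists, or on lack of space. */
bool ci_create(struct ci_dir_model *dir, const char *name)
{
	if (ci_lookup(dir, name) || dir->count == MODEL_MAX_ENTRIES ||
	    strlen(name) >= MODEL_NAME_MAX)
		return false;
	strcpy(dir->names[dir->count++], name);
	return true;
}

/* Removes the matching entry and leaves nothing behind that could leak
 * the old spelling into a later create. */
bool ci_remove(struct ci_dir_model *dir, const char *name)
{
	for (size_t i = 0; i < dir->count; i++) {
		if (!ci_name_equal(dir->names[i], name))
			continue;
		if (i != dir->count - 1)
			strcpy(dir->names[i], dir->names[dir->count - 1]);
		dir->count--;
		return true;
	}
	return false;
}
```

In this model, creating "DIR", removing it and then creating "dir" makes
a later lookup report "dir", which is the behaviour the negative dentry
invalidation is meant to guarantee.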
* Enabling casefold via mount options
Most filesystems have their data stored on disk, so the casefold option
needs to be enabled when building a filesystem on a device (via mkfs).
However, as tmpfs is a RAM-backed filesystem, there's no on-disk
information and thus no mkfs to store information about casefold.
For tmpfs, create casefold options for mounting. Userspace can then
enable casefold support for a mount point using:
$ mount -t tmpfs -o casefold=utf8-12.1.0 fs_name mount_dir/
Userspace must specify which Unicode standard version it is targeting.
The available options depend on what the kernel Unicode subsystem
supports.
And for strict encoding:
$ mount -t tmpfs -o casefold=utf8-12.1.0,strict_encoding fs_name mount_dir/
Strict encoding means that tmpfs will refuse to create file names that
contain invalid UTF-8 sequences. When this option is not enabled, a name
with an invalid sequence is treated as an opaque byte sequence, ignoring
the encoding, and thus cannot be looked up in a case-insensitive way.
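
From a program, the same mounts can be requested through mount(2). The
sketch below assumes the option spellings proposed in this patch
("casefold", "casefold=utf8-<version>", "strict_encoding");
build_casefold_opts() and mount_casefold_tmpfs() are hypothetical helper
names, and the mount call itself needs the usual privileges.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Build the option string. A NULL version asks for plain "casefold",
 * which this series maps to the latest supported UTF-8 version.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int build_casefold_opts(char *buf, size_t len, const char *version,
			bool strict)
{
	const char *suffix = strict ? ",strict_encoding" : "";
	int n;

	assert(buf != NULL && len > 0);

	if (version)
		n = snprintf(buf, len, "casefold=utf8-%s%s", version, suffix);
	else
		n = snprintf(buf, len, "casefold%s", suffix);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Mount a casefold tmpfs on target; returns 0 or a negative errno. */
int mount_casefold_tmpfs(const char *target, const char *version,
			 bool strict)
{
	char opts[64];

	assert(target != NULL);

	if (build_casefold_opts(opts, sizeof(opts), version, strict))
		return -EINVAL;

	if (mount("tmpfs", target, "tmpfs", 0, opts) < 0)
		return -errno;

	return 0;
}
```

For example, mount_casefold_tmpfs("mount_dir", "12.1.0", true) would
request the second command line shown above.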
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
Changes from v2:
- shmem_lookup() now sets d_ops
- reworked shmem_parse_opt_casefold()
- if `mount -o casefold` has no param, load latest UTF-8 version
- using (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir)) when possible
---
mm/shmem.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 138 insertions(+), 4 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 5a77acf6ac6a..6b61fc5dc0b1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -40,6 +40,8 @@
#include <linux/fs_parser.h>
#include <linux/swapfile.h>
#include <linux/iversion.h>
+#include <linux/unicode.h>
+#include <linux/parser.h>
#include "swap.h"
static struct vfsmount *shm_mnt __ro_after_init;
@@ -123,6 +125,8 @@ struct shmem_options {
bool noswap;
unsigned short quota_types;
struct shmem_quota_limits qlimits;
+ struct unicode_map *encoding;
+ bool strict_encoding;
#define SHMEM_SEEN_BLOCKS 1
#define SHMEM_SEEN_INODES 2
#define SHMEM_SEEN_HUGE 4
@@ -3427,6 +3431,10 @@ shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
if (IS_ERR(inode))
return PTR_ERR(inode);
+ if (IS_ENABLED(CONFIG_UNICODE))
+ if (!generic_ci_validate_strict_name(dir, &dentry->d_name))
+ return -EINVAL;
+
error = simple_acl_create(dir, inode);
if (error)
goto out_iput;
@@ -3442,7 +3450,12 @@ shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
dir->i_size += BOGO_DIRENT_SIZE;
inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
inode_inc_iversion(dir);
- d_instantiate(dentry, inode);
+
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+ d_add(dentry, inode);
+ else
+ d_instantiate(dentry, inode);
+
dget(dentry); /* Extra count - pin the dentry in core */
return error;
@@ -3533,7 +3546,10 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir,
inc_nlink(inode);
ihold(inode); /* New dentry reference */
dget(dentry); /* Extra pinning count for the created dentry */
- d_instantiate(dentry, inode);
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+ d_add(dentry, inode);
+ else
+ d_instantiate(dentry, inode);
out:
return ret;
}
@@ -3553,6 +3569,14 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
inode_inc_iversion(dir);
drop_nlink(inode);
dput(dentry); /* Undo the count from "create" - does all the work */
+
+ /*
+ * For now, VFS can't deal with case-insensitive negative dentries, so
+ * we invalidate them
+ */
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+ d_invalidate(dentry);
+
return 0;
}
@@ -3697,7 +3721,10 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
dir->i_size += BOGO_DIRENT_SIZE;
inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
inode_inc_iversion(dir);
- d_instantiate(dentry, inode);
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+ d_add(dentry, inode);
+ else
+ d_instantiate(dentry, inode);
dget(dentry);
return 0;
@@ -4050,6 +4077,9 @@ enum shmem_param {
Opt_usrquota_inode_hardlimit,
Opt_grpquota_block_hardlimit,
Opt_grpquota_inode_hardlimit,
+ Opt_casefold_version,
+ Opt_casefold,
+ Opt_strict_encoding,
};
static const struct constant_table shmem_param_enums_huge[] = {
@@ -4081,9 +4111,62 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
fsparam_string("grpquota_block_hardlimit", Opt_grpquota_block_hardlimit),
fsparam_string("grpquota_inode_hardlimit", Opt_grpquota_inode_hardlimit),
#endif
+ fsparam_string("casefold", Opt_casefold_version),
+ fsparam_flag ("casefold", Opt_casefold),
+ fsparam_flag ("strict_encoding", Opt_strict_encoding),
{}
};
+#if IS_ENABLED(CONFIG_UNICODE)
+static int shmem_parse_opt_casefold(struct fs_context *fc, struct fs_parameter *param,
+ bool latest_version)
+{
+ struct shmem_options *ctx = fc->fs_private;
+ unsigned int maj = 0, min = 0, rev = 0, version = 0;
+ struct unicode_map *encoding;
+ char *version_str = param->string + 5;
+ int ret;
+
+ if (latest_version) {
+ version = UTF8_LATEST;
+ } else {
+ if (strncmp(param->string, "utf8-", 5))
+ return invalfc(fc, "Only UTF-8 encodings are supported "
+ "in the format: utf8-<version number>");
+
+ ret = utf8_parse_version(version_str, &maj, &min, &rev);
+ if (ret)
+ return invalfc(fc, "Invalid UTF-8 version: %s", version_str);
+
+ version = UNICODE_AGE(maj, min, rev);
+ }
+
+ encoding = utf8_load(version);
+
+ if (IS_ERR(encoding)) {
+ if (latest_version)
+ return invalfc(fc, "Failed loading latest UTF-8 version");
+ else
+ return invalfc(fc, "Failed loading UTF-8 version: %s", version_str);
+ }
+
+ if (latest_version)
+ pr_info("tmpfs: Using the latest UTF-8 version available\n");
+ else
+ pr_info("tmpfs: Using encoding provided by mount options: %s\n", param->string);
+
+ ctx->encoding = encoding;
+
+ return 0;
+}
+#else
+static int shmem_parse_opt_casefold(struct fs_context *fc, struct fs_parameter *param,
+ bool latest_version)
+{
+ return invalfc(fc, "tmpfs: No kernel support for casefold filesystems\n");
+}
+#endif
+
static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
{
struct shmem_options *ctx = fc->fs_private;
@@ -4242,6 +4325,13 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
"Group quota inode hardlimit too large.");
ctx->qlimits.grpquota_ihardlimit = size;
break;
+ case Opt_casefold_version:
+ return shmem_parse_opt_casefold(fc, param, false);
+ case Opt_casefold:
+ return shmem_parse_opt_casefold(fc, param, true);
+ case Opt_strict_encoding:
+ ctx->strict_encoding = true;
+ break;
}
return 0;
@@ -4471,6 +4561,11 @@ static void shmem_put_super(struct super_block *sb)
{
struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
+#if IS_ENABLED(CONFIG_UNICODE)
+ if (sb->s_encoding)
+ utf8_unload(sb->s_encoding);
+#endif
+
#ifdef CONFIG_TMPFS_QUOTA
shmem_disable_quotas(sb);
#endif
@@ -4515,6 +4610,16 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
}
sb->s_export_op = &shmem_export_ops;
sb->s_flags |= SB_NOSEC | SB_I_VERSION;
+
+#if IS_ENABLED(CONFIG_UNICODE)
+ if (ctx->encoding) {
+ sb->s_encoding = ctx->encoding;
+ generic_set_sb_d_ops(sb);
+ if (ctx->strict_encoding)
+ sb->s_encoding_flags = SB_ENC_STRICT_MODE_FL;
+ }
+#endif
+
#else
sb->s_flags |= SB_NOUSER;
#endif
@@ -4704,11 +4809,38 @@ static const struct inode_operations shmem_inode_operations = {
#endif
};
+static struct dentry *shmem_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
+{
+ const struct dentry_operations *d_ops = &simple_dentry_operations;
+
+#if IS_ENABLED(CONFIG_UNICODE)
+ if (dentry->d_sb->s_encoding)
+ d_ops = &generic_ci_always_del_dentry_ops;
+#endif
+
+ if (dentry->d_name.len > NAME_MAX)
+ return ERR_PTR(-ENAMETOOLONG);
+
+ if (!dentry->d_sb->s_d_op)
+ d_set_d_op(dentry, d_ops);
+
+ /*
+ * For now, VFS can't deal with case-insensitive negative dentries, so
+ * we prevent them from being created
+ */
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+ return NULL;
+
+ d_add(dentry, NULL);
+
+ return NULL;
+}
+
static const struct inode_operations shmem_dir_inode_operations = {
#ifdef CONFIG_TMPFS
.getattr = shmem_getattr,
.create = shmem_create,
- .lookup = simple_lookup,
+ .lookup = shmem_lookup,
.link = shmem_link,
.unlink = shmem_unlink,
.symlink = shmem_symlink,
@@ -4791,6 +4923,8 @@ int shmem_init_fs_context(struct fs_context *fc)
ctx->uid = current_fsuid();
ctx->gid = current_fsgid();
+ ctx->encoding = NULL;
+
fc->fs_private = ctx;
fc->ops = &shmem_fs_context_ops;
return 0;
--
2.46.0
* [PATCH v3 7/9] tmpfs: Add flag FS_CASEFOLD_FL support for tmpfs dirs
From: André Almeida @ 2024-09-05 19:02 UTC
To: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman
Cc: linux-mm, linux-kernel, linux-fsdevel, kernel-dev,
Daniel Rosenberg, smcv, Christoph Hellwig, Theodore Ts'o,
André Almeida
Enable setting flag FS_CASEFOLD_FL for tmpfs directories when tmpfs is
mounted with casefold support. A special check is needed for this flag,
since it can't be set for non-empty directories.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
Changes from v2:
- Fixed bug when adding a non-casefold flag in a non-empty dir
---
include/linux/shmem_fs.h | 6 ++--
mm/shmem.c | 70 ++++++++++++++++++++++++++++++++++++----
2 files changed, 67 insertions(+), 9 deletions(-)
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 1d06b1e5408a..8367ca2b99d9 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -42,10 +42,10 @@ struct shmem_inode_info {
struct inode vfs_inode;
};
-#define SHMEM_FL_USER_VISIBLE FS_FL_USER_VISIBLE
+#define SHMEM_FL_USER_VISIBLE (FS_FL_USER_VISIBLE | FS_CASEFOLD_FL)
#define SHMEM_FL_USER_MODIFIABLE \
- (FS_IMMUTABLE_FL | FS_APPEND_FL | FS_NODUMP_FL | FS_NOATIME_FL)
-#define SHMEM_FL_INHERITED (FS_NODUMP_FL | FS_NOATIME_FL)
+ (FS_IMMUTABLE_FL | FS_APPEND_FL | FS_NODUMP_FL | FS_NOATIME_FL | FS_CASEFOLD_FL)
+#define SHMEM_FL_INHERITED (FS_NODUMP_FL | FS_NOATIME_FL | FS_CASEFOLD_FL)
struct shmem_quota_limits {
qsize_t usrquota_bhardlimit; /* Default user quota block hard limit */
diff --git a/mm/shmem.c b/mm/shmem.c
index 6b61fc5dc0b1..d38977fb2097 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2613,13 +2613,62 @@ static int shmem_file_open(struct inode *inode, struct file *file)
#ifdef CONFIG_TMPFS_XATTR
static int shmem_initxattrs(struct inode *, const struct xattr *, void *);
+#if IS_ENABLED(CONFIG_UNICODE)
+/*
+ * shmem_inode_casefold_flags - Deal with casefold file attribute flag
+ *
+ * The casefold file attribute needs some special checks. It can only be added to
+ * an empty dir, and can't be removed from a non-empty dir.
+ */
+static int shmem_inode_casefold_flags(struct inode *inode, unsigned int fsflags,
+ struct dentry *dentry, unsigned int *i_flags)
+{
+ unsigned int old = inode->i_flags;
+ struct super_block *sb = inode->i_sb;
+
+ if (fsflags & FS_CASEFOLD_FL) {
+ if (!(old & S_CASEFOLD)) {
+ if (!sb->s_encoding)
+ return -EOPNOTSUPP;
+
+ if (!S_ISDIR(inode->i_mode))
+ return -ENOTDIR;
+
+ if (dentry && !simple_empty(dentry))
+ return -ENOTEMPTY;
+ }
+
+ *i_flags = *i_flags | S_CASEFOLD;
+ } else if (old & S_CASEFOLD) {
+ if (dentry && !simple_empty(dentry))
+ return -ENOTEMPTY;
+ }
+
+ return 0;
+}
+#else
+static int shmem_inode_casefold_flags(struct inode *inode, unsigned int fsflags,
+ struct dentry *dentry, unsigned int *i_flags)
+{
+ if (fsflags & FS_CASEFOLD_FL)
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+#endif
+
/*
* chattr's fsflags are unrelated to extended attributes,
* but tmpfs has chosen to enable them under the same config option.
*/
-static void shmem_set_inode_flags(struct inode *inode, unsigned int fsflags)
+static int shmem_set_inode_flags(struct inode *inode, unsigned int fsflags, struct dentry *dentry)
{
unsigned int i_flags = 0;
+ int ret;
+
+ ret = shmem_inode_casefold_flags(inode, fsflags, dentry, &i_flags);
+ if (ret)
+ return ret;
if (fsflags & FS_NOATIME_FL)
i_flags |= S_NOATIME;
@@ -2630,10 +2679,12 @@ static void shmem_set_inode_flags(struct inode *inode, unsigned int fsflags)
/*
* But FS_NODUMP_FL does not require any action in i_flags.
*/
- inode_set_flags(inode, i_flags, S_NOATIME | S_APPEND | S_IMMUTABLE);
+ inode_set_flags(inode, i_flags, S_NOATIME | S_APPEND | S_IMMUTABLE | S_CASEFOLD);
+
+ return 0;
}
#else
-static void shmem_set_inode_flags(struct inode *inode, unsigned int fsflags)
+static void shmem_set_inode_flags(struct inode *inode, unsigned int fsflags, struct dentry *dentry)
{
}
#define shmem_initxattrs NULL
@@ -2680,7 +2731,7 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
info->fsflags = (dir == NULL) ? 0 :
SHMEM_I(dir)->fsflags & SHMEM_FL_INHERITED;
if (info->fsflags)
- shmem_set_inode_flags(inode, info->fsflags);
+ shmem_set_inode_flags(inode, info->fsflags, NULL);
INIT_LIST_HEAD(&info->shrinklist);
INIT_LIST_HEAD(&info->swaplist);
simple_xattrs_init(&info->xattrs);
@@ -3789,16 +3840,23 @@ static int shmem_fileattr_set(struct mnt_idmap *idmap,
{
struct inode *inode = d_inode(dentry);
struct shmem_inode_info *info = SHMEM_I(inode);
+ int ret, flags;
if (fileattr_has_fsx(fa))
return -EOPNOTSUPP;
if (fa->flags & ~SHMEM_FL_USER_MODIFIABLE)
return -EOPNOTSUPP;
- info->fsflags = (info->fsflags & ~SHMEM_FL_USER_MODIFIABLE) |
+ flags = (info->fsflags & ~SHMEM_FL_USER_MODIFIABLE) |
(fa->flags & SHMEM_FL_USER_MODIFIABLE);
- shmem_set_inode_flags(inode, info->fsflags);
+ ret = shmem_set_inode_flags(inode, flags, dentry);
+
+ if (ret)
+ return ret;
+
+ info->fsflags = flags;
+
inode_set_ctime_current(inode);
inode_inc_iversion(inode);
return 0;
--
2.46.0
* [PATCH v3 8/9] tmpfs: Expose filesystem features via sysfs
2024-09-05 19:02 [PATCH v3 0/9] tmpfs: Add case-insensitive support for tmpfs André Almeida
` (6 preceding siblings ...)
2024-09-05 19:02 ` [PATCH v3 7/9] tmpfs: Add flag FS_CASEFOLD_FL support for tmpfs dirs André Almeida
@ 2024-09-05 19:02 ` André Almeida
2024-09-05 19:02 ` [PATCH v3 9/9] docs: tmpfs: Add casefold options André Almeida
8 siblings, 0 replies; 15+ messages in thread
From: André Almeida @ 2024-09-05 19:02 UTC (permalink / raw)
To: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman
Cc: linux-mm, linux-kernel, linux-fsdevel, kernel-dev,
Daniel Rosenberg, smcv, Christoph Hellwig, Theodore Ts'o,
André Almeida
Expose filesystem features through sysfs, so userspace can query if
tmpfs supports casefold.
This follows the same setup as defined by ext4 and f2fs to expose
casefold support to userspace.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
mm/shmem.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/mm/shmem.c b/mm/shmem.c
index d38977fb2097..5da90bdde4a5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -5424,3 +5424,40 @@ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
return page;
}
EXPORT_SYMBOL_GPL(shmem_read_mapping_page_gfp);
+
+#if defined(CONFIG_SYSFS) && defined(CONFIG_TMPFS)
+#if IS_ENABLED(CONFIG_UNICODE)
+static DEVICE_STRING_ATTR_RO(casefold, 0444, "supported");
+#endif
+
+static struct attribute *tmpfs_attributes[] = {
+#if IS_ENABLED(CONFIG_UNICODE)
+ &dev_attr_casefold.attr.attr,
+#endif
+ NULL
+};
+
+static const struct attribute_group tmpfs_attribute_group = {
+ .attrs = tmpfs_attributes,
+ .name = "features"
+};
+
+static struct kobject *tmpfs_kobj;
+
+static int __init tmpfs_sysfs_init(void)
+{
+ int ret;
+
+ tmpfs_kobj = kobject_create_and_add("tmpfs", fs_kobj);
+ if (!tmpfs_kobj)
+ return -ENOMEM;
+
+ ret = sysfs_create_group(tmpfs_kobj, &tmpfs_attribute_group);
+ if (ret)
+ kobject_put(tmpfs_kobj);
+
+ return ret;
+}
+
+fs_initcall(tmpfs_sysfs_init);
+#endif /* CONFIG_SYSFS && CONFIG_TMPFS */
--
2.46.0
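A userspace consumer could probe the new attribute roughly as follows. This is
a sketch under assumptions: the path is derived from the kobject name
("tmpfs" under fs_kobj), the group name ("features") and the attribute name
("casefold") used in the patch, and the helper name feature_supported() is
invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Assumed location, derived from the names used in tmpfs_sysfs_init(). */
#define TMPFS_CASEFOLD_FEATURE "/sys/fs/tmpfs/features/casefold"

/*
 * Hypothetical helper: return 1 if the first line of the file at `path`
 * is exactly "supported", and 0 otherwise (including when the file
 * cannot be opened, e.g. on a kernel built without CONFIG_UNICODE).
 */
int feature_supported(const char *path)
{
	char buf[32] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
	return strcmp(buf, "supported") == 0;
}
```

A caller would then use feature_supported(TMPFS_CASEFOLD_FEATURE) before
attempting a casefold mount, and fall back to a case-sensitive tmpfs when it
returns 0.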
* [PATCH v3 9/9] docs: tmpfs: Add casefold options
2024-09-05 19:02 [PATCH v3 0/9] tmpfs: Add case-insensitive support for tmpfs André Almeida
` (7 preceding siblings ...)
2024-09-05 19:02 ` [PATCH v3 8/9] tmpfs: Expose filesystem features via sysfs André Almeida
@ 2024-09-05 19:02 ` André Almeida
2024-09-05 19:48 ` Gabriel Krisman Bertazi
8 siblings, 1 reply; 15+ messages in thread
From: André Almeida @ 2024-09-05 19:02 UTC (permalink / raw)
To: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman
Cc: linux-mm, linux-kernel, linux-fsdevel, kernel-dev,
Daniel Rosenberg, smcv, Christoph Hellwig, Theodore Ts'o,
André Almeida
Document mounting options for casefold support in tmpfs.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
Documentation/filesystems/tmpfs.rst | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
index 56a26c843dbe..636afd3eaf48 100644
--- a/Documentation/filesystems/tmpfs.rst
+++ b/Documentation/filesystems/tmpfs.rst
@@ -241,6 +241,27 @@ So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
will give you tmpfs instance on /mytmpfs which can allocate 10GB
RAM/SWAP in 10240 inodes and it is only accessible by root.
+tmpfs has the following mounting options for case-insensitive lookups support:
+
+================= ==============================================================
+casefold Enable casefold support at this mount point using the given
+ argument as the encoding standard. Currently only UTF-8
+ encodings are supported. If no argument is used, it will load
+ the latest UTF-8 encoding available.
+strict_encoding Enable strict encoding at this mount point (disabled by
+ default). In this mode, the filesystem refuses to create file
+ and directory with names containing invalid UTF-8 characters.
+================= ==============================================================
+
+Note that this option doesn't enable casefold by default; one needs to set
+casefold flag per directory, setting the +F attribute in an empty directory. New
+directories within a casefolded one will inherit the flag.
+
+Example::
+
+ $ mount -t tmpfs -o casefold=utf8-12.1.0,strict_enconding fs_name /mytmpfs
+ $ mount -t tmpfs -o casefold fs_name /mytmpfs
+
:Author:
Christoph Rohland <cr@sap.com>, 1.12.01
@@ -250,3 +271,5 @@ RAM/SWAP in 10240 inodes and it is only accessible by root.
KOSAKI Motohiro, 16 Mar 2010
:Updated:
Chris Down, 13 July 2020
+:Updated:
+ André Almeida, 23 Aug 2024
--
2.46.0
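The same options can be passed programmatically through mount(2). The sketch
below is only an illustration of the documented option strings; the helper
name mount_casefold_tmpfs() is made up, and the call needs the usual mount
privileges (CAP_SYS_ADMIN) and an existing target directory.

```c
#include <errno.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: mount a tmpfs instance on `target` with the given
 * option string, for example "casefold" or
 * "casefold=utf8-12.1.0,strict_encoding".
 * Returns 0 on success or a negative errno value on failure.
 */
int mount_casefold_tmpfs(const char *target, const char *opts)
{
	if (mount("tmpfs", target, "tmpfs", 0, opts) < 0)
		return -errno;
	return 0;
}
```

After a successful mount, individual directories still have to be marked with
the +F attribute while they are empty; the mount option alone only makes that
possible.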
* Re: [PATCH v3 9/9] docs: tmpfs: Add casefold options
2024-09-05 19:02 ` [PATCH v3 9/9] docs: tmpfs: Add casefold options André Almeida
@ 2024-09-05 19:48 ` Gabriel Krisman Bertazi
0 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-09-05 19:48 UTC (permalink / raw)
To: André Almeida
Cc: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman, linux-mm, linux-kernel, linux-fsdevel,
kernel-dev, Daniel Rosenberg, smcv, Christoph Hellwig,
Theodore Ts'o
André Almeida <andrealmeid@igalia.com> writes:
> Document mounting options for casefold support in tmpfs.
>
> Signed-off-by: André Almeida <andrealmeid@igalia.com>
> ---
> Documentation/filesystems/tmpfs.rst | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 56a26c843dbe..636afd3eaf48 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -241,6 +241,27 @@ So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
> will give you tmpfs instance on /mytmpfs which can allocate 10GB
> RAM/SWAP in 10240 inodes and it is only accessible by root.
>
> +tmpfs has the following mounting options for case-insensitive lookups support:
lookups->lookup
> +
> +================= ==============================================================
> +casefold Enable casefold support at this mount point using the given
> + argument as the encoding standard. Currently only UTF-8
> + encodings are supported. If no argument is used, it will load
> + the latest UTF-8 encoding available.
> +strict_encoding Enable strict encoding at this mount point (disabled by
> + default). In this mode, the filesystem refuses to create file
> + and directory with names containing invalid UTF-8 characters.
> +================= ==============================================================
> +
> +Note that this option doesn't enable casefold by default;
I think this is fine as is, but if we need a new iteration, could you
perhaps rephrase this to something like:
This option doesn't render the entire filesystem case-insensitive.
One still needs to set the casefold flag per directory, by flipping the +F
attribute in an empty directory. Nevertheless, new directories will
inherit the attribute. The mountpoint itself cannot be made
case-insensitive.
> +
> +Example::
> +
> + $ mount -t tmpfs -o casefold=utf8-12.1.0,strict_enconding fs_name /mytmpfs
strict_encoding
> + $ mount -t tmpfs -o casefold fs_name /mytmpfs
> +
>
> :Author:
> Christoph Rohland <cr@sap.com>, 1.12.01
> @@ -250,3 +271,5 @@ RAM/SWAP in 10240 inodes and it is only accessible by root.
> KOSAKI Motohiro, 16 Mar 2010
> :Updated:
> Chris Down, 13 July 2020
> +:Updated:
> + André Almeida, 23 Aug 2024
--
Gabriel Krisman Bertazi
* Re: [PATCH v3 4/9] unicode: Export latest available UTF-8 version number
2024-09-05 19:02 ` [PATCH v3 4/9] unicode: Export latest available UTF-8 version number André Almeida
@ 2024-09-05 19:57 ` Gabriel Krisman Bertazi
0 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-09-05 19:57 UTC (permalink / raw)
To: André Almeida
Cc: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman, linux-mm, linux-kernel, linux-fsdevel,
kernel-dev, Daniel Rosenberg, smcv, Christoph Hellwig,
Theodore Ts'o
André Almeida <andrealmeid@igalia.com> writes:
> Export latest available UTF-8 version number so filesystems can easily
> load the newest one.
>
> Signed-off-by: André Almeida <andrealmeid@igalia.com>
> ---
>
> If this is the accepted way of doing that, I will also add something to
> checkpatch to warn that modifications at fs/unicode/utf8data.c likely
> need to change this define.
I'd do it by special casing version == 0 or -1 to utf8_load. But the
way you've done is just fine.
Acked-by: Gabriel Krisman Bertazi <krisman@suse.de>
--
Gabriel Krisman Bertazi
* Re: [PATCH v3 6/9] tmpfs: Add casefold lookup support
2024-09-05 19:02 ` [PATCH v3 6/9] tmpfs: Add casefold lookup support André Almeida
@ 2024-09-05 21:28 ` Gabriel Krisman Bertazi
2024-09-06 14:59 ` André Almeida
0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-09-05 21:28 UTC (permalink / raw)
To: André Almeida
Cc: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman, linux-mm, linux-kernel, linux-fsdevel,
kernel-dev, Daniel Rosenberg, smcv, Christoph Hellwig,
Theodore Ts'o
Hi,
André Almeida <andrealmeid@igalia.com> writes:
> @@ -3427,6 +3431,10 @@ shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
> if (IS_ERR(inode))
> return PTR_ERR(inode);
>
> + if (IS_ENABLED(CONFIG_UNICODE))
> + if (!generic_ci_validate_strict_name(dir, &dentry->d_name))
> + return -EINVAL;
> +
if (IS_ENABLED(CONFIG_UNICODE) &&
!generic_ci_validate_strict_name(dir, &dentry->d_name))
> static const struct constant_table shmem_param_enums_huge[] = {
> @@ -4081,9 +4111,62 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
> fsparam_string("grpquota_block_hardlimit", Opt_grpquota_block_hardlimit),
> fsparam_string("grpquota_inode_hardlimit", Opt_grpquota_inode_hardlimit),
> #endif
> + fsparam_string("casefold", Opt_casefold_version),
> + fsparam_flag ("casefold", Opt_casefold),
> + fsparam_flag ("strict_encoding", Opt_strict_encoding),
I don't know if it is possible, but can we do it with a single parameter?
> +static int shmem_parse_opt_casefold(struct fs_context *fc, struct fs_parameter *param,
> + bool latest_version)
Instead of the boolean, can't you check if param->string != NULL? (real
question, I have never used fs_parameter.)
> +{
> + struct shmem_options *ctx = fc->fs_private;
> + unsigned int maj = 0, min = 0, rev = 0, version = 0;
> + struct unicode_map *encoding;
> + char *version_str = param->string + 5;
> + int ret;
unsigned int version = UTF8_LATEST;
and kill the if/else below:
> +
> + if (latest_version) {
> + version = UTF8_LATEST;
> + } else {
> + if (strncmp(param->string, "utf8-", 5))
> + return invalfc(fc, "Only UTF-8 encodings are supported "
> + "in the format: utf8-<version number>");
> +
> + ret = utf8_parse_version(version_str, &maj, &min, &rev);
utf8_parse_version interface could return UNICODE_AGE() already, so we hide the details
from the caller. wdyt?
> + if (ret)
> + return invalfc(fc, "Invalid UTF-8 version: %s", version_str);
> +
> + version = UNICODE_AGE(maj, min, rev);
> + }
> +
> + encoding = utf8_load(version);
> +
> + if (IS_ERR(encoding)) {
> + if (latest_version)
> + return invalfc(fc, "Failed loading latest UTF-8 version");
> + else
> + return invalfc(fc, "Failed loading UTF-8 version: %s", version_str);
The following covers both legs (untested):
if (IS_ERR(encoding))
return invalfc(fc, "Failed loading UTF-8 version: utf8-%u.%u.%u\n",
unicode_maj(version), unicode_min(version), unicode_rev(version));
> + if (latest_version)
> + pr_info("tmpfs: Using the latest UTF-8 version available");
> + else
> + pr_info("tmpfs: Using encoding provided by mount options: %s\n", param->string);
The following covers both legs (untested):
pr_info("tmpfs: Using encoding: utf8-%u.%u.%u\n",
unicode_maj(version), unicode_min(version), unicode_rev(version));
> +
> + ctx->encoding = encoding;
> +
> + return 0;
> +}
> +#else
> +static int shmem_parse_opt_casefold(struct fs_context *fc, struct fs_parameter *param,
> + bool latest_version)
> +{
> + return invalfc(fc, "tmpfs: No kernel support for casefold filesystems\n");
> +}
A message like "Kernel not built with CONFIG_UNICODE" immediately tells
you how to fix it.
> @@ -4515,6 +4610,16 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
> }
> sb->s_export_op = &shmem_export_ops;
> sb->s_flags |= SB_NOSEC | SB_I_VERSION;
> +
> +#if IS_ENABLED(CONFIG_UNICODE)
> + if (ctx->encoding) {
> + sb->s_encoding = ctx->encoding;
> + generic_set_sb_d_ops(sb);
This is the right place for setting d_ops (see the next comment), but you
should be loading generic_ci_always_del_dentry_ops, right?
Also, since generic_ci_always_del_dentry_ops is only used by this one,
can you move it to this file?
> +static struct dentry *shmem_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
> +{
> + const struct dentry_operations *d_ops = &simple_dentry_operations;
> +
> +#if IS_ENABLED(CONFIG_UNICODE)
> + if (dentry->d_sb->s_encoding)
> + d_ops = &generic_ci_always_del_dentry_ops;
> +#endif
This needs to be done at mount time through sb->s_d_op. See
https://lore.kernel.org/all/20240221171412.10710-1-krisman@suse.de/
I suppose we can do it at mount-time for
generic_ci_always_del_dentry_ops and simple_dentry_operations.
> +
> + if (dentry->d_name.len > NAME_MAX)
> + return ERR_PTR(-ENAMETOOLONG);
> +
> + if (!dentry->d_sb->s_d_op)
> + d_set_d_op(dentry, d_ops);
> +
> + /*
> + * For now, VFS can't deal with case-insensitive negative dentries, so
> + * we prevent them from being created
> + */
> + if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
> + return NULL;
Thinking out loud:
I misunderstood always_delete_dentry before. It removes negative
dentries right after the lookup, since ->d_delete is called on dput.
But you still need this check here, IMO, to prevent the negative dentry
from ever being hashed. Otherwise it can be found by a concurrent
lookup. And you cannot drop ->d_delete from the case-insensitive
operations either, because we still want it for !IS_CASEFOLDED(dir).
The window is that, without this code, the negative dentry would
be hashed in d_add() and a concurrent lookup might find it between that
time and the dput(), where it is removed at the end of the concurrent
lookup.
All of this would hopefully go away with negative dentry support for
casefolded directories.
> +
> + d_add(dentry, NULL);
> +
> + return NULL;
> +}
The sole reason you are doing this custom function is to exclude negative
dentries from casefolded directories. I doubt we care about the extra
check being done. Can we just do it in simple_lookup?
> +
> static const struct inode_operations shmem_dir_inode_operations = {
> #ifdef CONFIG_TMPFS
> .getattr = shmem_getattr,
> .create = shmem_create,
> - .lookup = simple_lookup,
> + .lookup = shmem_lookup,
> .link = shmem_link,
> .unlink = shmem_unlink,
> .symlink = shmem_symlink,
> @@ -4791,6 +4923,8 @@ int shmem_init_fs_context(struct fs_context *fc)
> ctx->uid = current_fsuid();
> ctx->gid = current_fsgid();
>
> + ctx->encoding = NULL;
> +
> fc->fs_private = ctx;
> fc->ops = &shmem_fs_context_ops;
> return 0;
--
Gabriel Krisman Bertazi
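To make the version-handling suggestion above concrete, here is a small
userspace sketch of a parser that takes the mount argument and hands back the
packed version directly, so the caller never sees major/minor/revision. It is
not the kernel's utf8_parse_version(); the function name parse_casefold_arg()
is invented, and the AGE() macro is assumed to mirror the bit layout of the
kernel's UNICODE_AGE() (major shifted by 16, minor by 8).

```c
#include <stdio.h>
#include <string.h>

/* Assumed to match UNICODE_AGE(): one byte each for minor and revision. */
#define AGE(maj, min, rev) \
	(((unsigned int)(maj) << 16) | ((unsigned int)(min) << 8) | (unsigned int)(rev))

/*
 * Hypothetical helper: parse "utf8-<maj>.<min>.<rev>" into a packed
 * version number stored in *age.
 * Returns 0 on success, -1 if the string is not in that format.
 */
int parse_casefold_arg(const char *arg, unsigned int *age)
{
	unsigned int maj, min, rev;
	char trailing;

	/* Only the "utf8-" prefix is accepted, as in the patch. */
	if (strncmp(arg, "utf8-", 5) != 0)
		return -1;

	/* The extra %c detects trailing garbage such as "12.1.0x". */
	if (sscanf(arg + 5, "%u.%u.%u%c", &maj, &min, &rev, &trailing) != 3)
		return -1;

	/* Each component has to fit in the byte reserved for it. */
	if (maj > 255 || min > 255 || rev > 255)
		return -1;

	*age = AGE(maj, min, rev);
	return 0;
}
```

With a parser shaped like this, the caller can initialise the version to the
latest one, overwrite it only when an argument is present, and print a single
message built from the packed value in both cases.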
* Re: [PATCH v3 6/9] tmpfs: Add casefold lookup support
2024-09-05 21:28 ` Gabriel Krisman Bertazi
@ 2024-09-06 14:59 ` André Almeida
2024-09-09 14:15 ` Gabriel Krisman Bertazi
0 siblings, 1 reply; 15+ messages in thread
From: André Almeida @ 2024-09-06 14:59 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner,
Jan Kara, krisman, linux-mm, linux-kernel, linux-fsdevel,
kernel-dev, Daniel Rosenberg, smcv, Christoph Hellwig,
Theodore Ts'o
Hey!
On 9/5/24 18:28, Gabriel Krisman Bertazi wrote:
> Hi,
>
> André Almeida <andrealmeid@igalia.com> writes:
>> @@ -3427,6 +3431,10 @@ shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
>> if (IS_ERR(inode))
>> return PTR_ERR(inode);
>>
>> + if (IS_ENABLED(CONFIG_UNICODE))
>> + if (!generic_ci_validate_strict_name(dir, &dentry->d_name))
>> + return -EINVAL;
>> +
> if (IS_ENABLED(CONFIG_UNICODE) &&
> !generic_ci_validate_strict_name(dir, &dentry->d_name))
>
>> static const struct constant_table shmem_param_enums_huge[] = {
>> @@ -4081,9 +4111,62 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
>> fsparam_string("grpquota_block_hardlimit", Opt_grpquota_block_hardlimit),
>> fsparam_string("grpquota_inode_hardlimit", Opt_grpquota_inode_hardlimit),
>> #endif
>> + fsparam_string("casefold", Opt_casefold_version),
>> + fsparam_flag ("casefold", Opt_casefold),
>> + fsparam_flag ("strict_encoding", Opt_strict_encoding),
> I don't know if it is possible, but can we do it with a single parameter?
I tried, but when you use casefold with no args, the code fails
somewhere before that, claiming that there's no arg.
>> +static int shmem_parse_opt_casefold(struct fs_context *fc, struct fs_parameter *param,
>> + bool latest_version)
> Instead of the boolean, can't you check if param->string != NULL? (real
> question, I have never used fs_parameter.)
>
>> +{
>> + struct shmem_options *ctx = fc->fs_private;
>> + unsigned int maj = 0, min = 0, rev = 0, version = 0;
>> + struct unicode_map *encoding;
>> + char *version_str = param->string + 5;
>> + int ret;
> unsigned int version = UTF8_LATEST;
>
> and kill the if/else below:
>> +
>> + if (latest_version) {
>> + version = UTF8_LATEST;
>> + } else {
>> + if (strncmp(param->string, "utf8-", 5))
>> + return invalfc(fc, "Only UTF-8 encodings are supported "
>> + "in the format: utf8-<version number>");
>> +
>> + ret = utf8_parse_version(version_str, &maj, &min, &rev);
> utf8_parse_version interface could return UNICODE_AGE() already, so we hide the details
> from the caller. wdyt?
I like it!
>
>> + if (ret)
>> + return invalfc(fc, "Invalid UTF-8 version: %s", version_str);
>> +
>> + version = UNICODE_AGE(maj, min, rev);
>> + }
>> +
>> + encoding = utf8_load(version);
>> +
>> + if (IS_ERR(encoding)) {
>> + if (latest_version)
>> + return invalfc(fc, "Failed loading latest UTF-8 version");
>> + else
>> + return invalfc(fc, "Failed loading UTF-8 version: %s", version_str);
> The following covers both legs (untested):
>
> if (IS_ERR(encoding))
> return invalfc(fc, "Failed loading UTF-8 version: utf8-%u.%u.%u\n",
> unicode_maj(version), unicode_min(version), unicode_rev(version));
>
>> + if (latest_version)
>> + pr_info("tmpfs: Using the latest UTF-8 version available");
>> + else
>> + pr_info("tmpfs: Using encoding provided by mount options: %s\n", param->string);
> The following covers both legs (untested):
>
> pr_info("tmpfs: Using encoding: utf8-%u.%u.%u\n",
> unicode_maj(version), unicode_min(version), unicode_rev(version));
>
>> +
>> + ctx->encoding = encoding;
>> +
>> + return 0;
>> +}
>> +#else
>> +static int shmem_parse_opt_casefold(struct fs_context *fc, struct fs_parameter *param,
>> + bool latest_version)
>> +{
>> + return invalfc(fc, "tmpfs: No kernel support for casefold filesystems\n");
>> +}
> A message like "Kernel not built with CONFIG_UNICODE" immediately tells
> you how to fix it.
>
>> @@ -4515,6 +4610,16 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
>> }
>> sb->s_export_op = &shmem_export_ops;
>> sb->s_flags |= SB_NOSEC | SB_I_VERSION;
>> +
>> +#if IS_ENABLED(CONFIG_UNICODE)
>> + if (ctx->encoding) {
>> + sb->s_encoding = ctx->encoding;
>> + generic_set_sb_d_ops(sb);
> This is the right place for setting d_ops (see the next comment), but you
> should be loading generic_ci_always_del_dentry_ops, right?
>
> Also, since generic_ci_always_del_dentry_ops is only used by this one,
> can you move it to this file?
>
>> +static struct dentry *shmem_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
>> +{
>> + const struct dentry_operations *d_ops = &simple_dentry_operations;
>> +
>> +#if IS_ENABLED(CONFIG_UNICODE)
>> + if (dentry->d_sb->s_encoding)
>> + d_ops = &generic_ci_always_del_dentry_ops;
>> +#endif
> This needs to be done at mount time through sb->s_d_op. See
>
> https://lore.kernel.org/all/20240221171412.10710-1-krisman@suse.de/
>
> I suppose we can do it at mount-time for
> generic_ci_always_del_dentry_ops and simple_dentry_operations.
>
>> +
>> + if (dentry->d_name.len > NAME_MAX)
>> + return ERR_PTR(-ENAMETOOLONG);
>> +
>> + if (!dentry->d_sb->s_d_op)
>> + d_set_d_op(dentry, d_ops);
>> +
>> + /*
>> + * For now, VFS can't deal with case-insensitive negative dentries, so
>> + * we prevent them from being created
>> + */
>> + if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
>> + return NULL;
> Thinking out loud:
>
> I misunderstood always_delete_dentry before. It removes negative
> dentries right after the lookup, since ->d_delete is called on dput.
>
> But you still need this check here, IMO, to prevent the negative dentry
> from ever being hashed. Otherwise it can be found by a concurrent
> lookup. And you cannot drop ->d_delete from the case-insensitive
> operations either, because we still want it for !IS_CASEFOLDED(dir).
>
> The window is that, without this code, the negative dentry would
> be hashed in d_add() and a concurrent lookup might find it between that
> time and the dput(), where it is removed at the end of the concurrent
> lookup.
>
> All of this would hopefully go away with negative dentry support for
> casefolded directories.
>
>> +
>> + d_add(dentry, NULL);
>> +
>> + return NULL;
>> +}
> The sole reason you are doing this custom function is to exclude negative
> dentries from casefolded directories. I doubt we care about the extra
> check being done. Can we just do it in simple_lookup?
So, in summary:
* set d_ops at mount time to generic_ci_always_del_dentry_ops
* use simple_lookup(), get rid of shmem_lookup()
* inside of simple_lookup(), add (IS_CASEFOLDED(dir)) return NULL
Right?
>> +
>> static const struct inode_operations shmem_dir_inode_operations = {
>> #ifdef CONFIG_TMPFS
>> .getattr = shmem_getattr,
>> .create = shmem_create,
>> - .lookup = simple_lookup,
>> + .lookup = shmem_lookup,
>> .link = shmem_link,
>> .unlink = shmem_unlink,
>> .symlink = shmem_symlink,
>> @@ -4791,6 +4923,8 @@ int shmem_init_fs_context(struct fs_context *fc)
>> ctx->uid = current_fsuid();
>> ctx->gid = current_fsgid();
>>
>> + ctx->encoding = NULL;
>> +
>> fc->fs_private = ctx;
>> fc->ops = &shmem_fs_context_ops;
>> return 0;
* Re: [PATCH v3 6/9] tmpfs: Add casefold lookup support
2024-09-06 14:59 ` André Almeida
@ 2024-09-09 14:15 ` Gabriel Krisman Bertazi
0 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-09-09 14:15 UTC (permalink / raw)
To: André Almeida
Cc: Gabriel Krisman Bertazi, Hugh Dickins, Andrew Morton,
Alexander Viro, Christian Brauner, Jan Kara, krisman, linux-mm,
linux-kernel, linux-fsdevel, kernel-dev, Daniel Rosenberg, smcv,
Christoph Hellwig, Theodore Ts'o
André Almeida <andrealmeid@igalia.com> writes:
>> The sole reason you are doing this custom function is to exclude negative
>> dentries from casefolded directories. I doubt we care about the extra
>> check being done. Can we just do it in simple_lookup?
>
> So, in summary:
>
> * set d_ops at mount time to generic_ci_always_del_dentry_ops
> * use simple_lookup(), get rid of shmem_lookup()
> * inside of simple_lookup(), add (IS_CASEFOLDED(dir)) return NULL
>
> Right?
Yep, that's my suggestion.
--
Gabriel Krisman Bertazi
end of thread, other threads:[~2024-09-09 14:15 UTC | newest]
Thread overview: 15+ messages
2024-09-05 19:02 [PATCH v3 0/9] tmpfs: Add case-insensitive support for tmpfs André Almeida
2024-09-05 19:02 ` [PATCH v3 1/9] libfs: Create the helper function generic_ci_validate_strict_name() André Almeida
2024-09-05 19:02 ` [PATCH v3 2/9] ext4: Use generic_ci_validate_strict_name helper André Almeida
2024-09-05 19:02 ` [PATCH v3 3/9] unicode: Recreate utf8_parse_version() André Almeida
2024-09-05 19:02 ` [PATCH v3 4/9] unicode: Export latest available UTF-8 version number André Almeida
2024-09-05 19:57 ` Gabriel Krisman Bertazi
2024-09-05 19:02 ` [PATCH v3 5/9] libfs: Create the helper struct generic_ci_always_del_dentry_ops André Almeida
2024-09-05 19:02 ` [PATCH v3 6/9] tmpfs: Add casefold lookup support André Almeida
2024-09-05 21:28 ` Gabriel Krisman Bertazi
2024-09-06 14:59 ` André Almeida
2024-09-09 14:15 ` Gabriel Krisman Bertazi
2024-09-05 19:02 ` [PATCH v3 7/9] tmpfs: Add flag FS_CASEFOLD_FL support for tmpfs dirs André Almeida
2024-09-05 19:02 ` [PATCH v3 8/9] tmpfs: Expose filesystem features via sysfs André Almeida
2024-09-05 19:02 ` [PATCH v3 9/9] docs: tmpfs: Add casefold options André Almeida
2024-09-05 19:48 ` Gabriel Krisman Bertazi