* [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage
@ 2025-01-21 15:36 David Reaver
2025-01-21 15:46 ` [PATCH 1/5] samples/kernfs: Adds boilerplate/README for sample_kernfs David Reaver
` (5 more replies)
0 siblings, 6 replies; 16+ messages in thread
From: David Reaver @ 2025-01-21 15:36 UTC (permalink / raw)
To: Greg Kroah-Hartman, Tejun Heo
Cc: David Reaver, Steven Rostedt, Christian Brauner, Al Viro,
Jonathan Corbet, James Bottomley, Krister Johansen, linux-fsdevel
This patch series creates a toy pseudo-filesystem built on top of kernfs in
samples/kernfs/.
kernfs underpins the sysfs and cgroup filesystems. Many kernel developers have
considered kernfs for other pseudo-filesystems [1][2] and a draft patch was
proposed to investigate moving tracefs to kernfs [3]. One reason kernfs isn't
used more is that it is almost entirely undocumented; I certainly had to read almost
all of the kernfs code to implement this toy filesystem. This sample aims to
improve kernfs documentation by way of an example.
The README.rst file in the first patch describes how sample_kernfs works from a
user's perspective. Summary: the filesystem automatically populates directories
with counter files that increment every time they are read. Users can adjust the
increment via inc files. Counter files can be reset by writing a new value to
them.
Subsequent patches build the rest of the filesystem. The commits are structured
to guide readers in learning kernfs components and adapting them to build their
own filesystems. If reviewers would prefer this all to be in one commit, I'm
happy to do that too. Initially, I included a more complex example where you
could read the sum of all child directory counters in a parent directory, but I
didn't want to complicate the sample too much and distract from kernfs. I’m
happy to remove the inc file if reviewers feel it's unnecessary. It is funny how
even a toy can suffer from feature creep :)
This is my first substantial kernel patch, so I welcome feedback on any trivial
errors. I tested this filesystem with all of the CONFIG_DEBUG_* and similar
options I could find and I ensured none of them report any issues. They were
particularly useful when debugging a deadlock that required replacing
kernfs_remove() with kernfs_remove_self(), and discovering a memory leak fixed
with kernfs_put().
In the future, I hope to contribute further by writing documentation for kernfs
and exploring the possibility of porting debugfs and/or tracefs to kernfs (like
completing the draft in [3]). I'm curious if the reviewers feel any of those
ideas are worth doing right now.
Link: https://lwn.net/Articles/960088/ [1]
Link: https://lwn.net/Articles/981155/ [2]
Link: https://lore.kernel.org/all/20240131-tracefs-kernfs-v1-0-f20e2e9a8d61@kernel.org/ [3]
David Reaver (5):
samples/kernfs: Adds boilerplate/README for sample_kernfs
samples/kernfs: Make filesystem mountable
samples/kernfs: Add counter file to each directory
samples/kernfs: Allow creating and removing directories
samples/kernfs: Add inc file to allow changing counter increment
MAINTAINERS | 1 +
samples/Kconfig | 6 +
samples/Makefile | 1 +
samples/kernfs/Makefile | 3 +
samples/kernfs/README.rst | 55 ++++++
samples/kernfs/sample_kernfs.c | 321 +++++++++++++++++++++++++++++++++
6 files changed, 387 insertions(+)
create mode 100644 samples/kernfs/Makefile
create mode 100644 samples/kernfs/README.rst
create mode 100644 samples/kernfs/sample_kernfs.c
base-commit: fda5e3f284002ea55dac1c98c1498d6dd684046e
* [PATCH 1/5] samples/kernfs: Adds boilerplate/README for sample_kernfs
2025-01-21 15:36 [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage David Reaver
@ 2025-01-21 15:46 ` David Reaver
2025-01-21 15:46 ` [PATCH 2/5] samples/kernfs: Make filesystem mountable David Reaver
` (4 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: David Reaver @ 2025-01-21 15:46 UTC (permalink / raw)
To: Greg Kroah-Hartman, Tejun Heo
Cc: David Reaver, Steven Rostedt, Christian Brauner, Al Viro,
Jonathan Corbet, James Bottomley, Krister Johansen, linux-fsdevel
Adds the necessary Kconfig/Makefile boilerplate to get sample_kernfs
compiled into the kernel. Also adds a README.rst file to describe how the
filesystem works from a user's perspective.
Signed-off-by: David Reaver <me@davidreaver.com>
---
MAINTAINERS | 1 +
samples/Kconfig | 6 ++++
samples/Makefile | 1 +
samples/kernfs/Makefile | 3 ++
samples/kernfs/README.rst | 55 ++++++++++++++++++++++++++++++++++
samples/kernfs/sample_kernfs.c | 20 +++++++++++++
6 files changed, 86 insertions(+)
create mode 100644 samples/kernfs/Makefile
create mode 100644 samples/kernfs/README.rst
create mode 100644 samples/kernfs/sample_kernfs.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 0fa7c5728f1e..5791aced4b93 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12702,6 +12702,7 @@ S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
F: fs/kernfs/
F: include/linux/kernfs.h
+F: samples/kernfs/
KEXEC
M: Eric Biederman <ebiederm@xmission.com>
diff --git a/samples/Kconfig b/samples/Kconfig
index b288d9991d27..968294ffb35d 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -291,6 +291,12 @@ config SAMPLE_CGROUP
help
Build samples that demonstrate the usage of the cgroup API.
+config SAMPLE_KERNFS
+ bool "Build sample_kernfs pseudo-filesystem."
+ help
+ Build a sample pseudo-filesystem that demonstrates the use of the
+ kernfs API. The filesystem name is sample_kernfs.
+
source "samples/rust/Kconfig"
endif # SAMPLES
diff --git a/samples/Makefile b/samples/Makefile
index b85fa64390c5..e024e76e396d 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_SAMPLE_CONNECTOR) += connector/
obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fanotify/
subdir-$(CONFIG_SAMPLE_HIDRAW) += hidraw
obj-$(CONFIG_SAMPLE_HW_BREAKPOINT) += hw_breakpoint/
+obj-$(CONFIG_SAMPLE_KERNFS) += kernfs/
obj-$(CONFIG_SAMPLE_KDB) += kdb/
obj-$(CONFIG_SAMPLE_KFIFO) += kfifo/
obj-$(CONFIG_SAMPLE_KOBJECT) += kobject/
diff --git a/samples/kernfs/Makefile b/samples/kernfs/Makefile
new file mode 100644
index 000000000000..3bd2e4773b91
--- /dev/null
+++ b/samples/kernfs/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_SAMPLE_KERNFS) += sample_kernfs.o
diff --git a/samples/kernfs/README.rst b/samples/kernfs/README.rst
new file mode 100644
index 000000000000..e0e747514df1
--- /dev/null
+++ b/samples/kernfs/README.rst
@@ -0,0 +1,55 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================================
+Sample pseudo-filesystem built on top of ``kernfs``
+===================================================
+
+This directory contains a kernel module that implements a pseudo-filesystem
+built on top of ``kernfs`` and it demonstrates the basics of how to use ``kernfs``.
+
+Usage
+=====
+
+Compile your kernel with ``CONFIG_SAMPLE_KERNFS=y`` and create a
+``sample_kernfs`` mount with::
+
+ # mkdir /sample_kernfs
+ # mount -t sample_kernfs none /sample_kernfs
+
+Filesystem layout
+=================
+
+The filesystem contains a tree of counters. Here is an example, where
+``sample_kernfs`` is mounted at ``/sample_kernfs``::
+
+ /sample_kernfs
+ ├── counter
+ ├── inc
+ ├── sub1/
+ │ ├── counter
+ │ └── inc
+ └── sub2/
+ ├── counter
+ ├── inc
+ ├── sub3/
+ │ ├── counter
+ │ └── inc
+ └── sub4/
+ ├── counter
+ └── inc
+
+When a directory is created, it is automatically populated with two files:
+``counter`` and ``inc``. ``counter`` reports the current count for that node,
+and every time it is read it increments by the value in ``inc``. ``counter`` can
+be reset to a given value by writing that value to the ``counter`` file::
+
+ $ cat counter
+ 1
+ $ cat counter
+ 2
+ $ echo 4 > counter
+ $ cat counter
+ 5
+ $ echo 3 > inc
+ $ cat counter
+ 8
diff --git a/samples/kernfs/sample_kernfs.c b/samples/kernfs/sample_kernfs.c
new file mode 100644
index 000000000000..82d4b73a4534
--- /dev/null
+++ b/samples/kernfs/sample_kernfs.c
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * A sample kernel module showing how to build a pseudo-filesystem on top of
+ * kernfs.
+ */
+
+#define pr_fmt(fmt) "%s: " fmt, __func__
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+static int __init sample_kernfs_init(void)
+{
+ pr_info("Loaded sample_kernfs module.\n");
+ return 0;
+}
+
+module_init(sample_kernfs_init)
+MODULE_DESCRIPTION("Sample kernel module showing how to use kernfs");
+MODULE_LICENSE("GPL");
* [PATCH 2/5] samples/kernfs: Make filesystem mountable
2025-01-21 15:36 [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage David Reaver
2025-01-21 15:46 ` [PATCH 1/5] samples/kernfs: Adds boilerplate/README for sample_kernfs David Reaver
@ 2025-01-21 15:46 ` David Reaver
2025-01-21 15:46 ` [PATCH 3/5] samples/kernfs: Add counter file to each directory David Reaver
` (3 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: David Reaver @ 2025-01-21 15:46 UTC (permalink / raw)
To: Greg Kroah-Hartman, Tejun Heo
Cc: David Reaver, Steven Rostedt, Christian Brauner, Al Viro,
Jonathan Corbet, James Bottomley, Krister Johansen, linux-fsdevel
Implements the bare minimum functionality to safely mount and unmount the
sample_kernfs filesystem.
Signed-off-by: David Reaver <me@davidreaver.com>
---
samples/kernfs/sample_kernfs.c | 69 +++++++++++++++++++++++++++++++++-
1 file changed, 68 insertions(+), 1 deletion(-)
diff --git a/samples/kernfs/sample_kernfs.c b/samples/kernfs/sample_kernfs.c
index 82d4b73a4534..3ea8411a72ae 100644
--- a/samples/kernfs/sample_kernfs.c
+++ b/samples/kernfs/sample_kernfs.c
@@ -6,12 +6,79 @@
#define pr_fmt(fmt) "%s: " fmt, __func__
+#include <linux/fs.h>
+#include <linux/fs_context.h>
+#include <linux/kernfs.h>
#include <linux/kernel.h>
#include <linux/module.h>
+#define SAMPLE_KERNFS_MAGIC 0x8d000ff0
+
+static void sample_kernfs_fs_context_free(struct fs_context *fc)
+{
+ struct kernfs_fs_context *kfc = fc->fs_private;
+
+ kernfs_free_fs_context(fc);
+ kfree(kfc);
+}
+
+static const struct fs_context_operations sample_kernfs_fs_context_ops = {
+ .get_tree = kernfs_get_tree,
+ .free = sample_kernfs_fs_context_free,
+};
+
+static int sample_kernfs_init_fs_context(struct fs_context *fc)
+{
+ struct kernfs_fs_context *kfc;
+ struct kernfs_root *root;
+ int err;
+
+ kfc = kzalloc(sizeof(struct kernfs_fs_context), GFP_KERNEL);
+ if (!kfc)
+ return -ENOMEM;
+
+ root = kernfs_create_root(NULL, 0, NULL);
+ if (IS_ERR(root)) {
+ err = PTR_ERR(root);
+ goto err_free_kfc;
+ }
+
+ kfc->root = root;
+ kfc->magic = SAMPLE_KERNFS_MAGIC;
+ fc->fs_private = kfc;
+ fc->ops = &sample_kernfs_fs_context_ops;
+ fc->global = true;
+
+ return 0;
+
+err_free_kfc:
+ kfree(kfc);
+ return err;
+}
+
+static void sample_kernfs_kill_sb(struct super_block *sb)
+{
+ struct kernfs_root *root = kernfs_root_from_sb(sb);
+
+ kernfs_kill_sb(sb);
+ kernfs_destroy_root(root);
+}
+
+static struct file_system_type sample_kernfs_fs_type = {
+ .name = "sample_kernfs",
+ .init_fs_context = sample_kernfs_init_fs_context,
+ .kill_sb = sample_kernfs_kill_sb,
+ .fs_flags = FS_USERNS_MOUNT,
+};
+
static int __init sample_kernfs_init(void)
{
- pr_info("Loaded sample_kernfs module.\n");
+ int err;
+
+ err = register_filesystem(&sample_kernfs_fs_type);
+ if (err)
+ return err;
+
return 0;
}
* [PATCH 3/5] samples/kernfs: Add counter file to each directory
2025-01-21 15:36 [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage David Reaver
2025-01-21 15:46 ` [PATCH 1/5] samples/kernfs: Adds boilerplate/README for sample_kernfs David Reaver
2025-01-21 15:46 ` [PATCH 2/5] samples/kernfs: Make filesystem mountable David Reaver
@ 2025-01-21 15:46 ` David Reaver
2025-01-21 15:47 ` [PATCH 4/5] samples/kernfs: Allow creating and removing directories David Reaver
` (2 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: David Reaver @ 2025-01-21 15:46 UTC (permalink / raw)
To: Greg Kroah-Hartman, Tejun Heo
Cc: David Reaver, Steven Rostedt, Christian Brauner, Al Viro,
Jonathan Corbet, James Bottomley, Krister Johansen, linux-fsdevel
The counter file is automatically added to all sample_kernfs
directories (including the root directory). This demonstrates how to tie an
internal data structure -- sample_kernfs_directory in this case -- to kernfs
nodes via kernfs_node->priv. Also demonstrates how to read and write simple
integer values to/from kernfs files.
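For readers who want to experiment with the write-side parsing outside the kernel, the following is a minimal userspace sketch of what kstrtou64(strstrip(buf), 10, &val) does in the patch below. The helper name parse_u64 is hypothetical, and strtoull is only an approximation of the kernel helper; edge cases may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of kstrtou64(strstrip(buf), 10, &val):
 * parse a base-10 unsigned integer, tolerating surrounding whitespace
 * such as the trailing newline that echo appends.
 * Returns 0 on success or a negative errno value on failure.
 */
static int parse_u64(const char *buf, uint64_t *out)
{
	char *end;
	unsigned long long val;

	assert(buf != NULL && out != NULL);

	/* Skip leading whitespace, as strstrip() would. */
	while (isspace((unsigned char)*buf))
		buf++;

	/* Allow an optional plus sign, but never a minus sign. */
	if (*buf == '+')
		buf++;
	if (!isdigit((unsigned char)*buf))
		return -EINVAL;

	errno = 0;
	val = strtoull(buf, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;

	/* Only trailing whitespace may follow the digits. */
	while (isspace((unsigned char)*end))
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = val;
	return 0;
}
```

Rejecting trailing garbage matters here: without that check, a write such as "12x" would silently store 12 instead of failing with an error.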
Signed-off-by: David Reaver <me@davidreaver.com>
---
samples/kernfs/sample_kernfs.c | 110 ++++++++++++++++++++++++++++++++-
1 file changed, 108 insertions(+), 2 deletions(-)
diff --git a/samples/kernfs/sample_kernfs.c b/samples/kernfs/sample_kernfs.c
index 3ea8411a72ae..b6d44fc3b935 100644
--- a/samples/kernfs/sample_kernfs.c
+++ b/samples/kernfs/sample_kernfs.c
@@ -14,6 +14,93 @@
#define SAMPLE_KERNFS_MAGIC 0x8d000ff0
+/**
+ * struct sample_kernfs_directory - Represents a directory in the pseudo-filesystem
+ * @count: Holds the current count in the counter file.
+ */
+struct sample_kernfs_directory {
+ atomic64_t count;
+};
+
+static struct sample_kernfs_directory *sample_kernfs_create_dir(void)
+{
+ struct sample_kernfs_directory *dir;
+
+ dir = kzalloc(sizeof(struct sample_kernfs_directory), GFP_KERNEL);
+ if (!dir)
+ return NULL;
+
+ return dir;
+}
+
+static struct sample_kernfs_directory *kernfs_of_to_dir(struct kernfs_open_file *of)
+{
+ struct kernfs_node *dir_kn = kernfs_get_parent(of->kn);
+ struct sample_kernfs_directory *dir = dir_kn->priv;
+
+ /* kernfs_get_parent adds a reference; drop it with kernfs_put */
+ kernfs_put(dir_kn);
+
+ return dir;
+}
+
+static int sample_kernfs_counter_seq_show(struct seq_file *sf, void *v)
+{
+ struct kernfs_open_file *of = sf->private;
+ struct sample_kernfs_directory *counter_dir = kernfs_of_to_dir(of);
+ u64 count = atomic64_inc_return(&counter_dir->count);
+
+ seq_printf(sf, "%llu\n", count);
+
+ return 0;
+}
+
+static ssize_t sample_kernfs_counter_write(struct kernfs_open_file *of, char *buf,
+ size_t nbytes, loff_t off)
+{
+ struct sample_kernfs_directory *counter_dir = kernfs_of_to_dir(of);
+ int ret;
+ u64 new_value;
+
+ ret = kstrtou64(strstrip(buf), 10, &new_value);
+ if (ret)
+ return ret;
+
+ atomic64_set(&counter_dir->count, new_value);
+
+ return nbytes;
+}
+
+static struct kernfs_ops counter_kf_ops = {
+ .seq_show = sample_kernfs_counter_seq_show,
+ .write = sample_kernfs_counter_write,
+};
+
+static int sample_kernfs_add_file(struct kernfs_node *dir_kn, const char *name,
+ struct kernfs_ops *ops)
+{
+ struct kernfs_node *kn;
+
+ kn = __kernfs_create_file(dir_kn, name, 0666, current_fsuid(),
+ current_fsgid(), 0, ops, NULL, NULL, NULL);
+
+ if (IS_ERR(kn))
+ return PTR_ERR(kn);
+
+ return 0;
+}
+
+static int sample_kernfs_populate_dir(struct kernfs_node *dir_kn)
+{
+ int err;
+
+ err = sample_kernfs_add_file(dir_kn, "counter", &counter_kf_ops);
+ if (err)
+ return err;
+
+ return 0;
+}
+
static void sample_kernfs_fs_context_free(struct fs_context *fc)
{
struct kernfs_fs_context *kfc = fc->fs_private;
@@ -30,6 +117,7 @@ static const struct fs_context_operations sample_kernfs_fs_context_ops = {
static int sample_kernfs_init_fs_context(struct fs_context *fc)
{
struct kernfs_fs_context *kfc;
+ struct sample_kernfs_directory *root_dir;
struct kernfs_root *root;
int err;
@@ -37,10 +125,17 @@ static int sample_kernfs_init_fs_context(struct fs_context *fc)
if (!kfc)
return -ENOMEM;
- root = kernfs_create_root(NULL, 0, NULL);
+ root_dir = sample_kernfs_create_dir();
+ if (!root_dir) {
+ err = -ENOMEM;
+ goto err_free_kfc;
+ }
+
+ /* dir gets stored in root->priv so we can access it later. */
+ root = kernfs_create_root(NULL, 0, root_dir);
if (IS_ERR(root)) {
err = PTR_ERR(root);
- goto err_free_kfc;
+ goto err_free_dir;
}
kfc->root = root;
@@ -49,8 +144,16 @@ static int sample_kernfs_init_fs_context(struct fs_context *fc)
fc->ops = &sample_kernfs_fs_context_ops;
fc->global = true;
+ err = sample_kernfs_populate_dir(kernfs_root_to_node(root));
+ if (err)
+ goto err_free_root;
+
return 0;
+err_free_root:
+ kernfs_destroy_root(root);
+err_free_dir:
+ kfree(root_dir);
err_free_kfc:
kfree(kfc);
return err;
@@ -59,9 +162,12 @@ static int sample_kernfs_init_fs_context(struct fs_context *fc)
static void sample_kernfs_kill_sb(struct super_block *sb)
{
struct kernfs_root *root = kernfs_root_from_sb(sb);
+ struct kernfs_node *root_kn = kernfs_root_to_node(root);
+ struct sample_kernfs_directory *root_dir = root_kn->priv;
kernfs_kill_sb(sb);
kernfs_destroy_root(root);
+ kfree(root_dir);
}
static struct file_system_type sample_kernfs_fs_type = {
* [PATCH 4/5] samples/kernfs: Allow creating and removing directories
2025-01-21 15:36 [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage David Reaver
` (2 preceding siblings ...)
2025-01-21 15:46 ` [PATCH 3/5] samples/kernfs: Add counter file to each directory David Reaver
@ 2025-01-21 15:47 ` David Reaver
2025-01-21 15:47 ` [PATCH 5/5] samples/kernfs: Add inc file to allow changing counter increment David Reaver
2025-01-28 6:08 ` [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage Christoph Hellwig
5 siblings, 0 replies; 16+ messages in thread
From: David Reaver @ 2025-01-21 15:47 UTC (permalink / raw)
To: Greg Kroah-Hartman, Tejun Heo
Cc: David Reaver, Steven Rostedt, Christian Brauner, Al Viro,
Jonathan Corbet, James Bottomley, Krister Johansen, linux-fsdevel
Users can mkdir and rmdir sample_kernfs directories, similar to how cgroups
are added and removed in the cgroup pseudo-filesystem. New directories
automatically get a counter file.
kernfs doesn't expose functions to traverse child nodes. We demonstrate how
to keep track of child nodes ourselves in sample_kernfs_directory.
Removing a directory is surprisingly tricky and can deadlock if you use
kernfs_remove() instead of kernfs_remove_self(), so a comment explains the
motivation for using kernfs_remove_self(). I also added a comment
explaining the lack of locking when manipulating the subdirs/children
lists.
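The bookkeeping described above can be tried in isolation with a small userspace sketch. It assumes a hand-rolled copy of the kernel's circular list (list_init, list_add, list_del, container_of) and a hypothetical struct dir in place of sample_kernfs_directory. It does no locking, and it returns a count of freed nodes purely for demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct sample_kernfs_directory. */
struct dir {
	struct list_head subdirs;	/* head of this directory's children */
	struct list_head siblings;	/* link in the parent's subdirs list */
};

static struct dir *dir_create(struct dir *parent)
{
	struct dir *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	list_init(&d->subdirs);
	list_init(&d->siblings);
	if (parent)
		list_add(&d->siblings, &parent->subdirs);
	return d;
}

/* Frees d and everything below it; returns the number of nodes freed. */
static int dir_remove_subtree(struct dir *d)
{
	struct list_head *pos, *next;
	int freed = 1;

	assert(d != NULL);

	/* Save next before recursing, because each child unlinks itself. */
	for (pos = d->subdirs.next; pos != &d->subdirs; pos = next) {
		next = pos->next;
		freed += dir_remove_subtree(container_of(pos, struct dir, siblings));
	}

	/* Harmless for a root, whose siblings entry points at itself. */
	list_del(&d->siblings);
	free(d);
	return freed;
}
```

Saving the next pointer before the recursive call is the same idea as list_for_each_entry_safe() in the patch: the current entry is unlinked and freed during the iteration, so it cannot be used to advance.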
Signed-off-by: David Reaver <me@davidreaver.com>
---
samples/kernfs/sample_kernfs.c | 94 ++++++++++++++++++++++++++++++++--
1 file changed, 91 insertions(+), 3 deletions(-)
diff --git a/samples/kernfs/sample_kernfs.c b/samples/kernfs/sample_kernfs.c
index b6d44fc3b935..e632b5f66924 100644
--- a/samples/kernfs/sample_kernfs.c
+++ b/samples/kernfs/sample_kernfs.c
@@ -17,9 +17,13 @@
/**
* struct sample_kernfs_directory - Represents a directory in the pseudo-filesystem
* @count: Holds the current count in the counter file.
+ * @subdirs: Holds the list of this directory's subdirectories.
+ * @siblings: Used to add this dir to parent's subdirs list.
*/
struct sample_kernfs_directory {
atomic64_t count;
+ struct list_head subdirs;
+ struct list_head siblings;
};
static struct sample_kernfs_directory *sample_kernfs_create_dir(void)
@@ -30,6 +34,9 @@ static struct sample_kernfs_directory *sample_kernfs_create_dir(void)
if (!dir)
return NULL;
+ INIT_LIST_HEAD(&dir->subdirs);
+ INIT_LIST_HEAD(&dir->siblings);
+
return dir;
}
@@ -101,6 +108,87 @@ static int sample_kernfs_populate_dir(struct kernfs_node *dir_kn)
return 0;
}
+static void sample_kernfs_remove_subtree(struct sample_kernfs_directory *dir)
+{
+ struct sample_kernfs_directory *child, *tmp;
+
+ /*
+ * Recursively remove children. This approach is acceptable for this
+ * sample since we expect the tree depth to remain small and manageable.
+ * For real-world filesystems, an iterative approach should be used to
+ * avoid stack overflows.
+ *
+ * Also, we could be more careful with locking our lists, but kernfs
+ * holds a tree-wide lock before calling our rmdir, so we should be
+ * safe.
+ */
+ list_for_each_entry_safe(child, tmp, &dir->subdirs, siblings) {
+ sample_kernfs_remove_subtree(child);
+ }
+
+ /* Remove this directory from its parent's subdirs list */
+ list_del(&dir->siblings);
+
+ kfree(dir);
+}
+
+static int sample_kernfs_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
+{
+ struct kernfs_node *dir_kn;
+ struct sample_kernfs_directory *dir, *parent_dir;
+ int ret;
+
+ dir = sample_kernfs_create_dir();
+ if (!dir)
+ return -ENOMEM;
+
+ /* dir gets stored in dir_kn->priv so we can access it later. */
+ dir_kn = kernfs_create_dir_ns(parent_kn, name, mode, current_fsuid(),
+ current_fsgid(), dir, NULL);
+
+ if (IS_ERR(dir_kn)) {
+ ret = PTR_ERR(dir_kn);
+ goto err_free_dir;
+ }
+
+ ret = sample_kernfs_populate_dir(dir_kn);
+ if (ret)
+ goto err_free_dir_kn;
+
+ /* Add directory to parent->subdirs */
+ parent_dir = parent_kn->priv;
+ list_add(&dir->siblings, &parent_dir->subdirs);
+
+ return 0;
+
+err_free_dir_kn:
+ kernfs_remove(dir_kn);
+err_free_dir:
+ sample_kernfs_remove_subtree(dir);
+ return ret;
+}
+
+static int sample_kernfs_rmdir(struct kernfs_node *kn)
+{
+ struct sample_kernfs_directory *dir = kn->priv;
+
+ /*
+ * kernfs_remove_self avoids a deadlock by breaking active protection;
+ * see kernfs_break_active_protection(). This is required since
+ * kernfs_iop_rmdir() holds an active reference on kn while calling us.
+ */
+ kernfs_remove_self(kn);
+
+ sample_kernfs_remove_subtree(dir);
+
+ return 0;
+}
+
+static struct kernfs_syscall_ops sample_kernfs_kf_syscall_ops = {
+ .mkdir = sample_kernfs_mkdir,
+ .rmdir = sample_kernfs_rmdir,
+};
+
static void sample_kernfs_fs_context_free(struct fs_context *fc)
{
struct kernfs_fs_context *kfc = fc->fs_private;
@@ -132,7 +220,7 @@ static int sample_kernfs_init_fs_context(struct fs_context *fc)
}
/* dir gets stored in root->priv so we can access it later. */
- root = kernfs_create_root(NULL, 0, root_dir);
+ root = kernfs_create_root(&sample_kernfs_kf_syscall_ops, 0, root_dir);
if (IS_ERR(root)) {
err = PTR_ERR(root);
goto err_free_dir;
@@ -153,7 +241,7 @@ static int sample_kernfs_init_fs_context(struct fs_context *fc)
err_free_root:
kernfs_destroy_root(root);
err_free_dir:
- kfree(root_dir);
+ sample_kernfs_remove_subtree(root_dir);
err_free_kfc:
kfree(kfc);
return err;
@@ -167,7 +255,7 @@ static void sample_kernfs_kill_sb(struct super_block *sb)
kernfs_kill_sb(sb);
kernfs_destroy_root(root);
- kfree(root_dir);
+ sample_kernfs_remove_subtree(root_dir);
}
static struct file_system_type sample_kernfs_fs_type = {
* [PATCH 5/5] samples/kernfs: Add inc file to allow changing counter increment
2025-01-21 15:36 [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage David Reaver
` (3 preceding siblings ...)
2025-01-21 15:47 ` [PATCH 4/5] samples/kernfs: Allow creating and removing directories David Reaver
@ 2025-01-21 15:47 ` David Reaver
2025-01-28 6:08 ` [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage Christoph Hellwig
5 siblings, 0 replies; 16+ messages in thread
From: David Reaver @ 2025-01-21 15:47 UTC (permalink / raw)
To: Greg Kroah-Hartman, Tejun Heo
Cc: David Reaver, Steven Rostedt, Christian Brauner, Al Viro,
Jonathan Corbet, James Bottomley, Krister Johansen, linux-fsdevel
A file called inc is automatically added to sample_kernfs directories.
Users can read and write unsigned integers to this file. The value stored
in inc determines how much counter values are incremented every time they
are read.
Signed-off-by: David Reaver <me@davidreaver.com>
---
samples/kernfs/sample_kernfs.c | 42 +++++++++++++++++++++++++++++++++-
1 file changed, 41 insertions(+), 1 deletion(-)
diff --git a/samples/kernfs/sample_kernfs.c b/samples/kernfs/sample_kernfs.c
index e632b5f66924..3d1e7fb4ecc5 100644
--- a/samples/kernfs/sample_kernfs.c
+++ b/samples/kernfs/sample_kernfs.c
@@ -17,11 +17,13 @@
/**
* struct sample_kernfs_directory - Represents a directory in the pseudo-filesystem
* @count: Holds the current count in the counter file.
+ * @inc: Amount to increment count by. Value of inc file.
* @subdirs: Holds the list of this directory's subdirectories.
* @siblings: Used to add this dir to parent's subdirs list.
*/
struct sample_kernfs_directory {
atomic64_t count;
+ atomic64_t inc;
struct list_head subdirs;
struct list_head siblings;
};
@@ -34,6 +36,7 @@ static struct sample_kernfs_directory *sample_kernfs_create_dir(void)
if (!dir)
return NULL;
+ atomic64_set(&dir->inc, 1);
INIT_LIST_HEAD(&dir->subdirs);
INIT_LIST_HEAD(&dir->siblings);
@@ -55,7 +58,8 @@ static int sample_kernfs_counter_seq_show(struct seq_file *sf, void *v)
{
struct kernfs_open_file *of = sf->private;
struct sample_kernfs_directory *counter_dir = kernfs_of_to_dir(of);
- u64 count = atomic64_inc_return(&counter_dir->count);
+ u64 inc = atomic64_read(&counter_dir->inc);
+ u64 count = atomic64_add_return(inc, &counter_dir->count);
seq_printf(sf, "%llu\n", count);
@@ -83,6 +87,38 @@ static struct kernfs_ops counter_kf_ops = {
.write = sample_kernfs_counter_write,
};
+static int sample_kernfs_inc_seq_show(struct seq_file *sf, void *v)
+{
+ struct kernfs_open_file *of = sf->private;
+ struct sample_kernfs_directory *counter_dir = kernfs_of_to_dir(of);
+ u64 inc = atomic64_read(&counter_dir->inc);
+
+ seq_printf(sf, "%llu\n", inc);
+
+ return 0;
+}
+
+static ssize_t sample_kernfs_inc_write(struct kernfs_open_file *of, char *buf,
+ size_t nbytes, loff_t off)
+{
+ struct sample_kernfs_directory *counter_dir = kernfs_of_to_dir(of);
+ int ret;
+ u64 new_value;
+
+ ret = kstrtou64(strstrip(buf), 10, &new_value);
+ if (ret)
+ return ret;
+
+ atomic64_set(&counter_dir->inc, new_value);
+
+ return nbytes;
+}
+
+static struct kernfs_ops inc_kf_ops = {
+ .seq_show = sample_kernfs_inc_seq_show,
+ .write = sample_kernfs_inc_write,
+};
+
static int sample_kernfs_add_file(struct kernfs_node *dir_kn, const char *name,
struct kernfs_ops *ops)
{
@@ -105,6 +141,10 @@ static int sample_kernfs_populate_dir(struct kernfs_node *dir_kn)
if (err)
return err;
+ err = sample_kernfs_add_file(dir_kn, "inc", &inc_kf_ops);
+ if (err)
+ return err;
+
return 0;
}
* Re: [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage
2025-01-21 15:36 [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage David Reaver
` (4 preceding siblings ...)
2025-01-21 15:47 ` [PATCH 5/5] samples/kernfs: Add inc file to allow changing counter increment David Reaver
@ 2025-01-28 6:08 ` Christoph Hellwig
2025-01-28 15:27 ` Steven Rostedt
5 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2025-01-28 6:08 UTC (permalink / raw)
To: David Reaver
Cc: Greg Kroah-Hartman, Tejun Heo, Steven Rostedt, Christian Brauner,
Al Viro, Jonathan Corbet, James Bottomley, Krister Johansen,
linux-fsdevel
On Tue, Jan 21, 2025 at 07:36:34AM -0800, David Reaver wrote:
> This patch series creates a toy pseudo-filesystem built on top of kernfs in
> samples/kernfs/.
Is that a good idea? kernfs and the interactions with the users of it
is a pretty convoluted mess. I'd much prefer people writing their
pseudo file systems to the VFS APIs over spreading kernfs usage further.
* Re: [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage
2025-01-28 6:08 ` [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage Christoph Hellwig
@ 2025-01-28 15:27 ` Steven Rostedt
2025-01-28 22:05 ` Linus Torvalds
0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2025-01-28 15:27 UTC (permalink / raw)
To: Christoph Hellwig
Cc: David Reaver, Greg Kroah-Hartman, Tejun Heo, Christian Brauner,
Al Viro, Jonathan Corbet, James Bottomley, Krister Johansen,
linux-fsdevel, Linus Torvalds
On Mon, 27 Jan 2025 22:08:29 -0800
Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Jan 21, 2025 at 07:36:34AM -0800, David Reaver wrote:
> > This patch series creates a toy pseudo-filesystem built on top of kernfs in
> > samples/kernfs/.
>
> Is that a good idea? kernfs and the interactions with the users of it
> is a pretty convoluted mess. I'd much prefer people writing their
> pseudo file systems to the VFS APIs over spreading kernfs usage further.
I have to disagree with this. As someone that uses a pseudo file system to
interact with my subsystem, I really don't want to have to know the
intrinsics of the virtual file system layer just so I can interact via the
file system. Not knowing how to do that properly was what got me in trouble
with Linus in the first place.
The VFS layer is best for developing file systems that are for storage.
Like XFS, ext4, bcachefs, etc. And yes, if you are developing a new layout
of storage, then you should know the VFS APIs.
But pseudo file systems are a completely different beast. The files are not
for storage, but for control of the kernel. They map to control objects.
For tracefs, there's a "current_tracer". If you write "function" to it, it
starts the function tracer. It has to maintain state, but only for the life
of the boot, and not across boots. All of debugfs is the same way, and
unfortunately, the kernel API for debugfs is wrong. It uses dentries as the
handle to the files, which it should not be doing. dentry is a complex
internal cache element within VFS, and I assumed that because debugfs used
it, it was OK to use it as well, and that's where my arguments with Linus
stemmed from.
For people like myself that only need a way to have a control interface via
the file system, kernfs appears to cover that. Maybe kernfs isn't
implemented the way you like? If that's the case, we should fix that. But
from my point of view, it would be really great if I can create a file
system control interface without having to know anything about how VFS is
implemented.
BTW, I was going to work on converting debugfs over to kernfs if I ever got
the chance (or mentor someone else to do it). Whether it's kernfs or
something else, it would be really great to have a kernel abstraction layer
that creates a pseudo file system without having to create a pseudo file
system. debugfs was that, and became very popular, but it was done incorrectly.
-- Steve
* Re: [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage
2025-01-28 15:27 ` Steven Rostedt
@ 2025-01-28 22:05 ` Linus Torvalds
2025-01-28 22:42 ` Steven Rostedt
0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2025-01-28 22:05 UTC (permalink / raw)
To: Steven Rostedt
Cc: Christoph Hellwig, David Reaver, Greg Kroah-Hartman, Tejun Heo,
Christian Brauner, Al Viro, Jonathan Corbet, James Bottomley,
Krister Johansen, linux-fsdevel
On Tue, 28 Jan 2025 at 07:27, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 27 Jan 2025 22:08:29 -0800
> Christoph Hellwig <hch@infradead.org> wrote:
> >
> > Is that a good idea? kernfs and the interactions with the users of it
> > is a pretty convoluted mess. I'd much prefer people writing their
> > pseudo file systems to the VFS APIs over spreading kernfs usage further.
>
> I have to disagree with this. As someone that uses a pseudo file system to
> interact with my subsystem, I really don't want to have to know the
> intrinsics of the virtual file system layer just so I can interact via the
> file system. Not knowing how to do that properly was what got me in trouble
> with Linus in the first place.
Well, honestly, you were doing some odd things.
For a *simple* filesystem that actually acts as a filesystem, all you
need is in libfs with things like &simple_dir_operations etc.
And we have a *lot* of perfectly regular users of things like that.
Not like the ftrace mess that had very *non*-filesystem semantics with
separate lifetime confusion etc, and that tried to maintain a separate
notion of permissions etc.
To make matters worse, tracefs then had a completely different model
for events, and these interacted oddly in non-filesystem ways.
In other words, all the tracefs problems were self-inflicted, and a
lot of them were because you wanted to go behind the vfs layer's back
because you had millions of nodes but didn't want to have millions of
inodes etc.
That's not normal.
I mean, you can pretty much literally look at ramfs:
fs/ramfs/inode.c
and it is a real example filesystem that does a lot of things, but
almost all of it is just using the direct vfs helpers (simple_lookup /
simple_link/ simple_rmdir etc etc). It plays *zero* games with
dentries.
Or look at fs/pstore.
Or any number of other examples.
And no, nobody should *EVER* look at the horror that is tracefs and eventfs.
Linus
* Re: [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage
2025-01-28 22:05 ` Linus Torvalds
@ 2025-01-28 22:42 ` Steven Rostedt
2025-01-28 22:51 ` Tejun Heo
0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2025-01-28 22:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Christoph Hellwig, David Reaver, Greg Kroah-Hartman, Tejun Heo,
Christian Brauner, Al Viro, Jonathan Corbet, James Bottomley,
Krister Johansen, linux-fsdevel
On Tue, 28 Jan 2025 14:05:05 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Well, honestly, you were doing some odd things.
Some of those odd things were because of the use of the dentry as a handle,
which also required making an inode for every file. When the number of
event files blew up to 10s of thousands, that caused a lot of memory to be
used.
>
> For a *simple* filesystem that actually acts as a filesystem, all you
> need is in libfs with things like &simple_dir_operations etc.
>
> And we have a *lot* of perfectly regular users of things like that.
> Not like the ftrace mess that had very *non*-filesystem semantics with
> separate lifetime confusion etc, and that tried to maintain a separate
> notion of permissions etc.
I would also say that the proc file system is rather messy. But that's very
old and has a long history which probably built up its complexity.
>
> To make matters worse, tracefs then had a completely different model
> for events, and these interacted oddly in non-filesystem ways.
Ideally, I'd rather not have done it that way. To save memory, since every
event in eventfs has the same files, it was better to just make a single
array that represents those files for every event. That saved over 20
megabytes per tracing instance.
>
> In other words, all the tracefs problems were self-inflicted, and a
> lot of them were because you wanted to go behind the vfs layer's back
> because you had millions of nodes but didn't want to have millions of
> inodes etc.
>
> That's not normal.
>
> I mean, you can pretty much literally look at ramfs:
>
> fs/ramfs/inode.c
>
> and it is a real example filesystem that does a lot of things, but
> almost all of it is just using the direct vfs helpers (simple_lookup /
> simple_link/ simple_rmdir etc etc). It plays *zero* games with
> dentries.
It's also a storage file system. It just stores to memory, and it looks
like it simply uses the page cache and never needs to write it back to
disk. It's not a good example for a control interface.
>
> Or look at fs/pstore.
Another storage device.
>
> Or any number of other examples.
>
> And no, nobody should *EVER* look at the horror that is tracefs and eventfs.
I believe kernfs is to cover control interfaces like sysfs and debugfs,
that actually change kernel behavior when their files are written to. It's
also likely why procfs is such a mess because that too is a control
interface.
Yes, eventfs is "special", but tracefs could easily be converted to kernfs.
I believe Christian even wrote a POC that did that.
-- Steve
* Re: [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage
2025-01-28 22:42 ` Steven Rostedt
@ 2025-01-28 22:51 ` Tejun Heo
2025-01-28 23:29 ` Steven Rostedt
0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2025-01-28 22:51 UTC (permalink / raw)
To: Steven Rostedt
Cc: Linus Torvalds, Christoph Hellwig, David Reaver,
Greg Kroah-Hartman, Christian Brauner, Al Viro, Jonathan Corbet,
James Bottomley, Krister Johansen, linux-fsdevel
On Tue, Jan 28, 2025 at 05:42:57PM -0500, Steven Rostedt wrote:
...
> I believe kernfs is to cover control interfaces like sysfs and debugfs,
> that actually change kernel behavior when their files are written to. It's
> also likely why procfs is such a mess because that too is a control
> interface.
Just for context, kernfs is factored out from sysfs. One of the factors
which drove the design was memory overhead. On large systems (IIRC
especially with iSCSI), there can be a huge number of sysfs nodes and
allocating a dentry and inode pair for each file made some machines run out
of memory during boot, so sysfs implemented a memory-backed filesystem store,
which then made its interface to its users depart from the VFS layer.
This requirement holds for cgroup too - there are systems with a *lot* of
cgroups and the associated interface files and we don't want to pin a dentry
and inode for all of them.
Thanks.
--
tejun
* Re: [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage
2025-01-28 22:51 ` Tejun Heo
@ 2025-01-28 23:29 ` Steven Rostedt
2025-01-28 23:38 ` Tejun Heo
0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2025-01-28 23:29 UTC (permalink / raw)
To: Tejun Heo
Cc: Linus Torvalds, Christoph Hellwig, David Reaver,
Greg Kroah-Hartman, Christian Brauner, Al Viro, Jonathan Corbet,
James Bottomley, Krister Johansen, linux-fsdevel
On Tue, 28 Jan 2025 12:51:47 -1000
Tejun Heo <tj@kernel.org> wrote:
> Just for context, kernfs is factored out from sysfs. One of the factors
> which drove the design was memory overhead. On large systems (IIRC
> especially with iSCSI), there can be a huge number of sysfs nodes and
> allocating a dentry and inode pair for each file made some machines run out
> of memory during boot, so sysfs implemented a memory-backed filesystem store,
> which then made its interface to its users depart from the VFS layer.
> This requirement holds for cgroup too - there are systems with a *lot* of
> cgroups and the associated interface files and we don't want to pin a dentry
> and inode for all of them.
>
Right. And going back to ramfs, it too has a dentry and inode for every
file that is created. Thus, if you have a lot of files, you'll have a lot
of memory dedicated to their dentries and inodes that will never be freed.
The ramfs_create() and ramfs_mkdir() both call ramfs_mknod() which does a
d_instantiate() and a dget() on the dentry so they are persistent until
they are deleted or a reboot happens.
What I did for eventfs, and what I believe kernfs does, is to create a
small descriptor to represent the control data and reference them like what
you would have on disk. That is, the control elements (like a trace event
descriptor) are really what is on "disk". When someone does an "ls" to the
pseudo file system, there needs to be a way for the VFS layer to query the
control structures like how a normal file system would query that data
stored on disk, and then let the VFS layer create the dentry and inodes
when referenced, and more importantly, free them when they are no longer
referenced and there's memory pressure.
I believe kernfs does the same thing. And my point is, it would be nice to
have an abstract layer that represents control descriptors that may be
around for the entirety of the boot (like trace events are) without needing
to pin a dentry and inode for each one of these files. Currently, that
abstract layer is kernfs.
-- Steve
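The descriptor-versus-dentry distinction described above can be modeled in
plain userspace C. This is only a sketch under stated assumptions, not kernfs
or eventfs code: `struct desc`, `struct vnode`, `lookup()`, `put()`, and
`shrink()` are hypothetical names. `struct desc` stands in for the small,
always-resident control descriptor, `struct vnode` stands in for a
dentry/inode pair, and the 512-byte payload is an arbitrary size chosen only
to suggest that the VFS-side object is the expensive one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct vnode;

/* Hypothetical control descriptor: small, lives as long as the subsystem.
 * It plays the role of the data a disk filesystem would keep on disk. */
struct desc {
	const char *name;
	struct vnode *cached;	/* VFS-side object, or NULL if not instantiated */
};

/* Hypothetical stand-in for a dentry + inode pair. */
struct vnode {
	struct desc *backing;
	int refcount;
	char payload[512];	/* arbitrary: models the larger memory footprint */
};

static int live_vnodes;	/* how many VFS-side objects currently exist */

/* Resolve a name against the descriptor table and create the VFS-side
 * object only when somebody actually asks for it. */
struct vnode *lookup(struct desc *table, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		struct desc *d = &table[i];

		if (strcmp(d->name, name) != 0)
			continue;
		if (!d->cached) {
			d->cached = calloc(1, sizeof(*d->cached));
			if (!d->cached)
				return NULL;
			d->cached->backing = d;
			live_vnodes++;
		}
		d->cached->refcount++;
		return d->cached;
	}
	return NULL;	/* no such descriptor */
}

/* Drop one reference; the object stays cached until shrink() runs. */
void put(struct vnode *v)
{
	assert(v->refcount > 0);
	v->refcount--;
}

/* Simulated memory pressure: free every unreferenced VFS-side object.
 * The descriptors themselves are never touched. Returns the number freed. */
int shrink(struct desc *table, size_t n)
{
	int freed = 0;

	for (size_t i = 0; i < n; i++) {
		struct vnode *v = table[i].cached;

		if (v && v->refcount == 0) {
			free(v);
			table[i].cached = NULL;
			live_vnodes--;
			freed++;
		}
	}
	return freed;
}
```

A caller would look a name up, use the returned object, put() it, and later
call shrink() to simulate memory pressure; only unreferenced `struct vnode`
objects go away, while the descriptor table stays intact. In terms of this
model, the ramfs behavior described in the message (an extra dget() at
creation) corresponds to never dropping the last reference, so shrink() would
have nothing to free.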
* Re: [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage
2025-01-28 23:29 ` Steven Rostedt
@ 2025-01-28 23:38 ` Tejun Heo
2025-01-29 0:02 ` Steven Rostedt
0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2025-01-28 23:38 UTC (permalink / raw)
To: Steven Rostedt
Cc: Linus Torvalds, Christoph Hellwig, David Reaver,
Greg Kroah-Hartman, Christian Brauner, Al Viro, Jonathan Corbet,
James Bottomley, Krister Johansen, linux-fsdevel
On Tue, Jan 28, 2025 at 06:29:57PM -0500, Steven Rostedt wrote:
> What I did for eventfs, and what I believe kernfs does, is to create a
> small descriptor to represent the control data and reference them like what
> you would have on disk. That is, the control elements (like a trace event
> descriptor) are really what is on "disk". When someone does an "ls" to the
> pseudo file system, there needs to be a way for the VFS layer to query the
> control structures like how a normal file system would query that data
> stored on disk, and then let the VFS layer create the dentry and inodes
> when referenced, and more importantly, free them when they are no longer
> referenced and there's memory pressure.
Yeap, that's exactly what kernfs does.
Thanks.
--
tejun
* Re: [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage
2025-01-28 23:38 ` Tejun Heo
@ 2025-01-29 0:02 ` Steven Rostedt
2025-02-03 15:05 ` Greg Kroah-Hartman
0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2025-01-29 0:02 UTC (permalink / raw)
To: Tejun Heo
Cc: Linus Torvalds, Christoph Hellwig, David Reaver,
Greg Kroah-Hartman, Christian Brauner, Al Viro, Jonathan Corbet,
James Bottomley, Krister Johansen, linux-fsdevel
On Tue, 28 Jan 2025 13:38:42 -1000
Tejun Heo <tj@kernel.org> wrote:
> On Tue, Jan 28, 2025 at 06:29:57PM -0500, Steven Rostedt wrote:
> > What I did for eventfs, and what I believe kernfs does, is to create a
> > small descriptor to represent the control data and reference them like what
> > you would have on disk. That is, the control elements (like a trace event
> > descriptor) are really what is on "disk". When someone does an "ls" to the
> > pseudo file system, there needs to be a way for the VFS layer to query the
> > control structures like how a normal file system would query that data
> > stored on disk, and then let the VFS layer create the dentry and inodes
> > when referenced, and more importantly, free them when they are no longer
> > referenced and there's memory pressure.
>
> Yeap, that's exactly what kernfs does.
And eventfs goes one step further. Because there's a full directory layout
that's identical for every event, it has a single descriptor per directory
and none per file. As there can be over 10 files per directory/event I
didn't want to waste even that memory. This is why I couldn't use kernfs
for eventfs, as I was able to still save a couple of megabytes by not
having the files have any descriptor representing them (besides a single
array for all events).
-- Steve
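The saving described above comes from keeping one shared table of file
entries for all events instead of one descriptor per file. A rough userspace
sketch of the idea follows. It is not eventfs code: `shared_files`,
`struct event_dir`, `event_file_index()`, and the two cost helpers are
hypothetical names, the file names are illustrative, and the descriptor size
is a caller-supplied assumption rather than a measured kernel figure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table shared by every event directory (names are illustrative). */
static const char *const shared_files[] = {
	"enable", "filter", "format", "id", "trigger",
};
#define NR_SHARED_FILES (sizeof(shared_files) / sizeof(shared_files[0]))

/* Hypothetical per-event descriptor: one per directory, none per file. */
struct event_dir {
	const char *name;
};

/* Resolve a file name inside any event directory. The directory itself is
 * not consulted, because every event has the same layout. Returns the index
 * into the shared table, or -1 if the file does not exist. */
int event_file_index(const struct event_dir *dir, const char *file)
{
	(void)dir;
	for (size_t i = 0; i < NR_SHARED_FILES; i++)
		if (strcmp(shared_files[i], file) == 0)
			return (int)i;
	return -1;
}

/* Bytes used when each directory and each file gets its own descriptor. */
size_t cost_per_file_descs(size_t nr_dirs, size_t files_per_dir,
			   size_t desc_bytes)
{
	return nr_dirs * (1 + files_per_dir) * desc_bytes;
}

/* Bytes used with one descriptor per directory plus one shared table. */
size_t cost_shared_table(size_t nr_dirs, size_t files_per_dir,
			 size_t desc_bytes)
{
	return nr_dirs * desc_bytes + files_per_dir * desc_bytes;
}
```

With assumed inputs of 1000 directories, 10 files per directory, and 100 bytes
per descriptor, the first helper yields 1,100,000 bytes and the second 101,000
bytes. The absolute numbers mean nothing for a real kernel; the point is that
the per-file term stops scaling with the number of events.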
* Re: [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage
2025-01-29 0:02 ` Steven Rostedt
@ 2025-02-03 15:05 ` Greg Kroah-Hartman
2025-02-03 16:12 ` David Reaver
0 siblings, 1 reply; 16+ messages in thread
From: Greg Kroah-Hartman @ 2025-02-03 15:05 UTC (permalink / raw)
To: Steven Rostedt
Cc: Tejun Heo, Linus Torvalds, Christoph Hellwig, David Reaver,
Christian Brauner, Al Viro, Jonathan Corbet, James Bottomley,
Krister Johansen, linux-fsdevel
On Tue, Jan 28, 2025 at 07:02:24PM -0500, Steven Rostedt wrote:
> On Tue, 28 Jan 2025 13:38:42 -1000
> Tejun Heo <tj@kernel.org> wrote:
>
> > On Tue, Jan 28, 2025 at 06:29:57PM -0500, Steven Rostedt wrote:
> > > What I did for eventfs, and what I believe kernfs does, is to create a
> > > small descriptor to represent the control data and reference them like what
> > > you would have on disk. That is, the control elements (like a trace event
> > > descriptor) are really what is on "disk". When someone does an "ls" to the
> > > pseudo file system, there needs to be a way for the VFS layer to query the
> > > control structures like how a normal file system would query that data
> > > stored on disk, and then let the VFS layer create the dentry and inodes
> > > when referenced, and more importantly, free them when they are no longer
> > > referenced and there's memory pressure.
> >
> > Yeap, that's exactly what kernfs does.
>
> And eventfs goes one step further. Because there's a full directory layout
> that's identical for every event, it has a single descriptor per directory
> and none per file. As there can be over 10 files per directory/event I
> didn't want to waste even that memory. This is why I couldn't use kernfs
> for eventfs, as I was able to still save a couple of megabytes by not
> having the files have any descriptor representing them (besides a single
> array for all events).
Ok, that's fine, but the original point of "are you sure you want to use
kernfs for anything other than what we have today" remains. It's only a
limited set of use cases that kernfs is good for, libfs is still the
best place to start out for a virtual filesystem. The fact that the
majority of our "fake" filesystems are using libfs and not kernfs is
semi-proof of that?
Or is it proof that kernfs is just so undocumented that no one wants to
move to it? I don't know, but adding samples like this really isn't the
answer to that, the answer would be moving an existing libfs
implementation to use kernfs and then that patch series would be the
example to follow for others.
thanks,
greg k-h
* Re: [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage
2025-02-03 15:05 ` Greg Kroah-Hartman
@ 2025-02-03 16:12 ` David Reaver
0 siblings, 0 replies; 16+ messages in thread
From: David Reaver @ 2025-02-03 16:12 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Steven Rostedt, Tejun Heo, Linus Torvalds, Christoph Hellwig,
Christian Brauner, Al Viro, Jonathan Corbet, James Bottomley,
Krister Johansen, linux-fsdevel
Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
> On Tue, Jan 28, 2025 at 07:02:24PM -0500, Steven Rostedt wrote:
>>
>> And eventfs goes one step further. Because there's a full directory layout
>> that's identical for every event, it has a single descriptor per directory
>> and none per file. As there can be over 10 files per directory/event I
>> didn't want to waste even that memory. This is why I couldn't use kernfs
>> for eventfs, as I was able to still save a couple of megabytes by not
>> having the files have any descriptor representing them (besides a single
>> array for all events).
>
> Ok, that's fine, but the original point of "are you sure you want to use
> kernfs for anything other than what we have today" remains. It's only a
> limited set of use cases that kernfs is good for, libfs is still the
> best place to start out for a virtual filesystem. The fact that the
> majority of our "fake" filesystems are using libfs and not kernfs is
> semi-proof of that?
>
> Or is it proof that kernfs is just so undocumented that no one wants to
> move to it? I don't know, but adding samples like this really isn't the
> answer to that, the answer would be moving an existing libfs
> implementation to use kernfs and then that patch series would be the
> example to follow for others.
>
> thanks,
>
> greg k-h
Thanks for reviewing the patch, Greg!
I put this sample together with the idea that some documentation is
better than none. I researched how kernfs could be useful in tracefs and
debugfs, but I haven't looked deeply into other virtual filesystems, so
I may have overestimated how well kernfs fits other use cases. From this
discussion, I see that a real libfs-to-kernfs port would provide a
better understanding of kernfs' viability elsewhere and also serve as
documentation.
Thanks for the discussion, folks! I learned a lot from this thread.
Thanks,
David Reaver
Thread overview: 16+ messages
2025-01-21 15:36 [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage David Reaver
2025-01-21 15:46 ` [PATCH 1/5] samples/kernfs: Adds boilerplate/README for sample_kernfs David Reaver
2025-01-21 15:46 ` [PATCH 2/5] samples/kernfs: Make filesystem mountable David Reaver
2025-01-21 15:46 ` [PATCH 3/5] samples/kernfs: Add counter file to each directory David Reaver
2025-01-21 15:47 ` [PATCH 4/5] samples/kernfs: Allow creating and removing directories David Reaver
2025-01-21 15:47 ` [PATCH 5/5] samples/kernfs: Add inc file to allow changing counter increment David Reaver
2025-01-28 6:08 ` [PATCH 0/5] samples/kernfs: Add a pseudo-filesystem to demonstrate kernfs usage Christoph Hellwig
2025-01-28 15:27 ` Steven Rostedt
2025-01-28 22:05 ` Linus Torvalds
2025-01-28 22:42 ` Steven Rostedt
2025-01-28 22:51 ` Tejun Heo
2025-01-28 23:29 ` Steven Rostedt
2025-01-28 23:38 ` Tejun Heo
2025-01-29 0:02 ` Steven Rostedt
2025-02-03 15:05 ` Greg Kroah-Hartman
2025-02-03 16:12 ` David Reaver