* [PATCH RFC] fuse: check if system-wide io_uring is enabled
From: Bernd Schubert @ 2025-10-21 20:31 UTC
To: Miklos Szeredi, Jens Axboe
Cc: linux-fsdevel, io-uring, Pavel Begunkov, Joanne Koong,
Luis Henriques, Bernd Schubert
Add check_system_io_uring() to determine if system-wide io_uring is
available for a FUSE mount. This is useful because FUSE io_uring
can only be enabled if the system allows it. The main issue with
fuse-io-uring is that the mount point hangs until the queues are
initialized. If system-wide io_uring is disabled, the queues cannot
be initialized and the mount hangs until it is forcefully unmounted.
Libfuse avoids this by setting up the ring before replying
to FUSE_INIT, but other implementations also have to be considered,
and this requirement is easily missed during development.
When the mount specifies a user_id and group_id other than 0 (e.g., via
unprivileged fusermount with s-bit), the permission check must use
the daemon's credentials, not the mount task's (root) credentials.
Otherwise io_uring_allowed() incorrectly allows io_uring due to
root's CAP_SYS_ADMIN capability.
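
To illustrate why the credentials matter, here is a minimal userspace
model of the policy, assuming the documented semantics of the
kernel.io_uring_disabled sysctl (0: allowed for everyone, 1: allowed only
with CAP_SYS_ADMIN or membership in kernel.io_uring_group, 2: disabled for
everyone). The names model_cred and model_io_uring_allowed() are
hypothetical and exist only for this sketch; the LSM hook is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the credentials the check runs under. */
struct model_cred {
	bool cap_sys_admin;     /* models capable(CAP_SYS_ADMIN) */
	bool in_io_uring_group; /* models membership in kernel.io_uring_group */
};

/*
 * Simplified model of the kernel.io_uring_disabled policy.
 * Returns 0 if io_uring may be used, -EPERM otherwise.
 * This is an illustration only, not the kernel implementation.
 */
static int model_io_uring_allowed(int disabled, const struct model_cred *cred)
{
	assert(cred != NULL);

	if (disabled == 2)
		return -EPERM;	/* disabled for everyone, including root */
	if (disabled == 0 || cred->cap_sys_admin)
		return 0;	/* unrestricted, or privileged caller */

	/* disabled == 1: only members of the configured group may proceed */
	return cred->in_io_uring_group ? 0 : -EPERM;
}
```

With disabled set to 1, a caller with cap_sys_admin set is allowed while
an unprivileged daemon outside the group gets -EPERM, which is the
mismatch described above when the check runs as the mounting root task.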
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/fuse_i.h | 3 +++
fs/fuse/inode.c | 45 ++++++++++++++++++++++++++++++++++++++-
include/linux/io_uring.h | 1 +
include/linux/io_uring/io_uring.h | 7 ++++++
io_uring/io_uring.c | 4 +++-
5 files changed, 58 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index c2f2a48156d6c52c8db87a5c092f51d1627deae9..d566e6d3fd19c0eb0d2ee384b734f3950e2e105a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -907,6 +907,9 @@ struct fuse_conn {
/* Is synchronous FUSE_INIT allowed? */
unsigned int sync_init:1;
+ /* Is system-wide io_uring allowed for this mount? */
+ unsigned int system_io_uring:1;
+
/* Use io_uring for communication */
unsigned int io_uring;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index d1babf56f25470fcc08fe400467b3450e8b7464a..6dcbaec9b369c689bc423da64b95f16e38ac0311 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -25,6 +25,7 @@
#include <linux/sched.h>
#include <linux/exportfs.h>
#include <linux/posix_acl.h>
+#include <linux/io_uring/io_uring.h>
#include <linux/pid_namespace.h>
#include <uapi/linux/magic.h>
@@ -1519,7 +1520,7 @@ static struct fuse_init_args *fuse_new_init(struct fuse_mount *fm)
* This is just an information flag for fuse server. No need to check
* the reply - server is either sending IORING_OP_URING_CMD or not.
*/
- if (fuse_uring_enabled())
+ if (fm->fc->system_io_uring && fuse_uring_enabled())
flags |= FUSE_OVER_IO_URING;
ia->in.flags = flags;
@@ -1935,6 +1936,46 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
}
EXPORT_SYMBOL_GPL(fuse_fill_super_common);
+/* Check if system wide io-uring is enabled */
+static void check_system_io_uring(struct fuse_conn *fc, struct fuse_fs_context *ctx)
+{
+ struct cred *new_cred = NULL;
+ const struct cred *old_cred = NULL;
+ int allowed;
+
+ /*
+ * The mount might come from an unprivileged user via s-bit
+ * fusermount. In that case the check whether system-wide
+ * io_uring is enabled must run with the daemon's credentials,
+ * i.e. with privileges dropped.
+ */
+ if (ctx->user_id.val != 0 && ctx->group_id.val != 0) {
+ new_cred = prepare_creds();
+ if (!new_cred)
+ return;
+
+ cap_clear(new_cred->cap_effective);
+ cap_clear(new_cred->cap_permitted);
+ cap_clear(new_cred->cap_inheritable);
+
+ if (ctx->user_id_present)
+ new_cred->uid = new_cred->euid = ctx->user_id;
+
+ if (ctx->group_id_present)
+ new_cred->gid = new_cred->egid = new_cred->fsgid = ctx->group_id;
+
+ old_cred = override_creds(new_cred);
+ }
+
+ allowed = io_uring_allowed();
+ fc->system_io_uring = allowed == 0;
+
+ if (old_cred)
+ revert_creds(old_cred);
+ if (new_cred)
+ put_cred(new_cred);
+}
+
static int fuse_fill_super(struct super_block *sb, struct fs_context *fsc)
{
struct fuse_fs_context *ctx = fsc->fs_private;
@@ -1962,6 +2003,8 @@ static int fuse_fill_super(struct super_block *sb, struct fs_context *fsc)
fm = get_fuse_mount_super(sb);
+ check_system_io_uring(fm->fc, ctx);
+
return fuse_send_init(fm);
}
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 85fe4e6b275c7de260ea9a8552b8e1c3e7f7e5ec..eaee221b1ed566fcba5a01885e6a4b9073026f93 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -12,6 +12,7 @@ void __io_uring_free(struct task_struct *tsk);
void io_uring_unreg_ringfd(void);
const char *io_uring_get_opcode(u8 opcode);
bool io_is_uring_fops(struct file *file);
+int io_uring_allowed(void);
static inline void io_uring_files_cancel(void)
{
diff --git a/include/linux/io_uring/io_uring.h b/include/linux/io_uring/io_uring.h
new file mode 100644
index 0000000000000000000000000000000000000000..a28d58ea218ff7cc7518a66bd37ece1eacee30fb
--- /dev/null
+++ b/include/linux/io_uring/io_uring.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_IO_URING_IO_URING_H
+#define _LINUX_IO_URING_IO_URING_H
+
+int io_uring_allowed(void);
+
+#endif
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 820ef05276667e74c259723bf9f3c605cf9d0505..52cb209d4c7499620ae5d8b7ad1362810e84821f 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -76,6 +76,7 @@
#include <trace/events/io_uring.h>
#include <uapi/linux/io_uring.h>
+#include <linux/io_uring/io_uring.h>
#include "io-wq.h"
@@ -3936,7 +3937,7 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
return io_uring_create(entries, &p, params);
}
-static inline int io_uring_allowed(void)
+int io_uring_allowed(void)
{
int disabled = READ_ONCE(sysctl_io_uring_disabled);
kgid_t io_uring_group;
@@ -3957,6 +3958,7 @@ static inline int io_uring_allowed(void)
allowed_lsm:
return security_uring_allowed();
}
+EXPORT_SYMBOL_GPL(io_uring_allowed);
SYSCALL_DEFINE2(io_uring_setup, u32, entries,
struct io_uring_params __user *, params)
---
base-commit: 6548d364a3e850326831799d7e3ea2d7bb97ba08
change-id: 20251021-io-uring-fix-check-systemwide-io-uring-enable-f290e75be229
Best regards,
--
Bernd Schubert <bschubert@ddn.com>
* Re: [PATCH RFC] fuse: check if system-wide io_uring is enabled
From: Jens Axboe @ 2025-10-21 21:56 UTC
To: Bernd Schubert, Miklos Szeredi
Cc: linux-fsdevel, io-uring, Pavel Begunkov, Joanne Koong,
Luis Henriques
On 10/21/25 2:31 PM, Bernd Schubert wrote:
> Add check_system_io_uring() to determine if system-wide io_uring is
> available for a FUSE mount. This is useful because FUSE io_uring
> can only be enabled if the system allows it. Main issue with
> fuse-io-uring is that the mount point hangs until queues are
> initialized. If system wide io-uring is disabled queues cannot
> be initialized and the mount will hang till forcefully umounted.
> Libfuse solves that by setting up the ring before replying
> to FUSE_INIT, but we also have to consider other implementations
> and might get easily missed in development.
>
> When mount specifies user_id and group_id (e.g., via unprivileged
> fusermount with s-bit) not equal 0, the permission check must use
> the daemon's credentials, not the mount task's (root) credentials.
> Otherwise io_uring_allowed() incorrectly allows io_uring due to
> root's CAP_SYS_ADMIN capability.
Rather than need various heuristics, it'd be a lot better if asking for
fuse-io_uring would just not "hang" at mount time and be able to recover
better?
There are also other considerations that may mean that part of init will
fail, doesn't seem like the best idea to me to attempt to catch all of
this rather than just be able to gracefully handle errors at
initialization time.
--
Jens Axboe
* Re: [PATCH RFC] fuse: check if system-wide io_uring is enabled
From: Bernd Schubert @ 2025-10-21 22:08 UTC
To: Jens Axboe, Miklos Szeredi
Cc: linux-fsdevel, io-uring, Pavel Begunkov, Joanne Koong,
Luis Henriques
On 10/21/25 23:56, Jens Axboe wrote:
> On 10/21/25 2:31 PM, Bernd Schubert wrote:
>> Add check_system_io_uring() to determine if system-wide io_uring is
>> available for a FUSE mount. This is useful because FUSE io_uring
>> can only be enabled if the system allows it. Main issue with
>> fuse-io-uring is that the mount point hangs until queues are
>> initialized. If system wide io-uring is disabled queues cannot
>> be initialized and the mount will hang till forcefully umounted.
>> Libfuse solves that by setting up the ring before replying
>> to FUSE_INIT, but we also have to consider other implementations
>> and might get easily missed in development.
>>
>> When mount specifies user_id and group_id (e.g., via unprivileged
>> fusermount with s-bit) not equal 0, the permission check must use
>> the daemon's credentials, not the mount task's (root) credentials.
>> Otherwise io_uring_allowed() incorrectly allows io_uring due to
>> root's CAP_SYS_ADMIN capability.
>
> Rather than need various heuristics, it'd be a lot better if asking for
> fuse-io_uring would just not "hang" at mount time and be able to recover
> better?
We can consider this as well. The issue is that fuse has a limit on
background requests that is protected by a lock, and there is a lock
order to handle. Initially I didn't have this hanging mount, until I handled
this background request limit in fuse-io-uring with the lock order,
i.e. when one switches from /dev/fuse read/write to io-uring, the lock
order changes.
A way to avoid that issue is to split the background request limit equally
between queues, although I wouldn't like to do that before fallback
to other queues is possible, which brings its own discussion points:
https://lore.kernel.org/r/20251003-reduced-nr-ring-queues_3-v2-0-742ff1a8fc58@ddn.com
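
As a rough sketch of what "split equally" could mean (an illustration
only, not code from any posted series; queue_bg_limit() and its
parameters are hypothetical names), each queue could get the integer
share of the global limit, with the remainder handed to the
lowest-numbered queues so that the shares still add up to the global
limit:

```c
#include <assert.h>

/*
 * Hypothetical helper: the share of a global background-request limit
 * given to queue `qid` when the limit is split across `nr_queues`.
 * The remainder of the division goes to the lowest-numbered queues,
 * so the per-queue shares sum to `max_background`.
 */
static unsigned int queue_bg_limit(unsigned int max_background,
				   unsigned int nr_queues, unsigned int qid)
{
	unsigned int base, rem;

	assert(nr_queues > 0 && qid < nr_queues);

	base = max_background / nr_queues;
	rem = max_background % nr_queues;

	return base + (qid < rem ? 1 : 0);
}
```

Note that in this sketch, when the global limit is smaller than the number
of queues, some queues end up with a share of zero.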
>
> There are also other considerations that may mean that part of init will
> fail, doesn't seem like the best idea to me to attempt to catch all of
> this rather than just be able to gracefully handle errors at
> initialization time.
It still doesn't seem right to me that fuse advertises io-uring
in FUSE_INIT to the daemon when system-wide io-uring is disabled.
Thanks,
Bernd
* Re: [PATCH RFC] fuse: check if system-wide io_uring is enabled
From: Jens Axboe @ 2025-10-21 22:13 UTC
To: Bernd Schubert, Miklos Szeredi
Cc: linux-fsdevel, io-uring, Pavel Begunkov, Joanne Koong,
Luis Henriques
> On 10/21/25 23:56, Jens Axboe wrote:
>> On 10/21/25 2:31 PM, Bernd Schubert wrote:
>>> Add check_system_io_uring() to determine if system-wide io_uring is
>>> available for a FUSE mount. This is useful because FUSE io_uring
>>> can only be enabled if the system allows it. Main issue with
>>> fuse-io-uring is that the mount point hangs until queues are
>>> initialized. If system wide io-uring is disabled queues cannot
>>> be initialized and the mount will hang till forcefully umounted.
>>> Libfuse solves that by setting up the ring before replying
>>> to FUSE_INIT, but we also have to consider other implementations
>>> and might get easily missed in development.
>>>
>>> When mount specifies user_id and group_id (e.g., via unprivileged
>>> fusermount with s-bit) not equal 0, the permission check must use
>>> the daemon's credentials, not the mount task's (root) credentials.
>>> Otherwise io_uring_allowed() incorrectly allows io_uring due to
>>> root's CAP_SYS_ADMIN capability.
>>
>> Rather than need various heuristics, it'd be a lot better if asking for
>> fuse-io_uring would just not "hang" at mount time and be able to recover
>> better?
>
> We can consider this as well. Issue is that fuse has a limit on
> background requests that is protected with a lock. And there is lock order
> to handle. Initially I didn't have this hanging mount, until I handled
> this background request limit in fuse-io-uring with the lock order.
> I.e. when one switches from /dev/fuse read/write to io-uring lock order
> changes.
> A way to avoid that issue is to split the background request limit equally
> between queues. Although I wouldn't like to do that before fallback
> to other queues is possible - which brings its own discussion points
>
> https://lore.kernel.org/r/20251003-reduced-nr-ring-queues_3-v2-0-742ff1a8fc58@ddn.com
In any case, I do think it's just wrong to both need to add heuristics,
and then still not be able to catch all of the cases where limitations
prevent you from initializing without hanging. That does seem like the
crux of the issue to me, and this more of a work-around than anything
else.
>> There are also other considerations that may mean that part of init will
>> fail, doesn't seem like the best idea to me to attempt to catch all of
>> this rather than just be able to gracefully handle errors at
>> initialization time.
>
> It still doesn't seem right to me that fuse advertises io-uring
> in FUSE_INIT to the daemon when system-wide io-uring is disabled.
On the surface, I agree. But I don't think you can catch all the cases
anyway, or if you could, it'd be fragile and may change. And then it's
just a bit of false pretense. I'd just view it as a "the kernel supports
the feature", which is true.
--
Jens Axboe