[PATCH v6 0/2] fuse: add timeout option for requests

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v6 0/2] fuse: add timeout option for requests
@ 2024-08-30 16:26 Joanne Koong
  2024-08-30 16:26 ` [PATCH v6 1/2] fuse: add optional kernel-enforced timeout " Joanne Koong
  2024-08-30 16:26 ` [PATCH v6 2/2] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
  0 siblings, 2 replies; 20+ messages in thread
From: Joanne Koong @ 2024-08-30 16:26 UTC (permalink / raw)
  To: miklos, linux-fsdevel
  Cc: josef, bernd.schubert, jefflexu, laoar.shao, kernel-team

There are situations where fuse servers can become unresponsive or
stuck, for example if the server is in a deadlock. Currently, there's
no good way to detect if a server is stuck and needs to be killed
manually.

This patchset adds a timeout option for requests where if the server does not
reply to the request by the time the timeout elapses, the connection will be
aborted. This patchset also adds two dynamically configurable fuse sysctls
"default_request_timeout" and "max_request_timeout" for controlling/enforcing
timeout behavior system-wide.

Existing systems running fuse servers will not be affected unless they
explicitly opt into the timeout.

v5:
https://lore.kernel.org/linux-fsdevel/20240826203234.4079338-1-joannelkoong@gmail.com/
Changes from v5 -> v6:
- Gate sysctl.o behind CONFIG_SYSCTL in makefile (kernel test robot)
- Reword/clarify last sentence in cover letter (Miklos)

v4:
https://lore.kernel.org/linux-fsdevel/20240813232241.2369855-1-joannelkoong@gmail.com/
Changes from v4 -> v5:
- Change timeout behavior from aborting request to aborting connection
  (Miklos)
- Clarify wording for sysctl documentation (Jingbo)

v3:
https://lore.kernel.org/linux-fsdevel/20240808190110.3188039-1-joannelkoong@gmail.com/
Changes from v3 -> v4:
- Fix wording on some comments to make it more clear
- Use simpler logic for timer (eg remove extra if checks, use mod timer API)
  (Josef)
- Sanity-check should be on FR_FINISHING not FR_FINISHED (Jingbo)
- Fix comment for "processing queue", add req->fpq = NULL safeguard  (Bernd)

v2:
https://lore.kernel.org/linux-fsdevel/20240730002348.3431931-1-joannelkoong@gmail.com/
Changes from v2 -> v3:
- Disarm / rearm timer in dev_do_read to handle race conditions (Bernrd)
- Disarm timer in error handling for fatal interrupt (Yafang)
- Clean up do_fuse_request_end (Jingbo)
- Add timer for notify retrieve requests 
- Fix kernel test robot errors for #define no-op functions

v1:
https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
Changes from v1 -> v2:
- Add timeout for background requests
- Handle resend race condition
- Add sysctls


Joanne Koong (2):
  fuse: add optional kernel-enforced timeout for requests
  fuse: add default_request_timeout and max_request_timeout sysctls

 Documentation/admin-guide/sysctl/fs.rst | 31 ++++++++++++++++++
 fs/fuse/Makefile                        |  1 +
 fs/fuse/dev.c                           | 26 ++++++++++++++-
 fs/fuse/fuse_i.h                        | 24 ++++++++++++++
 fs/fuse/inode.c                         | 24 ++++++++++++++
 fs/fuse/sysctl.c                        | 42 +++++++++++++++++++++++++
 6 files changed, 147 insertions(+), 1 deletion(-)
 create mode 100644 fs/fuse/sysctl.c

-- 
2.43.5


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-08-30 16:26 [PATCH v6 0/2] fuse: add timeout option for requests Joanne Koong
@ 2024-08-30 16:26 ` Joanne Koong
  2024-09-02 10:37   ` Miklos Szeredi
  2024-08-30 16:26 ` [PATCH v6 2/2] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
  1 sibling, 1 reply; 20+ messages in thread
From: Joanne Koong @ 2024-08-30 16:26 UTC (permalink / raw)
  To: miklos, linux-fsdevel
  Cc: josef, bernd.schubert, jefflexu, laoar.shao, kernel-team

There are situations where fuse servers can become unresponsive or
stuck, for example if the server is in a deadlock. Currently, there's
no good way to detect if a server is stuck and needs to be killed
manually.

This commit adds an option for enforcing a timeout (in seconds) on
requests where if the timeout elapses without a reply from the server,
the connection will be automatically aborted.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/dev.c    | 26 +++++++++++++++++++++++++-
 fs/fuse/fuse_i.h |  8 ++++++++
 fs/fuse/inode.c  |  7 +++++++
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9eb191b5c4de..a4ec817074a2 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -40,6 +40,16 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
 	return READ_ONCE(file->private_data);
 }
 
+static void fuse_request_timeout(struct timer_list *timer)
+{
+	struct fuse_req *req = container_of(timer, struct fuse_req, timer);
+	struct fuse_conn *fc = req->fm->fc;
+
+	req->timer.function = NULL;
+
+	fuse_abort_conn(fc);
+}
+
 static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
 {
 	INIT_LIST_HEAD(&req->list);
@@ -48,6 +58,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
 	refcount_set(&req->count, 1);
 	__set_bit(FR_PENDING, &req->flags);
 	req->fm = fm;
+	if (fm->fc->req_timeout)
+		timer_setup(&req->timer, fuse_request_timeout, 0);
 }
 
 static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
@@ -283,6 +295,9 @@ void fuse_request_end(struct fuse_req *req)
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_iqueue *fiq = &fc->iq;
 
+	if (req->timer.function)
+		timer_delete_sync(&req->timer);
+
 	if (test_and_set_bit(FR_FINISHED, &req->flags))
 		goto put_request;
 
@@ -393,6 +408,8 @@ static void request_wait_answer(struct fuse_req *req)
 		if (test_bit(FR_PENDING, &req->flags)) {
 			list_del(&req->list);
 			spin_unlock(&fiq->lock);
+			if (req->timer.function)
+				timer_delete_sync(&req->timer);
 			__fuse_put_request(req);
 			req->out.h.error = -EINTR;
 			return;
@@ -409,7 +426,8 @@ static void request_wait_answer(struct fuse_req *req)
 
 static void __fuse_request_send(struct fuse_req *req)
 {
-	struct fuse_iqueue *fiq = &req->fm->fc->iq;
+	struct fuse_conn *fc = req->fm->fc;
+	struct fuse_iqueue *fiq = &fc->iq;
 
 	BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
 	spin_lock(&fiq->lock);
@@ -421,6 +439,8 @@ static void __fuse_request_send(struct fuse_req *req)
 		/* acquire extra reference, since request is still needed
 		   after fuse_request_end() */
 		__fuse_get_request(req);
+		if (req->timer.function)
+			mod_timer(&req->timer, jiffies + fc->req_timeout);
 		queue_request_and_unlock(fiq, req);
 
 		request_wait_answer(req);
@@ -539,6 +559,8 @@ static bool fuse_request_queue_background(struct fuse_req *req)
 		if (fc->num_background == fc->max_background)
 			fc->blocked = 1;
 		list_add_tail(&req->list, &fc->bg_queue);
+		if (req->timer.function)
+			mod_timer(&req->timer, jiffies + fc->req_timeout);
 		flush_bg_queue(fc);
 		queued = true;
 	}
@@ -594,6 +616,8 @@ static int fuse_simple_notify_reply(struct fuse_mount *fm,
 
 	spin_lock(&fiq->lock);
 	if (fiq->connected) {
+		if (req->timer.function)
+			mod_timer(&req->timer, jiffies + fm->fc->req_timeout);
 		queue_request_and_unlock(fiq, req);
 	} else {
 		err = -ENODEV;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f23919610313..97dacafa4289 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -435,6 +435,9 @@ struct fuse_req {
 
 	/** fuse_mount this request belongs to */
 	struct fuse_mount *fm;
+
+	/** timer for request replies, if timeout option is enabled */
+	struct timer_list timer;
 };
 
 struct fuse_iqueue;
@@ -574,6 +577,8 @@ struct fuse_fs_context {
 	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
+	/*  Request timeout (in seconds). 0 = no timeout (infinite wait) */
+	unsigned int req_timeout;
 	const char *subtype;
 
 	/* DAX device, may be NULL */
@@ -633,6 +638,9 @@ struct fuse_conn {
 	/** Constrain ->max_pages to this value during feature negotiation */
 	unsigned int max_pages_limit;
 
+	/* Request timeout (in jiffies). 0 = no timeout (infinite wait) */
+	unsigned long req_timeout;
+
 	/** Input queue */
 	struct fuse_iqueue iq;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 99e44ea7d875..9e69006fc026 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -733,6 +733,7 @@ enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_REQUEST_TIMEOUT,
 	OPT_ERR
 };
 
@@ -747,6 +748,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
 	fsparam_u32	("max_read",		OPT_MAX_READ),
 	fsparam_u32	("blksize",		OPT_BLKSIZE),
 	fsparam_string	("subtype",		OPT_SUBTYPE),
+	fsparam_u32	("request_timeout",	OPT_REQUEST_TIMEOUT),
 	{}
 };
 
@@ -830,6 +832,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
 		ctx->blksize = result.uint_32;
 		break;
 
+	case OPT_REQUEST_TIMEOUT:
+		ctx->req_timeout = result.uint_32;
+		break;
+
 	default:
 		return -EINVAL;
 	}
@@ -1724,6 +1730,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->group_id = ctx->group_id;
 	fc->legacy_opts_show = ctx->legacy_opts_show;
 	fc->max_read = max_t(unsigned int, 4096, ctx->max_read);
+	fc->req_timeout = ctx->req_timeout * HZ;
 	fc->destroy = ctx->destroy;
 	fc->no_control = ctx->no_control;
 	fc->no_force_umount = ctx->no_force_umount;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v6 2/2] fuse: add default_request_timeout and max_request_timeout sysctls
  2024-08-30 16:26 [PATCH v6 0/2] fuse: add timeout option for requests Joanne Koong
  2024-08-30 16:26 ` [PATCH v6 1/2] fuse: add optional kernel-enforced timeout " Joanne Koong
@ 2024-08-30 16:26 ` Joanne Koong
  1 sibling, 0 replies; 20+ messages in thread
From: Joanne Koong @ 2024-08-30 16:26 UTC (permalink / raw)
  To: miklos, linux-fsdevel
  Cc: josef, bernd.schubert, jefflexu, laoar.shao, kernel-team,
	Bernd Schubert

Introduce two new sysctls, "default_request_timeout" and
"max_request_timeout". These control how long (in seconds) a server can
take to reply to a request. If the server does not reply by the timeout,
then the connection will be aborted.

"default_request_timeout" sets the default timeout if no timeout is
specified by the fuse server on mount. 0 (default) indicates no default
timeout should be enforced. If the server did set a timeout, then
default_request_timeout will be ignored.

"max_request_timeout" sets the upper bound on how long the server may
take to reply to a request. 0 (default) indicates no maximum timeout. If
the max_request_timeout is set and the fuse server attempts to set a
timeout greater than max_request_timeout, the system will default to
max_request_timeout. Similarly, if default_request_timeout is greater
than max_request_timeout, the system will default to
max_request_timeout. If the server does not request a timeout and
default_request_timeout is set to 0 but max_request_timeout is set, then
the timeout will be max_request_timeout.

$ sysctl -a | grep fuse
fs.fuse.default_request_timeout = 0
fs.fuse.max_request_timeout = 0

$ echo 0x100000000 | sudo tee /proc/sys/fs/fuse/default_request_timeout
tee: /proc/sys/fs/fuse/default_request_timeout: Invalid argument

$ echo 0xFFFFFFFF | sudo tee /proc/sys/fs/fuse/default_request_timeout
0xFFFFFFFF

$ sysctl -a | grep fuse
fs.fuse.default_request_timeout = 4294967295
fs.fuse.max_request_timeout = 0

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
 Documentation/admin-guide/sysctl/fs.rst | 31 ++++++++++++++++++
 fs/fuse/Makefile                        |  1 +
 fs/fuse/fuse_i.h                        | 16 ++++++++++
 fs/fuse/inode.c                         | 19 ++++++++++-
 fs/fuse/sysctl.c                        | 42 +++++++++++++++++++++++++
 5 files changed, 108 insertions(+), 1 deletion(-)
 create mode 100644 fs/fuse/sysctl.c

diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 47499a1742bd..3d5a2b5cbba0 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -332,3 +332,34 @@ Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
 on a 64-bit one.
 The current default value for ``max_user_watches`` is 4% of the
 available low memory, divided by the "watch" cost in bytes.
+
+5. /proc/sys/fs/fuse - Configuration options for FUSE filesystems
+=====================================================================
+
+This directory contains the following configuration options for FUSE
+filesystems:
+
+``/proc/sys/fs/fuse/default_request_timeout`` is a read/write file for
+setting/getting the default timeout (in seconds) for a fuse server to
+reply to a kernel-issued request in the event where the server did not
+specify a timeout at mount. If the server set a timeout,
+then default_request_timeout will be ignored.  The default
+"default_request_timeout" is set to 0. 0 indicates a no-op (eg
+requests will not have a default request timeout set if no timeout was
+specified by the server).
+
+``/proc/sys/fs/fuse/max_request_timeout`` is a read/write file for
+setting/getting the maximum timeout (in seconds) for a fuse server to
+reply to a kernel-issued request. A value greater than 0 automatically opts
+the server into a timeout that will be at most "max_request_timeout", even if
+the server did not specify a timeout and default_request_timeout is set to 0.
+If max_request_timeout is greater than 0 and the server set a timeout greater
+than max_request_timeout or default_request_timeout is set to greater than
+max_request_timeout, the system will use max_request_timeout as the timeout.
+0 indicates a no-op (eg requests will not have an upper bound on the timeout
+and if the server did not request a timeout and default_request_timeout was not
+set, there will be no timeout).
+
+Please note that for the timeout options, if the server does not respond to
+the request by the time the timeout elapses, then the connection to the fuse
+server will be aborted.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 6e0228c6d0cb..dabc852a7ff1 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -11,5 +11,6 @@ fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
 fuse-y += iomode.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
 fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
+fuse-$(CONFIG_SYSCTL) += sysctl.o
 
 virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 97dacafa4289..04daf366735d 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -47,6 +47,14 @@
 /** Number of dentries for each connection in the control filesystem */
 #define FUSE_CTL_NUM_DENTRIES 5
 
+/*
+ * Default timeout (in seconds) for the server to reply to a request
+ * if no timeout was specified on mount
+ */
+extern u32 fuse_default_req_timeout;
+/** Max timeout (in seconds) for the server to reply to a request */
+extern u32 fuse_max_req_timeout;
+
 /** List of active connections */
 extern struct list_head fuse_conn_list;
 
@@ -1480,4 +1488,12 @@ ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe,
 				      size_t len, unsigned int flags);
 ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma);
 
+#ifdef CONFIG_SYSCTL
+int fuse_sysctl_register(void);
+void fuse_sysctl_unregister(void);
+#else
+static inline int fuse_sysctl_register(void) { return 0; }
+static inline void fuse_sysctl_unregister(void) { return; }
+#endif /* CONFIG_SYSCTL */
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 9e69006fc026..cf333448f2d3 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -35,6 +35,10 @@ DEFINE_MUTEX(fuse_mutex);
 
 static int set_global_limit(const char *val, const struct kernel_param *kp);
 
+/* default is no timeout */
+u32 fuse_default_req_timeout = 0;
+u32 fuse_max_req_timeout = 0;
+
 unsigned max_user_bgreq;
 module_param_call(max_user_bgreq, set_global_limit, param_get_uint,
 		  &max_user_bgreq, 0644);
@@ -1678,6 +1682,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	struct fuse_conn *fc = fm->fc;
 	struct inode *root;
 	struct dentry *root_dentry;
+	u32 req_timeout;
 	int err;
 
 	err = -EINVAL;
@@ -1730,10 +1735,16 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->group_id = ctx->group_id;
 	fc->legacy_opts_show = ctx->legacy_opts_show;
 	fc->max_read = max_t(unsigned int, 4096, ctx->max_read);
-	fc->req_timeout = ctx->req_timeout * HZ;
 	fc->destroy = ctx->destroy;
 	fc->no_control = ctx->no_control;
 	fc->no_force_umount = ctx->no_force_umount;
+	req_timeout = ctx->req_timeout ?: fuse_default_req_timeout;
+	if (!fuse_max_req_timeout)
+		fc->req_timeout = req_timeout * HZ;
+	else if (!req_timeout)
+		fc->req_timeout = fuse_max_req_timeout * HZ;
+	else
+		fc->req_timeout = min(req_timeout, fuse_max_req_timeout) * HZ;
 
 	err = -ENOMEM;
 	root = fuse_get_root_inode(sb, ctx->rootmode);
@@ -2046,8 +2057,14 @@ static int __init fuse_fs_init(void)
 	if (err)
 		goto out3;
 
+	err = fuse_sysctl_register();
+	if (err)
+		goto out4;
+
 	return 0;
 
+ out4:
+	unregister_filesystem(&fuse_fs_type);
  out3:
 	unregister_fuseblk();
  out2:
diff --git a/fs/fuse/sysctl.c b/fs/fuse/sysctl.c
new file mode 100644
index 000000000000..c87bb0ecbfa9
--- /dev/null
+++ b/fs/fuse/sysctl.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+* linux/fs/fuse/fuse_sysctl.c
+*
+* Sysctl interface to fuse parameters
+*/
+#include <linux/sysctl.h>
+
+#include "fuse_i.h"
+
+static struct ctl_table_header *fuse_table_header;
+
+static struct ctl_table fuse_sysctl_table[] = {
+	{
+		.procname	= "default_request_timeout",
+		.data		= &fuse_default_req_timeout,
+		.maxlen		= sizeof(fuse_default_req_timeout),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec,
+	},
+	{
+		.procname	= "max_request_timeout",
+		.data		= &fuse_max_req_timeout,
+		.maxlen		= sizeof(fuse_max_req_timeout),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec,
+	},
+};
+
+int fuse_sysctl_register(void)
+{
+	fuse_table_header = register_sysctl("fs/fuse", fuse_sysctl_table);
+	if (!fuse_table_header)
+		return -ENOMEM;
+	return 0;
+}
+
+void fuse_sysctl_unregister(void)
+{
+	unregister_sysctl_table(fuse_table_header);
+	fuse_table_header = NULL;
+}
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-08-30 16:26 ` [PATCH v6 1/2] fuse: add optional kernel-enforced timeout " Joanne Koong
@ 2024-09-02 10:37   ` Miklos Szeredi
  2024-09-02 10:50     ` Bernd Schubert
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Miklos Szeredi @ 2024-09-02 10:37 UTC (permalink / raw)
  To: Joanne Koong
  Cc: linux-fsdevel, josef, bernd.schubert, jefflexu, laoar.shao,
	kernel-team

On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is in a deadlock. Currently, there's
> no good way to detect if a server is stuck and needs to be killed
> manually.
>
> This commit adds an option for enforcing a timeout (in seconds) on
> requests where if the timeout elapses without a reply from the server,
> the connection will be automatically aborted.

Okay.

I'm not sure what the overhead (scheduling and memory) of timers, but
starting one for each request seems excessive.

Can we make the timeout per-connection instead of per request?

I.e. When the first request is sent, the timer is started. When a
reply is received but there are still outstanding requests, the timer
is reset.  When the last reply is received, the timer is stopped.

This should handle the frozen server case just as well.  It may not
perfectly handle the case when the server is still alive but for some
reason one or more requests get stuck, while others are still being
processed.   The latter case is unlikely to be an issue in practice,
IMO.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-09-02 10:37   ` Miklos Szeredi
@ 2024-09-02 10:50     ` Bernd Schubert
  2024-09-02 11:11       ` Miklos Szeredi
  2024-09-03 17:25     ` Joanne Koong
  2024-09-27 19:36     ` Joanne Koong
  2 siblings, 1 reply; 20+ messages in thread
From: Bernd Schubert @ 2024-09-02 10:50 UTC (permalink / raw)
  To: Miklos Szeredi, Joanne Koong
  Cc: linux-fsdevel, josef, jefflexu, laoar.shao, kernel-team



On 9/2/24 12:37, Miklos Szeredi wrote:
> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>>
>> There are situations where fuse servers can become unresponsive or
>> stuck, for example if the server is in a deadlock. Currently, there's
>> no good way to detect if a server is stuck and needs to be killed
>> manually.
>>
>> This commit adds an option for enforcing a timeout (in seconds) on
>> requests where if the timeout elapses without a reply from the server,
>> the connection will be automatically aborted.
> 
> Okay.
> 
> I'm not sure what the overhead (scheduling and memory) of timers, but
> starting one for each request seems excessive.
> 
> Can we make the timeout per-connection instead of per request?
> 
> I.e. When the first request is sent, the timer is started. When a
> reply is received but there are still outstanding requests, the timer
> is reset.  When the last reply is received, the timer is stopped.
> 
> This should handle the frozen server case just as well.  It may not
> perfectly handle the case when the server is still alive but for some
> reason one or more requests get stuck, while others are still being
> processed.   The latter case is unlikely to be an issue in practice,
> IMO.

In case of distributed servers, it can easily happen that one server has
an issue, while other servers still process requests. Especially when
these are just requests that read/getattr/etc and do not write, i.e.
accessing the stuck server is not needed by other servers. So in my
opinion not so unlikely. Although for such cases not difficult to
timeout within the fuse server.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-09-02 10:50     ` Bernd Schubert
@ 2024-09-02 11:11       ` Miklos Szeredi
  0 siblings, 0 replies; 20+ messages in thread
From: Miklos Szeredi @ 2024-09-02 11:11 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Joanne Koong, linux-fsdevel, josef, jefflexu, laoar.shao,
	kernel-team

On Mon, 2 Sept 2024 at 12:50, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:

> In case of distributed servers, it can easily happen that one server has
> an issue, while other servers still process requests. Especially when
> these are just requests that read/getattr/etc and do not write, i.e.
> accessing the stuck server is not needed by other servers. So in my
> opinion not so unlikely. Although for such cases not difficult to
> timeout within the fuse server.

Exactly.  Normally the kernel should not need to time out fuse
requests, and it might be actively detrimental to do so.

The main case this wants to solve is a deadlocked server due to
programming error, AFAICS.   And this would only work in environments
where requests are  guaranteed to complete within some time period.

So if the server needs to handle request timeouts, then it should
*not* rely on the kernel timeout.   The kernel timeout should be a
safeguard against broken or malicious servers, not an aid to implement
request timeouts.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-09-02 10:37   ` Miklos Szeredi
  2024-09-02 10:50     ` Bernd Schubert
@ 2024-09-03 17:25     ` Joanne Koong
  2024-09-03 22:37       ` Bernd Schubert
  2024-09-27 19:36     ` Joanne Koong
  2 siblings, 1 reply; 20+ messages in thread
From: Joanne Koong @ 2024-09-03 17:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, josef, bernd.schubert, jefflexu, laoar.shao,
	kernel-team

On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is in a deadlock. Currently, there's
> > no good way to detect if a server is stuck and needs to be killed
> > manually.
> >
> > This commit adds an option for enforcing a timeout (in seconds) on
> > requests where if the timeout elapses without a reply from the server,
> > the connection will be automatically aborted.
>
> Okay.
>
> I'm not sure what the overhead (scheduling and memory) of timers, but
> starting one for each request seems excessive.
>
> Can we make the timeout per-connection instead of per request?
>
> I.e. When the first request is sent, the timer is started. When a
> reply is received but there are still outstanding requests, the timer
> is reset.  When the last reply is received, the timer is stopped.
>
> This should handle the frozen server case just as well.  It may not
> perfectly handle the case when the server is still alive but for some
> reason one or more requests get stuck, while others are still being
> processed.   The latter case is unlikely to be an issue in practice,
> IMO.

In that case, if the timeout is per-connection instead of per-request
and we're not stringent about some requests getting stuck, maybe it
makes more sense to just do this in userspace (libfuse) then? That
seems pretty simple with having a watchdog thread that periodically
(according to whatever specified timeout) checks if the number of
requests serviced is increasing when
/sys/fs/fuse/connections/*/waiting is non-zero.

If there are multiple server threads (eg libfuse's fuse_loop_mt
interface) and say, all of them are deadlocked except for 1 that is
actively servicing requests, then this wouldn't catch that case, but
even if this per-connection timeout was enforced in the kernel
instead, it wouldn't catch that case either.

So maybe this logic should just be moved to libfuse then? For this
we'd need to pass the connection's device id (fc->dev) to userspace
which i don't think we currently do, but that seems pretty simple. The
one downside I see is that this doesn't let sysadmins enforce an
automatic system-wide "max timeout" against malicious fuse servers but
if we are having the timeout be per-connection instead of per-request,
then a malicious server could still be malicious anyways (eg
deadlocking all threads except for 1).

Curious to hear what your and Bernrd's thoughts on this are.

Thanks,
Joanne
>
> Thanks,
> Miklos

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-09-03 17:25     ` Joanne Koong
@ 2024-09-03 22:37       ` Bernd Schubert
  2024-09-04 17:23         ` Joanne Koong
  0 siblings, 1 reply; 20+ messages in thread
From: Bernd Schubert @ 2024-09-03 22:37 UTC (permalink / raw)
  To: Joanne Koong, Miklos Szeredi
  Cc: linux-fsdevel, josef, jefflexu, laoar.shao, kernel-team



On 9/3/24 19:25, Joanne Koong wrote:
> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>>>
>>> There are situations where fuse servers can become unresponsive or
>>> stuck, for example if the server is in a deadlock. Currently, there's
>>> no good way to detect if a server is stuck and needs to be killed
>>> manually.
>>>
>>> This commit adds an option for enforcing a timeout (in seconds) on
>>> requests where if the timeout elapses without a reply from the server,
>>> the connection will be automatically aborted.
>>
>> Okay.
>>
>> I'm not sure what the overhead (scheduling and memory) of timers, but
>> starting one for each request seems excessive.
>>
>> Can we make the timeout per-connection instead of per request?
>>
>> I.e. When the first request is sent, the timer is started. When a
>> reply is received but there are still outstanding requests, the timer
>> is reset.  When the last reply is received, the timer is stopped.
>>
>> This should handle the frozen server case just as well.  It may not
>> perfectly handle the case when the server is still alive but for some
>> reason one or more requests get stuck, while others are still being
>> processed.   The latter case is unlikely to be an issue in practice,
>> IMO.
> 
> In that case, if the timeout is per-connection instead of per-request
> and we're not stringent about some requests getting stuck, maybe it
> makes more sense to just do this in userspace (libfuse) then? That
> seems pretty simple with having a watchdog thread that periodically
> (according to whatever specified timeout) checks if the number of
> requests serviced is increasing when
> /sys/fs/fuse/connections/*/waiting is non-zero.
> 
> If there are multiple server threads (eg libfuse's fuse_loop_mt
> interface) and say, all of them are deadlocked except for 1 that is
> actively servicing requests, then this wouldn't catch that case, but
> even if this per-connection timeout was enforced in the kernel
> instead, it wouldn't catch that case either.
> 
> So maybe this logic should just be moved to libfuse then? For this
> we'd need to pass the connection's device id (fc->dev) to userspace
> which i don't think we currently do, but that seems pretty simple. The
> one downside I see is that this doesn't let sysadmins enforce an
> automatic system-wide "max timeout" against malicious fuse servers but
> if we are having the timeout be per-connection instead of per-request,
> then a malicious server could still be malicious anyways (eg
> deadlocking all threads except for 1).
> 
> Curious to hear what your and Bernrd's thoughts on this are.


I have question here, does it need to be an exact timeout or could it be
an interval/epoch? Let's say you timeout based on epoch lists? Example

1) epoch-a starts, requests are added to epoch-a list.
2) epoch-b starts, epoch-a list should get empty
3) epoch-c starts, epoch-b list should get empty, kill the connection if
epoch-a list is not empty (epoch-c list should not be needed, as epoch-a
list can be used, once confirmed it is empty.)


Here timeout would be epoch-a + epoch-b, i.e.
max-timeout <= 2 * epoch-time.
We could have more epochs/list-heads to make it more fine grained.


From my point of view that should be a rather cheap, as it just
adding/removing requests from list and checking for timeout if a list is
empty. With the caveat that it is not precise anymore.

That could be implemented in kernel and/or libfuse?


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-09-03 22:37       ` Bernd Schubert
@ 2024-09-04 17:23         ` Joanne Koong
  2024-09-17 22:00           ` Bernd Schubert
  0 siblings, 1 reply; 20+ messages in thread
From: Joanne Koong @ 2024-09-04 17:23 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, linux-fsdevel, josef, jefflexu, laoar.shao,
	kernel-team

On Tue, Sep 3, 2024 at 3:38 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> On 9/3/24 19:25, Joanne Koong wrote:
> > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>
> >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> >>>
> >>> There are situations where fuse servers can become unresponsive or
> >>> stuck, for example if the server is in a deadlock. Currently, there's
> >>> no good way to detect if a server is stuck and needs to be killed
> >>> manually.
> >>>
> >>> This commit adds an option for enforcing a timeout (in seconds) on
> >>> requests where if the timeout elapses without a reply from the server,
> >>> the connection will be automatically aborted.
> >>
> >> Okay.
> >>
> >> I'm not sure what the overhead (scheduling and memory) of timers, but
> >> starting one for each request seems excessive.
> >>
> >> Can we make the timeout per-connection instead of per request?
> >>
> >> I.e. When the first request is sent, the timer is started. When a
> >> reply is received but there are still outstanding requests, the timer
> >> is reset.  When the last reply is received, the timer is stopped.
> >>
> >> This should handle the frozen server case just as well.  It may not
> >> perfectly handle the case when the server is still alive but for some
> >> reason one or more requests get stuck, while others are still being
> >> processed.   The latter case is unlikely to be an issue in practice,
> >> IMO.
> >
> > In that case, if the timeout is per-connection instead of per-request
> > and we're not stringent about some requests getting stuck, maybe it
> > makes more sense to just do this in userspace (libfuse) then? That
> > seems pretty simple with having a watchdog thread that periodically
> > (according to whatever specified timeout) checks if the number of
> > requests serviced is increasing when
> > /sys/fs/fuse/connections/*/waiting is non-zero.
> >
> > If there are multiple server threads (eg libfuse's fuse_loop_mt
> > interface) and say, all of them are deadlocked except for 1 that is
> > actively servicing requests, then this wouldn't catch that case, but
> > even if this per-connection timeout was enforced in the kernel
> > instead, it wouldn't catch that case either.
> >
> > So maybe this logic should just be moved to libfuse then? For this
> > we'd need to pass the connection's device id (fc->dev) to userspace
> > which i don't think we currently do, but that seems pretty simple. The
> > one downside I see is that this doesn't let sysadmins enforce an
> > automatic system-wide "max timeout" against malicious fuse servers but
> > if we are having the timeout be per-connection instead of per-request,
> > then a malicious server could still be malicious anyways (eg
> > deadlocking all threads except for 1).
> >
> > Curious to hear what your and Bernrd's thoughts on this are.
>
>
> I have question here, does it need to be an exact timeout or could it be
> an interval/epoch? Let's say you timeout based on epoch lists? Example
>
> 1) epoch-a starts, requests are added to epoch-a list.
> 2) epoch-b starts, epoch-a list should get empty
> 3) epoch-c starts, epoch-b list should get empty, kill the connection if
> epoch-a list is not empty (epoch-c list should not be needed, as epoch-a
> list can be used, once confirmed it is empty.)
>
>
> Here timeout would be epoch-a + epoch-b, i.e.
> max-timeout <= 2 * epoch-time.
> We could have more epochs/list-heads to make it more fine grained.
>
>
> From my point of view that should be a rather cheap, as it just
> adding/removing requests from list and checking for timeout if a list is
> empty. With the caveat that it is not precise anymore.

I like this idea a lot. I like that it enforces per-request behavior
and guarantees that any stalled request will abort the connection. I
think it's fine for the timeout to be an interval/epoch so long as the
documentation explicitly makes that clear. I think this would need to
be done in the kernel instead of libfuse because if the server is in a
deadlock when there are no pending requests in the lists and then the
kernel sends requests to the server, none of the requests will make it
to the list for the timer handler to detect any issues.

Before I make this change for v7, Miklos what are your thoughts on
this direction?

Thanks,
Joanne

>
> That could be implemented in kernel and/or libfuse?
>
>
> Thanks,
> Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-09-04 17:23         ` Joanne Koong
@ 2024-09-17 22:00           ` Bernd Schubert
  2024-09-18  7:29             ` Miklos Szeredi
  0 siblings, 1 reply; 20+ messages in thread
From: Bernd Schubert @ 2024-09-17 22:00 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Miklos Szeredi, linux-fsdevel, josef, jefflexu, laoar.shao,
	kernel-team, Jakob Blomer



Hi Joanne,

On 9/4/24 19:23, Joanne Koong wrote:
> On Tue, Sep 3, 2024 at 3:38 PM Bernd Schubert

>>
>>
>> I have question here, does it need to be an exact timeout or could it be
>> an interval/epoch? Let's say you timeout based on epoch lists? Example
>>
>> 1) epoch-a starts, requests are added to epoch-a list.
>> 2) epoch-b starts, epoch-a list should get empty
>> 3) epoch-c starts, epoch-b list should get empty, kill the connection if
>> epoch-a list is not empty (epoch-c list should not be needed, as epoch-a
>> list can be used, once confirmed it is empty.)
>>
>>
>> Here timeout would be epoch-a + epoch-b, i.e.
>> max-timeout <= 2 * epoch-time.
>> We could have more epochs/list-heads to make it more fine grained.
>>
>>
>> From my point of view that should be a rather cheap, as it just
>> adding/removing requests from list and checking for timeout if a list is
>> empty. With the caveat that it is not precise anymore.
> 
> I like this idea a lot. I like that it enforces per-request behavior
> and guarantees that any stalled request will abort the connection. I
> think it's fine for the timeout to be an interval/epoch so long as the
> documentation explicitly makes that clear. I think this would need to
> be done in the kernel instead of libfuse because if the server is in a
> deadlock when there are no pending requests in the lists and then the
> kernel sends requests to the server, none of the requests will make it
> to the list for the timer handler to detect any issues.
> 
> Before I make this change for v7, Miklos what are your thoughts on
> this direction?

we briefly discussed it with Miklos and Miklos agreed that epoch list
should be fine (would be great if you could quickly confirm, Miklos).

In the mean time I have another use case for timeout lists. Basically
Jakob from Cern (in CC) is asking for way to stop requests to
fuse-server and then to resume. I think that can be done easily through
notifications and unsetting (and later setting) fc->initialized. Demo
patch follows around tomorrow, but then Jakob actually wants to know
when it is safe to restart fuse-server (or part of it). That is where
the epoch timeout list would be handy - reply to the notification should
happen when the lists got empty, i.e. no request is handled anymore.
I think like this is better than FUSE_NOTIFY_RESEND, as that has an
issue with non-idempotent requests.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-09-17 22:00           ` Bernd Schubert
@ 2024-09-18  7:29             ` Miklos Szeredi
  2024-09-18  9:12               ` Bernd Schubert
  0 siblings, 1 reply; 20+ messages in thread
From: Miklos Szeredi @ 2024-09-18  7:29 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Joanne Koong, linux-fsdevel, josef, jefflexu, laoar.shao,
	kernel-team, Jakob Blomer

On Wed, 18 Sept 2024 at 00:00, Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:

> > I like this idea a lot. I like that it enforces per-request behavior
> > and guarantees that any stalled request will abort the connection. I
> > think it's fine for the timeout to be an interval/epoch so long as the
> > documentation explicitly makes that clear. I think this would need to
> > be done in the kernel instead of libfuse because if the server is in a
> > deadlock when there are no pending requests in the lists and then the
> > kernel sends requests to the server, none of the requests will make it
> > to the list for the timer handler to detect any issues.
> >
> > Before I make this change for v7, Miklos what are your thoughts on
> > this direction?

I have no problem with epochs, but there are scalability issue with
shared lists.  So I'm not sure.  Maybe the individual timers would be
the best solution but I don't know the overhead and the scalability of
that solution either.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-09-18  7:29             ` Miklos Szeredi
@ 2024-09-18  9:12               ` Bernd Schubert
  0 siblings, 0 replies; 20+ messages in thread
From: Bernd Schubert @ 2024-09-18  9:12 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Joanne Koong, linux-fsdevel, josef, jefflexu, laoar.shao,
	kernel-team, Jakob Blomer

On September 18, 2024 9:29:44 AM GMT+02:00, Miklos Szeredi <miklos@szeredi.hu> wrote:
>On Wed, 18 Sept 2024 at 00:00, Bernd Schubert
><bernd.schubert@fastmail.fm> wrote:
>
>> > I like this idea a lot. I like that it enforces per-request behavior
>> > and guarantees that any stalled request will abort the connection. I
>> > think it's fine for the timeout to be an interval/epoch so long as the
>> > documentation explicitly makes that clear. I think this would need to
>> > be done in the kernel instead of libfuse because if the server is in a
>> > deadlock when there are no pending requests in the lists and then the
>> > kernel sends requests to the server, none of the requests will make it
>> > to the list for the timer handler to detect any issues.
>> >
>> > Before I make this change for v7, Miklos what are your thoughts on
>> > this direction?
>
>I have no problem with epochs, but there are scalability issue with
>shared lists.  So I'm not sure.  Maybe the individual timers would be
>the best solution but I don't know the overhead and the scalability of
>that solution either.
>
>Thanks,
>Miklos

In principle we could also have lists per device and then a bitmap when the lists are empty. But now certainly more complex, especially as a device clone can happen any time.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-09-02 10:37   ` Miklos Szeredi
  2024-09-02 10:50     ` Bernd Schubert
  2024-09-03 17:25     ` Joanne Koong
@ 2024-09-27 19:36     ` Joanne Koong
  2024-09-28  8:43       ` Bernd Schubert
  2 siblings, 1 reply; 20+ messages in thread
From: Joanne Koong @ 2024-09-27 19:36 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, josef, bernd.schubert, jefflexu, laoar.shao,
	kernel-team

On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is in a deadlock. Currently, there's
> > no good way to detect if a server is stuck and needs to be killed
> > manually.
> >
> > This commit adds an option for enforcing a timeout (in seconds) on
> > requests where if the timeout elapses without a reply from the server,
> > the connection will be automatically aborted.
>
> Okay.
>
> I'm not sure what the overhead (scheduling and memory) of timers, but
> starting one for each request seems excessive.

I ran some benchmarks on this using the passthrough_ll server and saw
roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
fio --name randwrite --ioengine=sync --thread --invalidate=1
--runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
--bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
--group_reporting=1 --directory=/root/fuse_mount

Instead of attaching a timer to each request, I think we can instead
do the following:
* add a "start_time" field to each request tracking (in jiffies) when
the request was started
* add a new list to the connection that all requests get enqueued
onto. When the request is completed, it gets dequeued from this list
* have a timer for the connection that fires off every 10 seconds or
so. When this timer is fired, it checks if "jiffies > req->start_time
+ fc->req_timeout" against the head of the list to check if the
timeout has expired and we need to abort the request. We only need to
check against the head of the list because we know every other request
after this was started later in time. I think we could even just use
the fc->lock for this instead of needing a separate lock. In the worst
case, this grants a 10 second upper bound on the timeout a user
requests (eg if the user requests 2 minutes, in the worst case the
timeout would trigger at 2 minutes and 10 seconds).

Also, now that we're aborting the connection entirely on a timeout
instead of just aborting the request, maybe it makes sense to change
the timeout granularity to minutes instead of seconds. I'm envisioning
that this timeout mechanism will mostly be used as a safeguard against
malicious or buggy servers with a high timeout configured (eg 10
minutes), and minutes seems like a nicer interface for users than them
having to convert that to seconds.

Let me know if I've missed anything with this approach but if not,
then I'll submit v7 with this change.


Thanks,
Joanne

>
> Can we make the timeout per-connection instead of per request?
>
> I.e. When the first request is sent, the timer is started. When a
> reply is received but there are still outstanding requests, the timer
> is reset.  When the last reply is received, the timer is stopped.
>
> This should handle the frozen server case just as well.  It may not
> perfectly handle the case when the server is still alive but for some
> reason one or more requests get stuck, while others are still being
> processed.   The latter case is unlikely to be an issue in practice,
> IMO.
>
> Thanks,
> Miklos

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-09-27 19:36     ` Joanne Koong
@ 2024-09-28  8:43       ` Bernd Schubert
  2024-10-01 17:03         ` Joanne Koong
  0 siblings, 1 reply; 20+ messages in thread
From: Bernd Schubert @ 2024-09-28  8:43 UTC (permalink / raw)
  To: Joanne Koong, Miklos Szeredi
  Cc: linux-fsdevel, josef, jefflexu, laoar.shao, kernel-team

Hi Joanne,

On 9/27/24 21:36, Joanne Koong wrote:
> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>>>
>>> There are situations where fuse servers can become unresponsive or
>>> stuck, for example if the server is in a deadlock. Currently, there's
>>> no good way to detect if a server is stuck and needs to be killed
>>> manually.
>>>
>>> This commit adds an option for enforcing a timeout (in seconds) on
>>> requests where if the timeout elapses without a reply from the server,
>>> the connection will be automatically aborted.
>>
>> Okay.
>>
>> I'm not sure what the overhead (scheduling and memory) of timers, but
>> starting one for each request seems excessive.
> 
> I ran some benchmarks on this using the passthrough_ll server and saw
> roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
> fio --name randwrite --ioengine=sync --thread --invalidate=1
> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> --group_reporting=1 --directory=/root/fuse_mount
> 
> Instead of attaching a timer to each request, I think we can instead
> do the following:
> * add a "start_time" field to each request tracking (in jiffies) when
> the request was started
> * add a new list to the connection that all requests get enqueued
> onto. When the request is completed, it gets dequeued from this list
> * have a timer for the connection that fires off every 10 seconds or
> so. When this timer is fired, it checks if "jiffies > req->start_time
> + fc->req_timeout" against the head of the list to check if the
> timeout has expired and we need to abort the request. We only need to
> check against the head of the list because we know every other request
> after this was started later in time. I think we could even just use
> the fc->lock for this instead of needing a separate lock. In the worst
> case, this grants a 10 second upper bound on the timeout a user
> requests (eg if the user requests 2 minutes, in the worst case the
> timeout would trigger at 2 minutes and 10 seconds).
> 
> Also, now that we're aborting the connection entirely on a timeout
> instead of just aborting the request, maybe it makes sense to change
> the timeout granularity to minutes instead of seconds. I'm envisioning
> that this timeout mechanism will mostly be used as a safeguard against
> malicious or buggy servers with a high timeout configured (eg 10
> minutes), and minutes seems like a nicer interface for users than them
> having to convert that to seconds.
> 
> Let me know if I've missed anything with this approach but if not,
> then I'll submit v7 with this change.


sounds great to me. Just, could we do this per fuse_dev to avoid a
single lock for all cores?


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-09-28  8:43       ` Bernd Schubert
@ 2024-10-01 17:03         ` Joanne Koong
  2024-10-01 17:12           ` Joanne Koong
  2024-10-07 18:39           ` Joanne Koong
  0 siblings, 2 replies; 20+ messages in thread
From: Joanne Koong @ 2024-10-01 17:03 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, linux-fsdevel, josef, jefflexu, laoar.shao,
	kernel-team

On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> Hi Joanne,
>
> On 9/27/24 21:36, Joanne Koong wrote:
> > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>
> >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> >>>
> >>> There are situations where fuse servers can become unresponsive or
> >>> stuck, for example if the server is in a deadlock. Currently, there's
> >>> no good way to detect if a server is stuck and needs to be killed
> >>> manually.
> >>>
> >>> This commit adds an option for enforcing a timeout (in seconds) on
> >>> requests where if the timeout elapses without a reply from the server,
> >>> the connection will be automatically aborted.
> >>
> >> Okay.
> >>
> >> I'm not sure what the overhead (scheduling and memory) of timers, but
> >> starting one for each request seems excessive.
> >
> > I ran some benchmarks on this using the passthrough_ll server and saw
> > roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
> > fio --name randwrite --ioengine=sync --thread --invalidate=1
> > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> > --group_reporting=1 --directory=/root/fuse_mount
> >
> > Instead of attaching a timer to each request, I think we can instead
> > do the following:
> > * add a "start_time" field to each request tracking (in jiffies) when
> > the request was started
> > * add a new list to the connection that all requests get enqueued
> > onto. When the request is completed, it gets dequeued from this list
> > * have a timer for the connection that fires off every 10 seconds or
> > so. When this timer is fired, it checks if "jiffies > req->start_time
> > + fc->req_timeout" against the head of the list to check if the
> > timeout has expired and we need to abort the request. We only need to
> > check against the head of the list because we know every other request
> > after this was started later in time. I think we could even just use
> > the fc->lock for this instead of needing a separate lock. In the worst
> > case, this grants a 10 second upper bound on the timeout a user
> > requests (eg if the user requests 2 minutes, in the worst case the
> > timeout would trigger at 2 minutes and 10 seconds).
> >
> > Also, now that we're aborting the connection entirely on a timeout
> > instead of just aborting the request, maybe it makes sense to change
> > the timeout granularity to minutes instead of seconds. I'm envisioning
> > that this timeout mechanism will mostly be used as a safeguard against
> > malicious or buggy servers with a high timeout configured (eg 10
> > minutes), and minutes seems like a nicer interface for users than them
> > having to convert that to seconds.
> >
> > Let me know if I've missed anything with this approach but if not,
> > then I'll submit v7 with this change.
>
>
> sounds great to me. Just, could we do this per fuse_dev to avoid a
> single lock for all cores?
>

Will do! thanks for the suggestion - in that case, I'll add its own
spinlock for it too then.

Thanks,
Joanne

>
> Thanks,
> Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-10-01 17:03         ` Joanne Koong
@ 2024-10-01 17:12           ` Joanne Koong
  2024-10-07 18:39           ` Joanne Koong
  1 sibling, 0 replies; 20+ messages in thread
From: Joanne Koong @ 2024-10-01 17:12 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, linux-fsdevel, josef, jefflexu, laoar.shao,
	kernel-team

On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> > Hi Joanne,
> >
> > On 9/27/24 21:36, Joanne Koong wrote:
> > > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >>
> > >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >>>
> > >>> There are situations where fuse servers can become unresponsive or
> > >>> stuck, for example if the server is in a deadlock. Currently, there's
> > >>> no good way to detect if a server is stuck and needs to be killed
> > >>> manually.
> > >>>
> > >>> This commit adds an option for enforcing a timeout (in seconds) on
> > >>> requests where if the timeout elapses without a reply from the server,
> > >>> the connection will be automatically aborted.
> > >>
> > >> Okay.
> > >>
> > >> I'm not sure what the overhead (scheduling and memory) of timers, but
> > >> starting one for each request seems excessive.
> > >
> > > I ran some benchmarks on this using the passthrough_ll server and saw
> > > roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
> > > fio --name randwrite --ioengine=sync --thread --invalidate=1
> > > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> > > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> > > --group_reporting=1 --directory=/root/fuse_mount
> > >
> > > Instead of attaching a timer to each request, I think we can instead
> > > do the following:
> > > * add a "start_time" field to each request tracking (in jiffies) when
> > > the request was started
> > > * add a new list to the connection that all requests get enqueued
> > > onto. When the request is completed, it gets dequeued from this list
> > > * have a timer for the connection that fires off every 10 seconds or
> > > so. When this timer is fired, it checks if "jiffies > req->start_time
> > > + fc->req_timeout" against the head of the list to check if the
> > > timeout has expired and we need to abort the request. We only need to
> > > check against the head of the list because we know every other request
> > > after this was started later in time. I think we could even just use
> > > the fc->lock for this instead of needing a separate lock. In the worst
> > > case, this grants a 10 second upper bound on the timeout a user
> > > requests (eg if the user requests 2 minutes, in the worst case the
> > > timeout would trigger at 2 minutes and 10 seconds).
> > >
> > > Also, now that we're aborting the connection entirely on a timeout
> > > instead of just aborting the request, maybe it makes sense to change
> > > the timeout granularity to minutes instead of seconds. I'm envisioning
> > > that this timeout mechanism will mostly be used as a safeguard against
> > > malicious or buggy servers with a high timeout configured (eg 10
> > > minutes), and minutes seems like a nicer interface for users than them
> > > having to convert that to seconds.
> > >
> > > Let me know if I've missed anything with this approach but if not,
> > > then I'll submit v7 with this change.
> >
> >
> > sounds great to me. Just, could we do this per fuse_dev to avoid a
> > single lock for all cores?
> >
>
> Will do! thanks for the suggestion - in that case, I'll add its own
> spinlock for it too then.

Actually, looking at this some more, we can just put this in the
"struct fuse_pqueue" and use the fpq spinlock since the check for
whether any requests timed out will be very quick (eg just checking
against the first entry in the list).
>
> Thanks,
> Joanne
>
> >
> > Thanks,
> > Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-10-01 17:03         ` Joanne Koong
  2024-10-01 17:12           ` Joanne Koong
@ 2024-10-07 18:39           ` Joanne Koong
  2024-10-07 20:02             ` Bernd Schubert
  1 sibling, 1 reply; 20+ messages in thread
From: Joanne Koong @ 2024-10-07 18:39 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, linux-fsdevel, josef, jefflexu, laoar.shao,
	kernel-team

On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> > Hi Joanne,
> >
> > On 9/27/24 21:36, Joanne Koong wrote:
> > > On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >>
> > >> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >>>
> > >>> There are situations where fuse servers can become unresponsive or
> > >>> stuck, for example if the server is in a deadlock. Currently, there's
> > >>> no good way to detect if a server is stuck and needs to be killed
> > >>> manually.
> > >>>
> > >>> This commit adds an option for enforcing a timeout (in seconds) on
> > >>> requests where if the timeout elapses without a reply from the server,
> > >>> the connection will be automatically aborted.
> > >>
> > >> Okay.
> > >>
> > >> I'm not sure what the overhead (scheduling and memory) of timers, but
> > >> starting one for each request seems excessive.
> > >
> > > I ran some benchmarks on this using the passthrough_ll server and saw
> > > roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
> > > fio --name randwrite --ioengine=sync --thread --invalidate=1
> > > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> > > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> > > --group_reporting=1 --directory=/root/fuse_mount
> > >
> > > Instead of attaching a timer to each request, I think we can instead
> > > do the following:
> > > * add a "start_time" field to each request tracking (in jiffies) when
> > > the request was started
> > > * add a new list to the connection that all requests get enqueued
> > > onto. When the request is completed, it gets dequeued from this list
> > > * have a timer for the connection that fires off every 10 seconds or
> > > so. When this timer is fired, it checks if "jiffies > req->start_time
> > > + fc->req_timeout" against the head of the list to check if the
> > > timeout has expired and we need to abort the request. We only need to
> > > check against the head of the list because we know every other request
> > > after this was started later in time. I think we could even just use
> > > the fc->lock for this instead of needing a separate lock. In the worst
> > > case, this grants a 10 second upper bound on the timeout a user
> > > requests (eg if the user requests 2 minutes, in the worst case the
> > > timeout would trigger at 2 minutes and 10 seconds).
> > >
> > > Also, now that we're aborting the connection entirely on a timeout
> > > instead of just aborting the request, maybe it makes sense to change
> > > the timeout granularity to minutes instead of seconds. I'm envisioning
> > > that this timeout mechanism will mostly be used as a safeguard against
> > > malicious or buggy servers with a high timeout configured (eg 10
> > > minutes), and minutes seems like a nicer interface for users than them
> > > having to convert that to seconds.
> > >
> > > Let me know if I've missed anything with this approach but if not,
> > > then I'll submit v7 with this change.
> >
> >
> > sounds great to me. Just, could we do this per fuse_dev to avoid a
> > single lock for all cores?
> >
>
> Will do! thanks for the suggestion - in that case, I'll add its own
> spinlock for it too then.

I realized while working on v7 that we can't do this per fuse device
because the request is only associated with a device once it's read in
by the server (eg fuse_dev_do_read).

I ran some rough preliminary benchmarks with
./libfuse/build/example/passthrough_ll  -o writeback -o max_threads=4
-o source=~/fstests ~/fuse_mount
and
fio --name randwrite --ioengine=sync --thread --invalidate=1
--runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
--bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
--group_reporting=1 --directory=fuse_mount

and didn't see any noticeable difference in throughput (~37 MiB/sec on
my system) with vs without the timeout.


Thanks,
Joanne

>
> Thanks,
> Joanne
>
> >
> > Thanks,
> > Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-10-07 18:39           ` Joanne Koong
@ 2024-10-07 20:02             ` Bernd Schubert
  2024-10-08 16:26               ` Joanne Koong
  0 siblings, 1 reply; 20+ messages in thread
From: Bernd Schubert @ 2024-10-07 20:02 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Miklos Szeredi, linux-fsdevel, josef, jefflexu, laoar.shao,
	kernel-team



On 10/7/24 20:39, Joanne Koong wrote:
> On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>>
>> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert
>> <bernd.schubert@fastmail.fm> wrote:
>>>
>>> Hi Joanne,
>>>
>>> On 9/27/24 21:36, Joanne Koong wrote:
>>>> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>>
>>>>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>>>>>>
>>>>>> There are situations where fuse servers can become unresponsive or
>>>>>> stuck, for example if the server is in a deadlock. Currently, there's
>>>>>> no good way to detect if a server is stuck and needs to be killed
>>>>>> manually.
>>>>>>
>>>>>> This commit adds an option for enforcing a timeout (in seconds) on
>>>>>> requests where if the timeout elapses without a reply from the server,
>>>>>> the connection will be automatically aborted.
>>>>>
>>>>> Okay.
>>>>>
>>>>> I'm not sure what the overhead (scheduling and memory) of timers, but
>>>>> starting one for each request seems excessive.
>>>>
>>>> I ran some benchmarks on this using the passthrough_ll server and saw
>>>> roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
>>>> fio --name randwrite --ioengine=sync --thread --invalidate=1
>>>> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
>>>> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
>>>> --group_reporting=1 --directory=/root/fuse_mount
>>>>
>>>> Instead of attaching a timer to each request, I think we can instead
>>>> do the following:
>>>> * add a "start_time" field to each request tracking (in jiffies) when
>>>> the request was started
>>>> * add a new list to the connection that all requests get enqueued
>>>> onto. When the request is completed, it gets dequeued from this list
>>>> * have a timer for the connection that fires off every 10 seconds or
>>>> so. When this timer is fired, it checks if "jiffies > req->start_time
>>>> + fc->req_timeout" against the head of the list to check if the
>>>> timeout has expired and we need to abort the request. We only need to
>>>> check against the head of the list because we know every other request
>>>> after this was started later in time. I think we could even just use
>>>> the fc->lock for this instead of needing a separate lock. In the worst
>>>> case, this grants a 10 second upper bound on the timeout a user
>>>> requests (eg if the user requests 2 minutes, in the worst case the
>>>> timeout would trigger at 2 minutes and 10 seconds).
>>>>
>>>> Also, now that we're aborting the connection entirely on a timeout
>>>> instead of just aborting the request, maybe it makes sense to change
>>>> the timeout granularity to minutes instead of seconds. I'm envisioning
>>>> that this timeout mechanism will mostly be used as a safeguard against
>>>> malicious or buggy servers with a high timeout configured (eg 10
>>>> minutes), and minutes seems like a nicer interface for users than them
>>>> having to convert that to seconds.
>>>>
>>>> Let me know if I've missed anything with this approach but if not,
>>>> then I'll submit v7 with this change.
>>>
>>>
>>> sounds great to me. Just, could we do this per fuse_dev to avoid a
>>> single lock for all cores?
>>>
>>
>> Will do! thanks for the suggestion - in that case, I'll add its own
>> spinlock for it too then.
> 
> I realized while working on v7 that we can't do this per fuse device
> because the request is only associated with a device once it's read in
> by the server (eg fuse_dev_do_read).
> 
> I ran some rough preliminary benchmarks with
> ./libfuse/build/example/passthrough_ll  -o writeback -o max_threads=4
> -o source=~/fstests ~/fuse_mount
> and
> fio --name randwrite --ioengine=sync --thread --invalidate=1
> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> --group_reporting=1 --directory=fuse_mount
> 
> and didn't see any noticeable difference in throughput (~37 MiB/sec on
> my system) with vs without the timeout.
> 


That is not much, isn't your limit the backend? I wonder what would happen
with 25GB/s I'm testing with. Wouldn't it make sense for this to test with 
sequential large IO? And possibly with the passthrough-hp branch that
bypasses IO? And a NUMA system probably would be helpful as well. 
I.e. to test the effect on the kernel side without having an IO limited
system?


With the io-uring interface requests stay in queues from the in-coming CPU
so easier to achieve there, although I wonder for your use-case if it
wouldn't be sufficient to start the timer only when the request is on
the way to fuse-server? One disadvantage I see is that virtiofs would need
to be specially handled.


Thanks,
Bernd



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-10-07 20:02             ` Bernd Schubert
@ 2024-10-08 16:26               ` Joanne Koong
  2024-10-08 19:00                 ` Joanne Koong
  0 siblings, 1 reply; 20+ messages in thread
From: Joanne Koong @ 2024-10-08 16:26 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, linux-fsdevel, josef, jefflexu, laoar.shao,
	kernel-team

On Mon, Oct 7, 2024 at 1:02 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> On 10/7/24 20:39, Joanne Koong wrote:
> > On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> >>
> >> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert
> >> <bernd.schubert@fastmail.fm> wrote:
> >>>
> >>> Hi Joanne,
> >>>
> >>> On 9/27/24 21:36, Joanne Koong wrote:
> >>>> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>>>>
> >>>>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> >>>>>>
> >>>>>> There are situations where fuse servers can become unresponsive or
> >>>>>> stuck, for example if the server is in a deadlock. Currently, there's
> >>>>>> no good way to detect if a server is stuck and needs to be killed
> >>>>>> manually.
> >>>>>>
> >>>>>> This commit adds an option for enforcing a timeout (in seconds) on
> >>>>>> requests where if the timeout elapses without a reply from the server,
> >>>>>> the connection will be automatically aborted.
> >>>>>
> >>>>> Okay.
> >>>>>
> >>>>> I'm not sure what the overhead (scheduling and memory) of timers, but
> >>>>> starting one for each request seems excessive.
> >>>>
> >>>> I ran some benchmarks on this using the passthrough_ll server and saw
> >>>> roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
> >>>> fio --name randwrite --ioengine=sync --thread --invalidate=1
> >>>> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> >>>> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> >>>> --group_reporting=1 --directory=/root/fuse_mount
> >>>>
> >>>> Instead of attaching a timer to each request, I think we can instead
> >>>> do the following:
> >>>> * add a "start_time" field to each request tracking (in jiffies) when
> >>>> the request was started
> >>>> * add a new list to the connection that all requests get enqueued
> >>>> onto. When the request is completed, it gets dequeued from this list
> >>>> * have a timer for the connection that fires off every 10 seconds or
> >>>> so. When this timer is fired, it checks if "jiffies > req->start_time
> >>>> + fc->req_timeout" against the head of the list to check if the
> >>>> timeout has expired and we need to abort the request. We only need to
> >>>> check against the head of the list because we know every other request
> >>>> after this was started later in time. I think we could even just use
> >>>> the fc->lock for this instead of needing a separate lock. In the worst
> >>>> case, this grants a 10 second upper bound on the timeout a user
> >>>> requests (eg if the user requests 2 minutes, in the worst case the
> >>>> timeout would trigger at 2 minutes and 10 seconds).
> >>>>
> >>>> Also, now that we're aborting the connection entirely on a timeout
> >>>> instead of just aborting the request, maybe it makes sense to change
> >>>> the timeout granularity to minutes instead of seconds. I'm envisioning
> >>>> that this timeout mechanism will mostly be used as a safeguard against
> >>>> malicious or buggy servers with a high timeout configured (eg 10
> >>>> minutes), and minutes seems like a nicer interface for users than them
> >>>> having to convert that to seconds.
> >>>>
> >>>> Let me know if I've missed anything with this approach but if not,
> >>>> then I'll submit v7 with this change.
> >>>
> >>>
> >>> sounds great to me. Just, could we do this per fuse_dev to avoid a
> >>> single lock for all cores?
> >>>
> >>
> >> Will do! thanks for the suggestion - in that case, I'll add its own
> >> spinlock for it too then.
> >
> > I realized while working on v7 that we can't do this per fuse device
> > because the request is only associated with a device once it's read in
> > by the server (eg fuse_dev_do_read).
> >
> > I ran some rough preliminary benchmarks with
> > ./libfuse/build/example/passthrough_ll  -o writeback -o max_threads=4
> > -o source=~/fstests ~/fuse_mount
> > and
> > fio --name randwrite --ioengine=sync --thread --invalidate=1
> > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> > --group_reporting=1 --directory=fuse_mount
> >
> > and didn't see any noticeable difference in throughput (~37 MiB/sec on
> > my system) with vs without the timeout.
> >
>
>
> That is not much, isn't your limit the backend? I wonder what would happen
> with 25GB/s I'm testing with. Wouldn't it make sense for this to test with
> sequential large IO? And possibly with the passthrough-hp branch that
> bypasses IO? And a NUMA system probably would be helpful as well.
> I.e. to test the effect on the kernel side without having an IO limited
> system?
>

The preliminary benchmarks yesterday were run on a VM because I had
trouble getting consistent results between baseline runs (on origin
w/out my changes) on my usual test machine. I'm going to get this
sorted out and run some tests again.

What are you testing on that's giving you 25 GB/s?

>
> With the io-uring interface requests stay in queues from the in-coming CPU
> so easier to achieve there, although I wonder for your use-case if it
> wouldn't be sufficient to start the timer only when the request is on
> the way to fuse-server? One disadvantage I see is that virtiofs would need
> to be specially handled.

Unfortunately I don't think it suffices to only start the timer when
the request is on the way to the fuse server. If there's a malicious
or deadlocked server, they might not read from /dev/fuse, but we would
want to abort the connection in those cases as well.


Thanks,
Joanne

>
>
> Thanks,
> Bernd
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-10-08 16:26               ` Joanne Koong
@ 2024-10-08 19:00                 ` Joanne Koong
  0 siblings, 0 replies; 20+ messages in thread
From: Joanne Koong @ 2024-10-08 19:00 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, linux-fsdevel, josef, jefflexu, laoar.shao,
	kernel-team

On Tue, Oct 8, 2024 at 9:26 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Mon, Oct 7, 2024 at 1:02 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> > On 10/7/24 20:39, Joanne Koong wrote:
> > > On Tue, Oct 1, 2024 at 10:03 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >>
> > >> On Sat, Sep 28, 2024 at 1:43 AM Bernd Schubert
> > >> <bernd.schubert@fastmail.fm> wrote:
> > >>>
> > >>> Hi Joanne,
> > >>>
> > >>> On 9/27/24 21:36, Joanne Koong wrote:
> > >>>> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >>>>>
> > >>>>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >>>>>>
> > >>>>>> There are situations where fuse servers can become unresponsive or
> > >>>>>> stuck, for example if the server is in a deadlock. Currently, there's
> > >>>>>> no good way to detect if a server is stuck and needs to be killed
> > >>>>>> manually.
> > >>>>>>
> > >>>>>> This commit adds an option for enforcing a timeout (in seconds) on
> > >>>>>> requests where if the timeout elapses without a reply from the server,
> > >>>>>> the connection will be automatically aborted.
> > >>>>>
> > >>>>> Okay.
> > >>>>>
> > >>>>> I'm not sure what the overhead (scheduling and memory) of timers, but
> > >>>>> starting one for each request seems excessive.
> > >>>>
> > >>>> I ran some benchmarks on this using the passthrough_ll server and saw
> > >>>> roughly a 1.5% drop in throughput (from ~775 MiB/s to ~765 MiB/s):
> > >>>> fio --name randwrite --ioengine=sync --thread --invalidate=1
> > >>>> --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> > >>>> --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> > >>>> --group_reporting=1 --directory=/root/fuse_mount
> > >>>>
> > >>>> Instead of attaching a timer to each request, I think we can instead
> > >>>> do the following:
> > >>>> * add a "start_time" field to each request tracking (in jiffies) when
> > >>>> the request was started
> > >>>> * add a new list to the connection that all requests get enqueued
> > >>>> onto. When the request is completed, it gets dequeued from this list
> > >>>> * have a timer for the connection that fires off every 10 seconds or
> > >>>> so. When this timer is fired, it checks if "jiffies > req->start_time
> > >>>> + fc->req_timeout" against the head of the list to check if the
> > >>>> timeout has expired and we need to abort the request. We only need to
> > >>>> check against the head of the list because we know every other request
> > >>>> after this was started later in time. I think we could even just use
> > >>>> the fc->lock for this instead of needing a separate lock. In the worst
> > >>>> case, this grants a 10 second upper bound on the timeout a user
> > >>>> requests (eg if the user requests 2 minutes, in the worst case the
> > >>>> timeout would trigger at 2 minutes and 10 seconds).
> > >>>>
> > >>>> Also, now that we're aborting the connection entirely on a timeout
> > >>>> instead of just aborting the request, maybe it makes sense to change
> > >>>> the timeout granularity to minutes instead of seconds. I'm envisioning
> > >>>> that this timeout mechanism will mostly be used as a safeguard against
> > >>>> malicious or buggy servers with a high timeout configured (eg 10
> > >>>> minutes), and minutes seems like a nicer interface for users than them
> > >>>> having to convert that to seconds.
> > >>>>
> > >>>> Let me know if I've missed anything with this approach but if not,
> > >>>> then I'll submit v7 with this change.
> > >>>
> > >>>
> > >>> sounds great to me. Just, could we do this per fuse_dev to avoid a
> > >>> single lock for all cores?
> > >>>
> > >>
> > >> Will do! thanks for the suggestion - in that case, I'll add its own
> > >> spinlock for it too then.
> > >
> > > I realized while working on v7 that we can't do this per fuse device
> > > because the request is only associated with a device once it's read in
> > > by the server (eg fuse_dev_do_read).
> > >
> > > I ran some rough preliminary benchmarks with
> > > ./libfuse/build/example/passthrough_ll  -o writeback -o max_threads=4
> > > -o source=~/fstests ~/fuse_mount
> > > and
> > > fio --name randwrite --ioengine=sync --thread --invalidate=1
> > > --runtime=300 --ramp_time=10 --rw=randwrite --size=1G --numjobs=4
> > > --bs=4k --alloc-size 98304 --allrandrepeat=1 --randseed=12345
> > > --group_reporting=1 --directory=fuse_mount
> > >
> > > and didn't see any noticeable difference in throughput (~37 MiB/sec on
> > > my system) with vs without the timeout.
> > >
> >
> >
> > That is not much, isn't your limit the backend? I wonder what would happen
> > with 25GB/s I'm testing with. Wouldn't it make sense for this to test with
> > sequential large IO? And possibly with the passthrough-hp branch that
> > bypasses IO? And a NUMA system probably would be helpful as well.
> > I.e. to test the effect on the kernel side without having an IO limited
> > system?
> >
>
> The preliminary benchmarks yesterday were run on a VM because I had
> trouble getting consistent results between baseline runs (on origin
> w/out my changes) on my usual test machine. I'm going to get this
> sorted out and run some tests again.
>

I'll attach the updated benchmark numbers on this thread (v7 of the
timeout patchset) so that everything's in one place:
https://lore.kernel.org/linux-fsdevel/20241007184258.2837492-1-joannelkoong@gmail.com/T/#t

> What are you testing on that's giving you 25 GB/s?
>
> >
> > With the io-uring interface requests stay in queues from the in-coming CPU
> > so easier to achieve there, although I wonder for your use-case if it
> > wouldn't be sufficient to start the timer only when the request is on
> > the way to fuse-server? One disadvantage I see is that virtiofs would need
> > to be specially handled.
>
> Unfortunately I don't think it suffices to only start the timer when
> the request is on the way to the fuse server. If there's a malicious
> or deadlocked server, they might not read from /dev/fuse, but we would
> want to abort the connection in those cases as well.
>
>
> Thanks,
> Joanne
>
> >
> >
> > Thanks,
> > Bernd
> >
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-10-08 19:00 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-30 16:26 [PATCH v6 0/2] fuse: add timeout option for requests Joanne Koong
2024-08-30 16:26 ` [PATCH v6 1/2] fuse: add optional kernel-enforced timeout " Joanne Koong
2024-09-02 10:37   ` Miklos Szeredi
2024-09-02 10:50     ` Bernd Schubert
2024-09-02 11:11       ` Miklos Szeredi
2024-09-03 17:25     ` Joanne Koong
2024-09-03 22:37       ` Bernd Schubert
2024-09-04 17:23         ` Joanne Koong
2024-09-17 22:00           ` Bernd Schubert
2024-09-18  7:29             ` Miklos Szeredi
2024-09-18  9:12               ` Bernd Schubert
2024-09-27 19:36     ` Joanne Koong
2024-09-28  8:43       ` Bernd Schubert
2024-10-01 17:03         ` Joanne Koong
2024-10-01 17:12           ` Joanne Koong
2024-10-07 18:39           ` Joanne Koong
2024-10-07 20:02             ` Bernd Schubert
2024-10-08 16:26               ` Joanne Koong
2024-10-08 19:00                 ` Joanne Koong
2024-08-30 16:26 ` [PATCH v6 2/2] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).