[PATCH RESEND v9 0/3] fuse: add kernel-enforced request timeout option

public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH RESEND v9 0/3] fuse: add kernel-enforced request timeout option
@ 2024-11-14 19:13 Joanne Koong
  2024-11-14 19:13 ` [PATCH RESEND v9 1/3] fs_parser: add fsparam_u16 helper Joanne Koong
                   ` (3 more replies)
  0 siblings, 4 replies; 39+ messages in thread
From: Joanne Koong @ 2024-11-14 19:13 UTC (permalink / raw)
  To: miklos, linux-fsdevel
  Cc: josef, bernd.schubert, jefflexu, laoar.shao, kernel-team

There are situations where fuse servers can become unresponsive or
stuck, for example if the server is in a deadlock. Currently, there's
no good way to detect if a server is stuck and needs to be killed
manually.

This patchset adds a timeout option where if the server does not reply to a
request by the time the timeout elapses, the connection will be aborted.
This patchset also adds two dynamically configurable fuse sysctls
"default_request_timeout" and "max_request_timeout" for controlling/enforcing
timeout behavior system-wide.

Existing systems running fuse servers will not be affected unless they
explicitly opt into the timeout.

v8:
https://lore.kernel.org/linux-fsdevel/20241011191320.91592-1-joannelkoong@gmail.com/
Changes from v8 -> v9:
* Fix comment for u16 fs_parse_result, ULONG_MAX instead of U32_MAX, fix
  spacing (Bernd)

v7:
https://lore.kernel.org/linux-fsdevel/20241007184258.2837492-1-joannelkoong@gmail.com/
Changes from v7 -> v8:
* Use existing lists for checking expirations (Miklos)

v6:
https://lore.kernel.org/linux-fsdevel/20240830162649.3849586-1-joannelkoong@gmail.com/
Changes from v6 -> v7:
- Make timer per-connection instead of per-request (Miklos)
- Make default granularity of time minutes instead of seconds
- Removed the reviewed-bys since the interface of this has changed (now
  minutes, instead of seconds)

v5:
https://lore.kernel.org/linux-fsdevel/20240826203234.4079338-1-joannelkoong@gmail.com/
Changes from v5 -> v6:
- Gate sysctl.o behind CONFIG_SYSCTL in makefile (kernel test robot)
- Reword/clarify last sentence in cover letter (Miklos)

v4:
https://lore.kernel.org/linux-fsdevel/20240813232241.2369855-1-joannelkoong@gmail.com/
Changes from v4 -> v5:
- Change timeout behavior from aborting request to aborting connection
  (Miklos)
- Clarify wording for sysctl documentation (Jingbo)

v3:
https://lore.kernel.org/linux-fsdevel/20240808190110.3188039-1-joannelkoong@gmail.com/
Changes from v3 -> v4:
- Fix wording on some comments to make it more clear
- Use simpler logic for timer (eg remove extra if checks, use mod timer API)
  (Josef)
- Sanity-check should be on FR_FINISHING not FR_FINISHED (Jingbo)
- Fix comment for "processing queue", add req->fpq = NULL safeguard  (Bernd)

v2:
https://lore.kernel.org/linux-fsdevel/20240730002348.3431931-1-joannelkoong@gmail.com/
Changes from v2 -> v3:
- Disarm / rearm timer in dev_do_read to handle race conditions (Bernrd)
- Disarm timer in error handling for fatal interrupt (Yafang)
- Clean up do_fuse_request_end (Jingbo)
- Add timer for notify retrieve requests 
- Fix kernel test robot errors for #define no-op functions

v1:
https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
Changes from v1 -> v2:
- Add timeout for background requests
- Handle resend race condition
- Add sysctls

Joanne Koong (3):
  fs_parser: add fsparam_u16 helper
  fuse: add optional kernel-enforced timeout for requests
  fuse: add default_request_timeout and max_request_timeout sysctls

 Documentation/admin-guide/sysctl/fs.rst | 27 +++++++++
 fs/fs_parser.c                          | 14 +++++
 fs/fuse/dev.c                           | 80 +++++++++++++++++++++++++
 fs/fuse/fuse_i.h                        | 31 ++++++++++
 fs/fuse/inode.c                         | 33 ++++++++++
 fs/fuse/sysctl.c                        | 20 +++++++
 include/linux/fs_parser.h               |  9 ++-
 7 files changed, 211 insertions(+), 3 deletions(-)

-- 
2.43.5


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH RESEND v9 1/3] fs_parser: add fsparam_u16 helper
  2024-11-14 19:13 [PATCH RESEND v9 0/3] fuse: add kernel-enforced request timeout option Joanne Koong
@ 2024-11-14 19:13 ` Joanne Koong
  2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 39+ messages in thread
From: Joanne Koong @ 2024-11-14 19:13 UTC (permalink / raw)
  To: miklos, linux-fsdevel
  Cc: josef, bernd.schubert, jefflexu, laoar.shao, kernel-team,
	Bernd Schubert

Add a fsparam helper for unsigned 16 bit values.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
 fs/fs_parser.c            | 14 ++++++++++++++
 include/linux/fs_parser.h |  9 ++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/fs/fs_parser.c b/fs/fs_parser.c
index 24727ec34e5a..0e06f9618c89 100644
--- a/fs/fs_parser.c
+++ b/fs/fs_parser.c
@@ -210,6 +210,20 @@ int fs_param_is_bool(struct p_log *log, const struct fs_parameter_spec *p,
 }
 EXPORT_SYMBOL(fs_param_is_bool);
 
+int fs_param_is_u16(struct p_log *log, const struct fs_parameter_spec *p,
+		    struct fs_parameter *param, struct fs_parse_result *result)
+{
+	int base = (unsigned long)p->data;
+	if (param->type != fs_value_is_string)
+		return fs_param_bad_value(log, param);
+	if (!*param->string && (p->flags & fs_param_can_be_empty))
+		return 0;
+	if (kstrtou16(param->string, base, &result->uint_16) < 0)
+		return fs_param_bad_value(log, param);
+	return 0;
+}
+EXPORT_SYMBOL(fs_param_is_u16);
+
 int fs_param_is_u32(struct p_log *log, const struct fs_parameter_spec *p,
 		    struct fs_parameter *param, struct fs_parse_result *result)
 {
diff --git a/include/linux/fs_parser.h b/include/linux/fs_parser.h
index 6cf713a7e6c6..84acd7acef50 100644
--- a/include/linux/fs_parser.h
+++ b/include/linux/fs_parser.h
@@ -26,9 +26,10 @@ typedef int fs_param_type(struct p_log *,
 /*
  * The type of parameter expected.
  */
-fs_param_type fs_param_is_bool, fs_param_is_u32, fs_param_is_s32, fs_param_is_u64,
-	fs_param_is_enum, fs_param_is_string, fs_param_is_blob, fs_param_is_blockdev,
-	fs_param_is_path, fs_param_is_fd, fs_param_is_uid, fs_param_is_gid;
+fs_param_type fs_param_is_bool, fs_param_is_u16, fs_param_is_u32, fs_param_is_s32,
+	fs_param_is_u64, fs_param_is_enum, fs_param_is_string, fs_param_is_blob,
+	fs_param_is_blockdev, fs_param_is_path, fs_param_is_fd, fs_param_is_uid,
+	fs_param_is_gid;
 
 /*
  * Specification of the type of value a parameter wants.
@@ -55,6 +56,7 @@ struct fs_parse_result {
 	union {
 		bool		boolean;	/* For spec_bool */
 		int		int_32;		/* For spec_s32/spec_enum */
+		u16             uint_16;	/* For spec_u16{,_octal,_hex}/spec_enum */
 		unsigned int	uint_32;	/* For spec_u32{,_octal,_hex}/spec_enum */
 		u64		uint_64;	/* For spec_u64 */
 		kuid_t		uid;
@@ -119,6 +121,7 @@ static inline bool fs_validate_description(const char *name,
 #define fsparam_flag_no(NAME, OPT) \
 			__fsparam(NULL, NAME, OPT, fs_param_neg_with_no, NULL)
 #define fsparam_bool(NAME, OPT)	__fsparam(fs_param_is_bool, NAME, OPT, 0, NULL)
+#define fsparam_u16(NAME, OPT)	__fsparam(fs_param_is_u16, NAME, OPT, 0, NULL)
 #define fsparam_u32(NAME, OPT)	__fsparam(fs_param_is_u32, NAME, OPT, 0, NULL)
 #define fsparam_u32oct(NAME, OPT) \
 			__fsparam(fs_param_is_u32, NAME, OPT, 0, (void *)8)
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-14 19:13 [PATCH RESEND v9 0/3] fuse: add kernel-enforced request timeout option Joanne Koong
  2024-11-14 19:13 ` [PATCH RESEND v9 1/3] fs_parser: add fsparam_u16 helper Joanne Koong
@ 2024-11-14 19:13 ` Joanne Koong
  2024-11-28 10:44   ` Sergey Senozhatsky
                     ` (7 more replies)
  2024-11-14 19:13 ` [PATCH RESEND v9 3/3] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
  2024-11-21 22:02 ` [PATCH RESEND v9 0/3] fuse: add kernel-enforced request timeout option Josef Bacik
  3 siblings, 8 replies; 39+ messages in thread
From: Joanne Koong @ 2024-11-14 19:13 UTC (permalink / raw)
  To: miklos, linux-fsdevel
  Cc: josef, bernd.schubert, jefflexu, laoar.shao, kernel-team,
	Bernd Schubert

There are situations where fuse servers can become unresponsive or
stuck, for example if the server is deadlocked. Currently, there's no
good way to detect if a server is stuck and needs to be killed manually.

This commit adds an option for enforcing a timeout (in minutes) for
requests where if the timeout elapses without the server responding to
the request, the connection will be automatically aborted.

Please note that these timeouts are not 100% precise. The request may
take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
timeout due to how it's internally implemented.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
 fs/fuse/dev.c    | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/fuse_i.h | 21 +++++++++++++
 fs/fuse/inode.c  | 21 +++++++++++++
 3 files changed, 122 insertions(+)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 29fc61a072ba..536aa4525e8f 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -45,6 +45,82 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
 	return READ_ONCE(file->private_data);
 }
 
+static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
+{
+	return jiffies > req->create_time + fc->timeout.req_timeout;
+}
+
+/*
+ * Check if any requests aren't being completed by the specified request
+ * timeout. To do so, we:
+ * - check the fiq pending list
+ * - check the bg queue
+ * - check the fpq io and processing lists
+ *
+ * To make this fast, we only check against the head request on each list since
+ * these are generally queued in order of creation time (eg newer requests get
+ * queued to the tail). We might miss a few edge cases (eg requests transitioning
+ * between lists, re-sent requests at the head of the pending list having a
+ * later creation time than other requests on that list, etc.) but that is fine
+ * since if the request never gets fulfilled, it will eventually be caught.
+ */
+void fuse_check_timeout(struct timer_list *timer)
+{
+	struct fuse_conn *fc = container_of(timer, struct fuse_conn, timeout.timer);
+	struct fuse_iqueue *fiq = &fc->iq;
+	struct fuse_req *req;
+	struct fuse_dev *fud;
+	struct fuse_pqueue *fpq;
+	bool expired = false;
+	int i;
+
+	spin_lock(&fiq->lock);
+	req = list_first_entry_or_null(&fiq->pending, struct fuse_req, list);
+	if (req)
+		expired = request_expired(fc, req);
+	spin_unlock(&fiq->lock);
+	if (expired)
+		goto abort_conn;
+
+	spin_lock(&fc->bg_lock);
+	req = list_first_entry_or_null(&fc->bg_queue, struct fuse_req, list);
+	if (req)
+		expired = request_expired(fc, req);
+	spin_unlock(&fc->bg_lock);
+	if (expired)
+		goto abort_conn;
+
+	spin_lock(&fc->lock);
+	if (!fc->connected) {
+		spin_unlock(&fc->lock);
+		return;
+	}
+	list_for_each_entry(fud, &fc->devices, entry) {
+		fpq = &fud->pq;
+		spin_lock(&fpq->lock);
+		req = list_first_entry_or_null(&fpq->io, struct fuse_req, list);
+		if (req && request_expired(fc, req))
+			goto fpq_abort;
+
+		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
+			req = list_first_entry_or_null(&fpq->processing[i], struct fuse_req, list);
+			if (req && request_expired(fc, req))
+				goto fpq_abort;
+		}
+		spin_unlock(&fpq->lock);
+	}
+	spin_unlock(&fc->lock);
+
+	mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
+	return;
+
+fpq_abort:
+	spin_unlock(&fpq->lock);
+	spin_unlock(&fc->lock);
+abort_conn:
+	fuse_abort_conn(fc);
+}
+
 static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
 {
 	INIT_LIST_HEAD(&req->list);
@@ -53,6 +129,7 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
 	refcount_set(&req->count, 1);
 	__set_bit(FR_PENDING, &req->flags);
 	req->fm = fm;
+	req->create_time = jiffies;
 }
 
 static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
@@ -2308,6 +2385,9 @@ void fuse_abort_conn(struct fuse_conn *fc)
 		spin_unlock(&fc->lock);
 
 		end_requests(&to_end);
+
+		if (fc->timeout.req_timeout)
+			timer_delete(&fc->timeout.timer);
 	} else {
 		spin_unlock(&fc->lock);
 	}
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index d35c37ccf9b5..9092201c4e0b 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -438,6 +438,9 @@ struct fuse_req {
 
 	/** fuse_mount this request belongs to */
 	struct fuse_mount *fm;
+
+	/** When (in jiffies) the request was created */
+	unsigned long create_time;
 };
 
 struct fuse_iqueue;
@@ -528,6 +531,16 @@ struct fuse_pqueue {
 	struct list_head io;
 };
 
+/* Frequency (in seconds) of request timeout checks, if opted into */
+#define FUSE_TIMEOUT_TIMER_FREQ 60 * HZ
+
+struct fuse_timeout {
+	struct timer_list timer;
+
+	/* Request timeout (in jiffies). 0 = no timeout */
+	unsigned long req_timeout;
+};
+
 /**
  * Fuse device instance
  */
@@ -574,6 +587,8 @@ struct fuse_fs_context {
 	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
+	/*  Request timeout (in minutes). 0 = no timeout (infinite wait) */
+	unsigned int req_timeout;
 	const char *subtype;
 
 	/* DAX device, may be NULL */
@@ -920,6 +935,9 @@ struct fuse_conn {
 	/** IDR for backing files ids */
 	struct idr backing_files_map;
 #endif
+
+	/** Only used if the connection enforces request timeouts */
+	struct fuse_timeout timeout;
 };
 
 /*
@@ -1181,6 +1199,9 @@ void fuse_request_end(struct fuse_req *req);
 void fuse_abort_conn(struct fuse_conn *fc);
 void fuse_wait_aborted(struct fuse_conn *fc);
 
+/* Check if any requests timed out */
+void fuse_check_timeout(struct timer_list *timer);
+
 /**
  * Invalidate inode attributes
  */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index f1779ff3f8d1..ee006f09cd04 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -735,6 +735,7 @@ enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_REQUEST_TIMEOUT,
 	OPT_ERR
 };
 
@@ -749,6 +750,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
 	fsparam_u32	("max_read",		OPT_MAX_READ),
 	fsparam_u32	("blksize",		OPT_BLKSIZE),
 	fsparam_string	("subtype",		OPT_SUBTYPE),
+	fsparam_u16	("request_timeout",	OPT_REQUEST_TIMEOUT),
 	{}
 };
 
@@ -844,6 +846,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
 		ctx->blksize = result.uint_32;
 		break;
 
+	case OPT_REQUEST_TIMEOUT:
+		ctx->req_timeout = result.uint_16;
+		break;
+
 	default:
 		return -EINVAL;
 	}
@@ -973,6 +979,8 @@ void fuse_conn_put(struct fuse_conn *fc)
 
 		if (IS_ENABLED(CONFIG_FUSE_DAX))
 			fuse_dax_conn_free(fc);
+		if (fc->timeout.req_timeout)
+			timer_shutdown_sync(&fc->timeout.timer);
 		if (fiq->ops->release)
 			fiq->ops->release(fiq);
 		put_pid_ns(fc->pid_ns);
@@ -1691,6 +1699,18 @@ int fuse_init_fs_context_submount(struct fs_context *fsc)
 }
 EXPORT_SYMBOL_GPL(fuse_init_fs_context_submount);
 
+static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
+{
+	if (ctx->req_timeout) {
+		if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
+			fc->timeout.req_timeout = ULONG_MAX;
+		timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
+		mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
+	} else {
+		fc->timeout.req_timeout = 0;
+	}
+}
+
 int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 {
 	struct fuse_dev *fud = NULL;
@@ -1753,6 +1773,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->destroy = ctx->destroy;
 	fc->no_control = ctx->no_control;
 	fc->no_force_umount = ctx->no_force_umount;
+	fuse_init_fc_timeout(fc, ctx);
 
 	err = -ENOMEM;
 	root = fuse_get_root_inode(sb, ctx->rootmode);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH RESEND v9 3/3] fuse: add default_request_timeout and max_request_timeout sysctls
  2024-11-14 19:13 [PATCH RESEND v9 0/3] fuse: add kernel-enforced request timeout option Joanne Koong
  2024-11-14 19:13 ` [PATCH RESEND v9 1/3] fs_parser: add fsparam_u16 helper Joanne Koong
  2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
@ 2024-11-14 19:13 ` Joanne Koong
  2024-12-06  0:43   ` Jeff Layton
  2024-11-21 22:02 ` [PATCH RESEND v9 0/3] fuse: add kernel-enforced request timeout option Josef Bacik
  3 siblings, 1 reply; 39+ messages in thread
From: Joanne Koong @ 2024-11-14 19:13 UTC (permalink / raw)
  To: miklos, linux-fsdevel
  Cc: josef, bernd.schubert, jefflexu, laoar.shao, kernel-team,
	Bernd Schubert

Introduce two new sysctls, "default_request_timeout" and
"max_request_timeout". These control how long (in minutes) a server can
take to reply to a request. If the server does not reply by the timeout,
then the connection will be aborted.

"default_request_timeout" sets the default timeout if no timeout is
specified by the fuse server on mount. 0 (default) indicates no default
timeout should be enforced. If the server did specify a timeout, then
default_request_timeout will be ignored.

"max_request_timeout" sets the max amount of time the server may take to
reply to a request. 0 (default) indicates no maximum timeout. If
max_request_timeout is set and the fuse server attempts to set a
timeout greater than max_request_timeout, the system will use
max_request_timeout as the timeout. Similarly, if default_request_timeout
is greater than max_request_timeout, the system will use
max_request_timeout as the timeout. If the server does not request a
timeout and default_request_timeout is set to 0 but max_request_timeout
is set, then the timeout will be max_request_timeout.

Please note that these timeouts are not 100% precise. The request may
take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the set max timeout
due to how it's internally implemented.

$ sysctl -a | grep fuse.default_request_timeout
fs.fuse.default_request_timeout = 0

$ echo 65536 | sudo tee /proc/sys/fs/fuse/default_request_timeout
tee: /proc/sys/fs/fuse/default_request_timeout: Invalid argument

$ echo 65535 | sudo tee /proc/sys/fs/fuse/default_request_timeout
65535

$ sysctl -a | grep fuse.default_request_timeout
fs.fuse.default_request_timeout = 65535

$ echo 0 | sudo tee /proc/sys/fs/fuse/default_request_timeout
0

$ sysctl -a | grep fuse.default_request_timeout
fs.fuse.default_request_timeout = 0

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
 Documentation/admin-guide/sysctl/fs.rst | 27 +++++++++++++++++++++++++
 fs/fuse/fuse_i.h                        | 10 +++++++++
 fs/fuse/inode.c                         | 16 +++++++++++++--
 fs/fuse/sysctl.c                        | 20 ++++++++++++++++++
 4 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index fa25d7e718b3..790a34291467 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -342,3 +342,30 @@ filesystems:
 ``/proc/sys/fs/fuse/max_pages_limit`` is a read/write file for
 setting/getting the maximum number of pages that can be used for servicing
 requests in FUSE.
+
+``/proc/sys/fs/fuse/default_request_timeout`` is a read/write file for
+setting/getting the default timeout (in minutes) for a fuse server to
+reply to a kernel-issued request in the event where the server did not
+specify a timeout at mount. If the server set a timeout,
+then default_request_timeout will be ignored.  The default
+"default_request_timeout" is set to 0. 0 indicates a no-op (eg
+requests will not have a default request timeout set if no timeout was
+specified by the server).
+
+``/proc/sys/fs/fuse/max_request_timeout`` is a read/write file for
+setting/getting the maximum timeout (in minutes) for a fuse server to
+reply to a kernel-issued request. A value greater than 0 automatically opts
+the server into a timeout that will be at most "max_request_timeout", even if
+the server did not specify a timeout and default_request_timeout is set to 0.
+If max_request_timeout is greater than 0 and the server set a timeout greater
+than max_request_timeout or default_request_timeout is set to a value greater
+than max_request_timeout, the system will use max_request_timeout as the
+timeout. 0 indicates a no-op (eg requests will not have an upper bound on the
+timeout and if the server did not request a timeout and default_request_timeout
+was not set, there will be no timeout).
+
+Please note that for the timeout options, if the server does not respond to
+the request by the time the timeout elapses, then the connection to the fuse
+server will be aborted. Please also note that the timeouts are not 100%
+precise (eg you may set 10 minutes but the timeout may kick in after 11
+minutes).
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 9092201c4e0b..e82ddff8d752 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -46,6 +46,16 @@
 
 /** Maximum of max_pages received in init_out */
 extern unsigned int fuse_max_pages_limit;
+/*
+ * Default timeout (in minutes) for the server to reply to a request
+ * before the connection is aborted, if no timeout was specified on mount.
+ */
+extern unsigned int fuse_default_req_timeout;
+/*
+ * Max timeout (in minutes) for the server to reply to a request before
+ * the connection is aborted.
+ */
+extern unsigned int fuse_max_req_timeout;
 
 /** List of active connections */
 extern struct list_head fuse_conn_list;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ee006f09cd04..1e7cc6509e42 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -36,6 +36,9 @@ DEFINE_MUTEX(fuse_mutex);
 static int set_global_limit(const char *val, const struct kernel_param *kp);
 
 unsigned int fuse_max_pages_limit = 256;
+/* default is no timeout */
+unsigned int fuse_default_req_timeout = 0;
+unsigned int fuse_max_req_timeout = 0;
 
 unsigned max_user_bgreq;
 module_param_call(max_user_bgreq, set_global_limit, param_get_uint,
@@ -1701,8 +1704,17 @@ EXPORT_SYMBOL_GPL(fuse_init_fs_context_submount);
 
 static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
 {
-	if (ctx->req_timeout) {
-		if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
+	unsigned int timeout = ctx->req_timeout ?: fuse_default_req_timeout;
+
+	if (fuse_max_req_timeout) {
+		if (!timeout)
+			timeout = fuse_max_req_timeout;
+		else
+			timeout = min(timeout, fuse_max_req_timeout);
+	}
+
+	if (timeout) {
+		if (check_mul_overflow(timeout * 60, HZ, &fc->timeout.req_timeout))
 			fc->timeout.req_timeout = ULONG_MAX;
 		timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
 		mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
diff --git a/fs/fuse/sysctl.c b/fs/fuse/sysctl.c
index b272bb333005..6a9094e17950 100644
--- a/fs/fuse/sysctl.c
+++ b/fs/fuse/sysctl.c
@@ -13,6 +13,8 @@ static struct ctl_table_header *fuse_table_header;
 /* Bound by fuse_init_out max_pages, which is a u16 */
 static unsigned int sysctl_fuse_max_pages_limit = 65535;
 
+static unsigned int sysctl_fuse_max_req_timeout_limit = U16_MAX;
+
 static struct ctl_table fuse_sysctl_table[] = {
 	{
 		.procname	= "max_pages_limit",
@@ -23,6 +25,24 @@ static struct ctl_table fuse_sysctl_table[] = {
 		.extra1		= SYSCTL_ONE,
 		.extra2		= &sysctl_fuse_max_pages_limit,
 	},
+	{
+		.procname	= "default_request_timeout",
+		.data		= &fuse_default_req_timeout,
+		.maxlen		= sizeof(fuse_default_req_timeout),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= &sysctl_fuse_max_req_timeout_limit,
+	},
+	{
+		.procname	= "max_request_timeout",
+		.data		= &fuse_max_req_timeout,
+		.maxlen		= sizeof(fuse_max_req_timeout),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= &sysctl_fuse_max_req_timeout_limit,
+	},
 };
 
 int fuse_sysctl_register(void)
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 0/3] fuse: add kernel-enforced request timeout option
  2024-11-14 19:13 [PATCH RESEND v9 0/3] fuse: add kernel-enforced request timeout option Joanne Koong
                   ` (2 preceding siblings ...)
  2024-11-14 19:13 ` [PATCH RESEND v9 3/3] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
@ 2024-11-21 22:02 ` Josef Bacik
  3 siblings, 0 replies; 39+ messages in thread
From: Josef Bacik @ 2024-11-21 22:02 UTC (permalink / raw)
  To: Joanne Koong
  Cc: miklos, linux-fsdevel, bernd.schubert, jefflexu, laoar.shao,
	kernel-team

On Thu, Nov 14, 2024 at 11:13:29AM -0800, Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is in a deadlock. Currently, there's
> no good way to detect if a server is stuck and needs to be killed
> manually.
> 
> This patchset adds a timeout option where if the server does not reply to a
> request by the time the timeout elapses, the connection will be aborted.
> This patchset also adds two dynamically configurable fuse sysctls
> "default_request_timeout" and "max_request_timeout" for controlling/enforcing
> timeout behavior system-wide.
> 
> Existing systems running fuse servers will not be affected unless they
> explicitly opt into the timeout.
> 
> v8:
> https://lore.kernel.org/linux-fsdevel/20241011191320.91592-1-joannelkoong@gmail.com/
> Changes from v8 -> v9:
> * Fix comment for u16 fs_parse_result, ULONG_MAX instead of U32_MAX, fix
>   spacing (Bernd)
> 
> v7:
> https://lore.kernel.org/linux-fsdevel/20241007184258.2837492-1-joannelkoong@gmail.com/
> Changes from v7 -> v8:
> * Use existing lists for checking expirations (Miklos)
> 
> v6:
> https://lore.kernel.org/linux-fsdevel/20240830162649.3849586-1-joannelkoong@gmail.com/
> Changes from v6 -> v7:
> - Make timer per-connection instead of per-request (Miklos)
> - Make default granularity of time minutes instead of seconds
> - Removed the reviewed-bys since the interface of this has changed (now
>   minutes, instead of seconds)
> 
> v5:
> https://lore.kernel.org/linux-fsdevel/20240826203234.4079338-1-joannelkoong@gmail.com/
> Changes from v5 -> v6:
> - Gate sysctl.o behind CONFIG_SYSCTL in makefile (kernel test robot)
> - Reword/clarify last sentence in cover letter (Miklos)
> 
> v4:
> https://lore.kernel.org/linux-fsdevel/20240813232241.2369855-1-joannelkoong@gmail.com/
> Changes from v4 -> v5:
> - Change timeout behavior from aborting request to aborting connection
>   (Miklos)
> - Clarify wording for sysctl documentation (Jingbo)
> 
> v3:
> https://lore.kernel.org/linux-fsdevel/20240808190110.3188039-1-joannelkoong@gmail.com/
> Changes from v3 -> v4:
> - Fix wording on some comments to make it more clear
> - Use simpler logic for timer (eg remove extra if checks, use mod timer API)
>   (Josef)
> - Sanity-check should be on FR_FINISHING not FR_FINISHED (Jingbo)
> - Fix comment for "processing queue", add req->fpq = NULL safeguard  (Bernd)
> 
> v2:
> https://lore.kernel.org/linux-fsdevel/20240730002348.3431931-1-joannelkoong@gmail.com/
> Changes from v2 -> v3:
> - Disarm / rearm timer in dev_do_read to handle race conditions (Bernrd)
> - Disarm timer in error handling for fatal interrupt (Yafang)
> - Clean up do_fuse_request_end (Jingbo)
> - Add timer for notify retrieve requests 
> - Fix kernel test robot errors for #define no-op functions
> 
> v1:
> https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
> Changes from v1 -> v2:
> - Add timeout for background requests
> - Handle resend race condition
> - Add sysctls
> 
> Joanne Koong (3):
>   fs_parser: add fsparam_u16 helper
>   fuse: add optional kernel-enforced timeout for requests
>   fuse: add default_request_timeout and max_request_timeout sysctls
> 

These look great Joanne, you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

to the series.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
@ 2024-11-28 10:44   ` Sergey Senozhatsky
  2024-11-28 11:00     ` Bernd Schubert
  2024-12-03 11:01   ` Sergey Senozhatsky
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-11-28 10:44 UTC (permalink / raw)
  To: Joanne Koong
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
	laoar.shao, kernel-team, Bernd Schubert

Hi Joanne,

On (24/11/14 11:13), Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is deadlocked. Currently, there's no
> good way to detect if a server is stuck and needs to be killed manually.
>
> This commit adds an option for enforcing a timeout (in minutes) for
> requests where if the timeout elapses without the server responding to
> the request, the connection will be automatically aborted.

Does it make sense to configure timeout in seconds?  hung-task watchdog
operates in seconds and can be set to anything, e.g. 45 seconds, so it
panic the system before fuse timeout has a chance to trigger.

Another question is: this will terminate the connection.  Does it
make sense to run timeout per request and just "abort" individual
requests?  What I'm currently playing with here on our side is
something like this:

----

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8573d79ef29c..82e071cecafd 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -21,6 +21,7 @@
 #include <linux/swap.h>
 #include <linux/splice.h>
 #include <linux/sched.h>
+#include <linux/sched/sysctl.h>
 
 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
 MODULE_ALIAS("devname:fuse");
@@ -368,11 +369,24 @@ static void request_wait_answer(struct fuse_req *req)
        int err;
 
        if (!fc->no_interrupt) {
-               /* Any signal may interrupt this */
-               err = wait_event_interruptible(req->waitq,
+               /* We can use CONFIG_DEFAULT_HUNG_TASK_TIMEOUT here */
+               unsigned long hang_check = sysctl_hung_task_timeout_secs;
+
+               if (hang_check) {
+                       /* Any signal or timeout may interrupt this */
+                       err = wait_event_interruptible_timeout(req->waitq,
+                                       test_bit(FR_FINISHED, &req->flags),
+                                       hang_check * (HZ / 2));
+                       if (err > 0)
+                               return;
+               } else {
+                       /* Any signal may interrupt this */
+                       err = wait_event_interruptible(req->waitq,
                                        test_bit(FR_FINISHED, &req->flags));
-               if (!err)
-                       return;
+
+                       if (!err)
+                               return;
+               }
 
                set_bit(FR_INTERRUPTED, &req->flags);
                /* matches barrier in fuse_dev_do_read() */

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-28 10:44   ` Sergey Senozhatsky
@ 2024-11-28 11:00     ` Bernd Schubert
  2024-11-28 11:09       ` Sergey Senozhatsky
  0 siblings, 1 reply; 39+ messages in thread
From: Bernd Schubert @ 2024-11-28 11:00 UTC (permalink / raw)
  To: Sergey Senozhatsky, Joanne Koong
  Cc: miklos, linux-fsdevel, josef, jefflexu, laoar.shao, kernel-team,
	Bernd Schubert



On 11/28/24 11:44, Sergey Senozhatsky wrote:
> Hi Joanne,
> 
> On (24/11/14 11:13), Joanne Koong wrote:
>> There are situations where fuse servers can become unresponsive or
>> stuck, for example if the server is deadlocked. Currently, there's no
>> good way to detect if a server is stuck and needs to be killed manually.
>>
>> This commit adds an option for enforcing a timeout (in minutes) for
>> requests where if the timeout elapses without the server responding to
>> the request, the connection will be automatically aborted.
> 
> Does it make sense to configure timeout in seconds?  hung-task watchdog
> operates in seconds and can be set to anything, e.g. 45 seconds, so it
> panic the system before fuse timeout has a chance to trigger.
> 
> Another question is: this will terminate the connection.  Does it
> make sense to run timeout per request and just "abort" individual
> requests?  What I'm currently playing with here on our side is
> something like this:

Miklos had asked for to abort the connection in v4
https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw


Thanks,
Bernd


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-28 11:00     ` Bernd Schubert
@ 2024-11-28 11:09       ` Sergey Senozhatsky
  2024-11-28 11:23         ` Bernd Schubert
  0 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-11-28 11:09 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Sergey Senozhatsky, Joanne Koong, miklos, linux-fsdevel, josef,
	jefflexu, laoar.shao, kernel-team, Bernd Schubert

On (24/11/28 12:00), Bernd Schubert wrote:
> On 11/28/24 11:44, Sergey Senozhatsky wrote:
> > Hi Joanne,
> > 
> > On (24/11/14 11:13), Joanne Koong wrote:
> >> There are situations where fuse servers can become unresponsive or
> >> stuck, for example if the server is deadlocked. Currently, there's no
> >> good way to detect if a server is stuck and needs to be killed manually.
> >>
> >> This commit adds an option for enforcing a timeout (in minutes) for
> >> requests where if the timeout elapses without the server responding to
> >> the request, the connection will be automatically aborted.
> > 
> > Does it make sense to configure timeout in seconds?  hung-task watchdog
> > operates in seconds and can be set to anything, e.g. 45 seconds, so it
> > panic the system before fuse timeout has a chance to trigger.
> > 
> > Another question is: this will terminate the connection.  Does it
> > make sense to run timeout per request and just "abort" individual
> > requests?  What I'm currently playing with here on our side is
> > something like this:

Thanks for the pointers again, Bernd.

> Miklos had asked for to abort the connection in v4
> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw

OK, sounds reasonable. I'll try to give the series some testing in the
coming days.

// I still would probably prefer "seconds" timeout granularity.
// Unless this also has been discussed already and Bernd has a link ;)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-28 11:09       ` Sergey Senozhatsky
@ 2024-11-28 11:23         ` Bernd Schubert
  2024-11-28 11:54           ` Sergey Senozhatsky
  0 siblings, 1 reply; 39+ messages in thread
From: Bernd Schubert @ 2024-11-28 11:23 UTC (permalink / raw)
  To: Sergey Senozhatsky, Bernd Schubert
  Cc: Joanne Koong, miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	josef@toxicpanda.com, jefflexu@linux.alibaba.com,
	laoar.shao@gmail.com, kernel-team@meta.com

On 11/28/24 12:09, Sergey Senozhatsky wrote:
> [You don't often get email from senozhatsky@chromium.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> On (24/11/28 12:00), Bernd Schubert wrote:
>> On 11/28/24 11:44, Sergey Senozhatsky wrote:
>>> Hi Joanne,
>>>
>>> On (24/11/14 11:13), Joanne Koong wrote:
>>>> There are situations where fuse servers can become unresponsive or
>>>> stuck, for example if the server is deadlocked. Currently, there's no
>>>> good way to detect if a server is stuck and needs to be killed manually.
>>>>
>>>> This commit adds an option for enforcing a timeout (in minutes) for
>>>> requests where if the timeout elapses without the server responding to
>>>> the request, the connection will be automatically aborted.
>>>
>>> Does it make sense to configure timeout in seconds?  hung-task watchdog
>>> operates in seconds and can be set to anything, e.g. 45 seconds, so it
>>> panic the system before fuse timeout has a chance to trigger.
>>>
>>> Another question is: this will terminate the connection.  Does it
>>> make sense to run timeout per request and just "abort" individual
>>> requests?  What I'm currently playing with here on our side is
>>> something like this:
> 
> Thanks for the pointers again, Bernd.
> 
>> Miklos had asked for to abort the connection in v4
>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
> 
> OK, sounds reasonable. I'll try to give the series some testing in the
> coming days.
> 
> // I still would probably prefer "seconds" timeout granularity.
> // Unless this also has been discussed already and Bernd has a link ;)


The issue is that is currently iterating through 256 hash lists + 
pending + bg.

https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw


Personally I would prefer a second list to avoid the check spike and latency
https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw

What is your opinion about that? I guess android and chromium have an
interest low latencies and avoiding cpu spikes?


Thanks,
Bernd



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-28 11:23         ` Bernd Schubert
@ 2024-11-28 11:54           ` Sergey Senozhatsky
  2024-12-02  9:45             ` Tomasz Figa
  0 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-11-28 11:54 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Sergey Senozhatsky, Bernd Schubert, Joanne Koong,
	miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	josef@toxicpanda.com, jefflexu@linux.alibaba.com,
	laoar.shao@gmail.com, kernel-team@meta.com, Tomasz Figa

Cc-ing Tomasz

On (24/11/28 11:23), Bernd Schubert wrote:
> > Thanks for the pointers again, Bernd.
> > 
> >> Miklos had asked for to abort the connection in v4
> >> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
> > 
> > OK, sounds reasonable. I'll try to give the series some testing in the
> > coming days.
> > 
> > // I still would probably prefer "seconds" timeout granularity.
> > // Unless this also has been discussed already and Bernd has a link ;)
> 
> 
> The issue is that is currently iterating through 256 hash lists + 
> pending + bg.
> 
> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw

Oh, I see.

> Personally I would prefer a second list to avoid the check spike and latency
> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw

That's good to know.  I like the idea of less CPU usage in general,
our devices a battery powered so everything counts, to some extent.

> What is your opinion about that? I guess android and chromium have an
> interest low latencies and avoiding cpu spikes?

Good question.

Can't speak for android, in chromeos we probably will keep it at 1 minute,
but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
use default value of 120 sec). There are setups that might use lower
values, or even re-define default value, e.g.:

arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20

In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
and then the question is whether HUNG_TASK_PANIC is set.

On the other hand, setups that set much lower timeout than
DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
just because watchdogs will run more often.

Tomasz, any opinions?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-28 11:54           ` Sergey Senozhatsky
@ 2024-12-02  9:45             ` Tomasz Figa
  2024-12-02 14:43               ` Bernd Schubert
  0 siblings, 1 reply; 39+ messages in thread
From: Tomasz Figa @ 2024-12-02  9:45 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Bernd Schubert, Bernd Schubert, Joanne Koong, miklos@szeredi.hu,
	linux-fsdevel@vger.kernel.org, josef@toxicpanda.com,
	jefflexu@linux.alibaba.com, laoar.shao@gmail.com,
	kernel-team@meta.com

Hi everyone,

On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> Cc-ing Tomasz
>
> On (24/11/28 11:23), Bernd Schubert wrote:
> > > Thanks for the pointers again, Bernd.
> > >
> > >> Miklos had asked for to abort the connection in v4
> > >> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
> > >
> > > OK, sounds reasonable. I'll try to give the series some testing in the
> > > coming days.
> > >
> > > // I still would probably prefer "seconds" timeout granularity.
> > > // Unless this also has been discussed already and Bernd has a link ;)
> >
> >
> > The issue is that is currently iterating through 256 hash lists +
> > pending + bg.
> >
> > https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
>
> Oh, I see.
>
> > Personally I would prefer a second list to avoid the check spike and latency
> > https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
>
> That's good to know.  I like the idea of less CPU usage in general,
> our devices a battery powered so everything counts, to some extent.
>
> > What is your opinion about that? I guess android and chromium have an
> > interest low latencies and avoiding cpu spikes?
>
> Good question.
>
> Can't speak for android, in chromeos we probably will keep it at 1 minute,
> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
> use default value of 120 sec). There are setups that might use lower
> values, or even re-define default value, e.g.:
>
> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
>
> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> and then the question is whether HUNG_TASK_PANIC is set.
>
> On the other hand, setups that set much lower timeout than
> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> just because watchdogs will run more often.
>
> Tomasz, any opinions?

First of all, thanks everyone for looking into this.

How about keeping a list of requests in the FIFO order (in other
words: first entry is the first to timeout) and whenever the first
entry is being removed from the list (aka the request actually
completes), re-arming the timer to the timeout of the next request in
the list? This way we don't really have any timer firing unless there
is really a request that timed out.

(In fact, we could optimize it even further by opportunistically
scheduling a timer slightly later and opportunistically handling timed
out requests when other requests are being completed, but this would
be optimizing for the slow path, so probably an overkill.)

As for the length of the request timeout vs the hung task watchdog
timeout, my opinion is that we should make sure that the hung task
watchdog doesn't hit in any case, simply because a misbehaving
userspace process must not be able to panic the kernel. In the
blk-core, the blk_io_schedule() function [1] uses
sysctl_hung_task_timeout_secs to determine the maximum length of a
single uninterruptible sleep. I suppose we could use the same
calculation to obtain our timeout number. What does everyone think?

[1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-02  9:45             ` Tomasz Figa
@ 2024-12-02 14:43               ` Bernd Schubert
  2024-12-02 19:29                 ` Joanne Koong
  0 siblings, 1 reply; 39+ messages in thread
From: Bernd Schubert @ 2024-12-02 14:43 UTC (permalink / raw)
  To: Tomasz Figa, Sergey Senozhatsky
  Cc: Bernd Schubert, Joanne Koong, miklos@szeredi.hu,
	linux-fsdevel@vger.kernel.org, josef@toxicpanda.com,
	jefflexu@linux.alibaba.com, laoar.shao@gmail.com,
	kernel-team@meta.com



On 12/2/24 10:45, Tomasz Figa wrote:
> Hi everyone,
> 
> On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
> <senozhatsky@chromium.org> wrote:
>>
>> Cc-ing Tomasz
>>
>> On (24/11/28 11:23), Bernd Schubert wrote:
>>>> Thanks for the pointers again, Bernd.
>>>>
>>>>> Miklos had asked for to abort the connection in v4
>>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
>>>>
>>>> OK, sounds reasonable. I'll try to give the series some testing in the
>>>> coming days.
>>>>
>>>> // I still would probably prefer "seconds" timeout granularity.
>>>> // Unless this also has been discussed already and Bernd has a link ;)
>>>
>>>
>>> The issue is that is currently iterating through 256 hash lists +
>>> pending + bg.
>>>
>>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
>>
>> Oh, I see.
>>
>>> Personally I would prefer a second list to avoid the check spike and latency
>>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
>>
>> That's good to know.  I like the idea of less CPU usage in general,
>> our devices a battery powered so everything counts, to some extent.
>>
>>> What is your opinion about that? I guess android and chromium have an
>>> interest low latencies and avoiding cpu spikes?
>>
>> Good question.
>>
>> Can't speak for android, in chromeos we probably will keep it at 1 minute,
>> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
>> use default value of 120 sec). There are setups that might use lower
>> values, or even re-define default value, e.g.:
>>
>> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
>>
>> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
>> and then the question is whether HUNG_TASK_PANIC is set.
>>
>> On the other hand, setups that set much lower timeout than
>> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
>> just because watchdogs will run more often.
>>
>> Tomasz, any opinions?
> 
> First of all, thanks everyone for looking into this.
> 
> How about keeping a list of requests in the FIFO order (in other
> words: first entry is the first to timeout) and whenever the first
> entry is being removed from the list (aka the request actually
> completes), re-arming the timer to the timeout of the next request in
> the list? This way we don't really have any timer firing unless there
> is really a request that timed out.

Requests are in FIFO order on the list and only head is checked. 
There are 256 hash lists per fuse device for requests currently
in user space, though.

> 
> (In fact, we could optimize it even further by opportunistically
> scheduling a timer slightly later and opportunistically handling timed
> out requests when other requests are being completed, but this would
> be optimizing for the slow path, so probably an overkill.)
> 
> As for the length of the request timeout vs the hung task watchdog
> timeout, my opinion is that we should make sure that the hung task
> watchdog doesn't hit in any case, simply because a misbehaving
> userspace process must not be able to panic the kernel. In the
> blk-core, the blk_io_schedule() function [1] uses
> sysctl_hung_task_timeout_secs to determine the maximum length of a
> single uninterruptible sleep. I suppose we could use the same
> calculation to obtain our timeout number. What does everyone think?
> 
> [1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232

I think that is a good idea.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-02 14:43               ` Bernd Schubert
@ 2024-12-02 19:29                 ` Joanne Koong
  2024-12-03  4:31                   ` Sergey Senozhatsky
  2024-12-04 14:40                   ` Tomasz Figa
  0 siblings, 2 replies; 39+ messages in thread
From: Joanne Koong @ 2024-12-02 19:29 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Tomasz Figa, Sergey Senozhatsky, Bernd Schubert,
	miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	josef@toxicpanda.com, jefflexu@linux.alibaba.com,
	laoar.shao@gmail.com, kernel-team@meta.com

On Mon, Dec 2, 2024 at 6:43 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> On 12/2/24 10:45, Tomasz Figa wrote:
> > Hi everyone,
> >
> > On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
> > <senozhatsky@chromium.org> wrote:
> >>
> >> Cc-ing Tomasz
> >>
> >> On (24/11/28 11:23), Bernd Schubert wrote:
> >>>> Thanks for the pointers again, Bernd.
> >>>>
> >>>>> Miklos had asked for to abort the connection in v4
> >>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
> >>>>
> >>>> OK, sounds reasonable. I'll try to give the series some testing in the
> >>>> coming days.
> >>>>
> >>>> // I still would probably prefer "seconds" timeout granularity.
> >>>> // Unless this also has been discussed already and Bernd has a link ;)
> >>>
> >>>
> >>> The issue is that is currently iterating through 256 hash lists +
> >>> pending + bg.
> >>>
> >>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
> >>
> >> Oh, I see.
> >>
> >>> Personally I would prefer a second list to avoid the check spike and latency
> >>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
> >>
> >> That's good to know.  I like the idea of less CPU usage in general,
> >> our devices a battery powered so everything counts, to some extent.
> >>
> >>> What is your opinion about that? I guess android and chromium have an
> >>> interest low latencies and avoiding cpu spikes?
> >>
> >> Good question.
> >>
> >> Can't speak for android, in chromeos we probably will keep it at 1 minute,
> >> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
> >> use default value of 120 sec). There are setups that might use lower
> >> values, or even re-define default value, e.g.:
> >>
> >> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> >> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> >> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> >> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> >> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> >> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> >> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
> >>
> >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> >> and then the question is whether HUNG_TASK_PANIC is set.
> >>
> >> On the other hand, setups that set much lower timeout than
> >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> >> just because watchdogs will run more often.
> >>
> >> Tomasz, any opinions?
> >
> > First of all, thanks everyone for looking into this.

Hi Sergey and Tomasz,

Sorry for the late reply - I was out the last couple of days. Thanks
Bernd for weighing in and answering the questions!

> >
> > How about keeping a list of requests in the FIFO order (in other
> > words: first entry is the first to timeout) and whenever the first
> > entry is being removed from the list (aka the request actually
> > completes), re-arming the timer to the timeout of the next request in
> > the list? This way we don't really have any timer firing unless there
> > is really a request that timed out.

I think the issue with this is that we likely would end up wasting
more cpu cycles. For a busy FUSE server, there could be hundreds
(thousands?) of requests that happen within the span of
FUSE_TIMEOUT_TIMER_FREQ seconds.

While working on the patch, one thing I considered was disarming the
timer in the timeout handler fuse_check_timeout() if no requests are
on the list, in order to accomodate for "quiet periods" (eg if the
FUSE server is inactive for a few minutes or hours) but ultimately
decided against it because of the overhead it'd incur per request (eg
check if the timer is disarmed, would most likely need to grab the
fc->lock as well since timer rearming would need to be synchronized
between background and non-background requests, etc.).

All in all, imo I don't think having the timer trigger every 60
seconds (what FUSE_TIMEOUT_TIMER_FREQ is set to) is too costly.

>
> Requests are in FIFO order on the list and only head is checked.
> There are 256 hash lists per fuse device for requests currently
> in user space, though.
>
> >
> > (In fact, we could optimize it even further by opportunistically
> > scheduling a timer slightly later and opportunistically handling timed
> > out requests when other requests are being completed, but this would
> > be optimizing for the slow path, so probably an overkill.)
> >
> > As for the length of the request timeout vs the hung task watchdog
> > timeout, my opinion is that we should make sure that the hung task
> > watchdog doesn't hit in any case, simply because a misbehaving
> > userspace process must not be able to panic the kernel. In the
> > blk-core, the blk_io_schedule() function [1] uses
> > sysctl_hung_task_timeout_secs to determine the maximum length of a
> > single uninterruptible sleep. I suppose we could use the same
> > calculation to obtain our timeout number. What does everyone think?
> >
> > [1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232
>
> I think that is a good idea.

Btw, just something to note, the fuse request timeout has an upper
margin of error associated with it.

Copying over from the commit message -

"Please note that these timeouts are not 100% precise. The request may
take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
timeout due to how it's internally implemented."

For example, if a server sets the request timeout config to 10
minutes, the server could be aborted after 11 minutes
(FUSE_TIMEOUT_TIMER_FREQ is set to 60 seconds internally) instead of
10 minutes.


Thanks,
Joanne
>
>
> Thanks,
> Bernd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-02 19:29                 ` Joanne Koong
@ 2024-12-03  4:31                   ` Sergey Senozhatsky
  2024-12-03  5:11                     ` Joanne Koong
  2024-12-04 14:40                   ` Tomasz Figa
  1 sibling, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-03  4:31 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Bernd Schubert, Tomasz Figa, Sergey Senozhatsky, Bernd Schubert,
	miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	josef@toxicpanda.com, jefflexu@linux.alibaba.com,
	laoar.shao@gmail.com, kernel-team@meta.com

On (24/12/02 11:29), Joanne Koong wrote:
> > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > >> and then the question is whether HUNG_TASK_PANIC is set.
> > >>
> > >> On the other hand, setups that set much lower timeout than
> > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> > >> just because watchdogs will run more often.
> > >>
> > >> Tomasz, any opinions?
> > >
> > > First of all, thanks everyone for looking into this.
> 
> Hi Sergey and Tomasz,
> 
> Sorry for the late reply - I was out the last couple of days. Thanks
> Bernd for weighing in and answering the questions!
> 
> > >
> > > How about keeping a list of requests in the FIFO order (in other
> > > words: first entry is the first to timeout) and whenever the first
> > > entry is being removed from the list (aka the request actually
> > > completes), re-arming the timer to the timeout of the next request in
> > > the list? This way we don't really have any timer firing unless there
> > > is really a request that timed out.
> 
> I think the issue with this is that we likely would end up wasting
> more cpu cycles. For a busy FUSE server, there could be hundreds
> (thousands?) of requests that happen within the span of
> FUSE_TIMEOUT_TIMER_FREQ seconds.

So, a silly question - can we not do that maybe?

What I'm thinking about is what if instead of implementing fuse-watchdog
and tracking jiffies per request we'd switch to timeout aware operations
and use what's already in the kernel?  E.g. instead of wait_event() we'd
use wait_event_timeout() and would configure timeout per connection
(also bringing in current hung-task-watchdog timeout value into the
equation), using MAX_SCHEDULE_TIMEOUT as a default (similarly to what
core kernel does).  The first req that timeouts kills its siblings and
the connection.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-03  4:31                   ` Sergey Senozhatsky
@ 2024-12-03  5:11                     ` Joanne Koong
  0 siblings, 0 replies; 39+ messages in thread
From: Joanne Koong @ 2024-12-03  5:11 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Bernd Schubert, Tomasz Figa, Bernd Schubert, miklos@szeredi.hu,
	linux-fsdevel@vger.kernel.org, josef@toxicpanda.com,
	jefflexu@linux.alibaba.com, laoar.shao@gmail.com,
	kernel-team@meta.com

On Mon, Dec 2, 2024 at 8:31 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/12/02 11:29), Joanne Koong wrote:
> > > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > > >> and then the question is whether HUNG_TASK_PANIC is set.
> > > >>
> > > >> On the other hand, setups that set much lower timeout than
> > > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> > > >> just because watchdogs will run more often.
> > > >>
> > > >> Tomasz, any opinions?
> > > >
> > > > First of all, thanks everyone for looking into this.
> >
> > Hi Sergey and Tomasz,
> >
> > Sorry for the late reply - I was out the last couple of days. Thanks
> > Bernd for weighing in and answering the questions!
> >
> > > >
> > > > How about keeping a list of requests in the FIFO order (in other
> > > > words: first entry is the first to timeout) and whenever the first
> > > > entry is being removed from the list (aka the request actually
> > > > completes), re-arming the timer to the timeout of the next request in
> > > > the list? This way we don't really have any timer firing unless there
> > > > is really a request that timed out.
> >
> > I think the issue with this is that we likely would end up wasting
> > more cpu cycles. For a busy FUSE server, there could be hundreds
> > (thousands?) of requests that happen within the span of
> > FUSE_TIMEOUT_TIMER_FREQ seconds.
>
> So, a silly question - can we not do that maybe?
>
> What I'm thinking about is what if instead of implementing fuse-watchdog
> and tracking jiffies per request we'd switch to timeout aware operations
> and use what's already in the kernel?  E.g. instead of wait_event() we'd
> use wait_event_timeout() and would configure timeout per connection
> (also bringing in current hung-task-watchdog timeout value into the
> equation), using MAX_SCHEDULE_TIMEOUT as a default (similarly to what
> core kernel does).  The first req that timeouts kills its siblings and
> the connection.

Using timeout aware operations like wait_event_timeout() associates a
timer per request (see schedule_timeout()) and this approach was tried
in v6 [1] but the overhead of having a timer per request showed about
a 1.5% drop in throughput [1], which is why we ended up pivoting to a
periodic watchdog timer that triggers at set intervals.


Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1bdyDq+4jo29ZbyjdcbFiU2qyCGGbYbqQc_G23+B_Xe_Q@mail.gmail.com/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
  2024-11-28 10:44   ` Sergey Senozhatsky
@ 2024-12-03 11:01   ` Sergey Senozhatsky
  2024-12-06  0:06     ` Joanne Koong
  2024-12-04 13:14   ` Sergey Senozhatsky
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-03 11:01 UTC (permalink / raw)
  To: Joanne Koong
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
	laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa

On (24/11/14 11:13), Joanne Koong wrote:
[..]
> +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> +{
> +	if (ctx->req_timeout) {
> +		if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> +			fc->timeout.req_timeout = ULONG_MAX;
> +		timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> +		mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);

So I think this will require much bigger changes in the code.
fuse_check_timeout() is executed from IRQ context and it takes
the same locks that are acquired by preemptive task contexts.
So all of those now need to disable local IRQs before they
acquire: fc->bg_lock, fig->lock, fc->lock.  Otherwise we are
in a deadlock scenario (in many places, unfortunately).

A simple example:

[   91.466408]   _raw_spin_lock+0x39/0x70
[   91.466420]   fuse_simple_background+0x902/0x1130 [fuse]
[   91.466453]   fuse_send_init+0x337/0x680 [fuse]
[   91.466485]   fuse_fill_super+0x1c8/0x200 [fuse]
[   91.466516]   get_tree_nodev+0xaf/0x140
[   91.466527]   fuse_get_tree+0x27e/0x450 [fuse]
[   91.466559]   vfs_get_tree+0x88/0x240
[   91.466569]   path_mount+0xf26/0x1ed0
[   91.466579]   __se_sys_mount+0x1c9/0x240
[   91.466588]   do_syscall_64+0x6f/0xa0
[   91.466598]   entry_SYSCALL_64_after_hwframe+0x73/0xdd
[   91.466666] 
               other info that might help us debug this:
[   91.466672]  Possible unsafe locking scenario:

[   91.466679]        CPU0
[   91.466684]        ----
[   91.466689]   lock(&fiq->lock);
[   91.466701]   <Interrupt>
[   91.466707]     lock(&fiq->lock);
[   91.466718] 
                *** DEADLOCK ***

[   91.466724] 4 locks held by runtime_probe/5043:
[   91.466732]  #0: ffff888005812448 (sb_writers#3){.+.+}-{0:0}, at: filename_create+0xb2/0x320
[   91.466762]  #1: ffff8881499ea3f0 (&type->i_mutex_dir_key#5/1){+.+.}-{3:3}, at: filename_create+0x14c/0x320
[   91.466791]  #2: ffffffff9d864ce0 (rcu_read_lock){....}-{1:2}, at: security_sid_to_context_core+0xa4/0x4f0
[   91.466817]  #3: ffff88815c009ec0 ((&fc->timeout.timer)){+.-.}-{0:0}, at: run_timer_softirq+0x702/0x1700
[   91.466841] 
               stack backtrace:
[   91.466850] CPU: 2 PID: 5043 Comm: runtime_probe Tainted: G     U             6.6.63-lockdep #1 f2b045305e587e03c4766ca818bb77742f953c87
[   91.466864] Hardware name: HP Lantis/Lantis, BIOS Google_Lantis.13606.437.0 01/21/2022
[   91.466872] Call Trace:
[   91.466881]  <IRQ>
[   91.466889]  dump_stack_lvl+0x6d/0xb0
[   91.466904]  print_usage_bug+0x8a4/0xbb0
[   91.466917]  mark_lock+0x13ca/0x1940
[   91.466930]  __lock_acquire+0xc10/0x2670
[   91.466944]  lock_acquire+0x119/0x3a0
[   91.466955]  ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
[   91.466992]  ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
[   91.467025]  _raw_spin_lock+0x39/0x70
[   91.467036]  ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
[   91.467070]  fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
[   91.467104]  ? run_timer_softirq+0x702/0x1700
[   91.467114]  ? run_timer_softirq+0x702/0x1700
[   91.467123]  ? __pfx_fuse_check_timeout+0x10/0x10 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
[   91.467156]  run_timer_softirq+0x763/0x1700
[   91.467172]  irq_exit_rcu+0x300/0x7d0
[   91.467183]  ? sysvec_apic_timer_interrupt+0x56/0x90
[   91.467195]  sysvec_apic_timer_interrupt+0x56/0x90
[   91.467205]  </IRQ>

Do we want to run fuse-watchdog as a preemptive task instead of
IRQ context?  Simlar to the way the hung-task watchdog runs, for
example.  Yes, it now may starve and not get scheduled in corner
cases (under some crazy pressure), but at least we don't need to
patch the entire fuse code to use irq_safe/irq_restore locking
variants.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
  2024-11-28 10:44   ` Sergey Senozhatsky
  2024-12-03 11:01   ` Sergey Senozhatsky
@ 2024-12-04 13:14   ` Sergey Senozhatsky
  2024-12-05 23:10     ` Joanne Koong
  2024-12-04 13:47   ` Sergey Senozhatsky
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-04 13:14 UTC (permalink / raw)
  To: Joanne Koong
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
	laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa

On (24/11/14 11:13), Joanne Koong wrote:
[..]
> @@ -920,6 +935,9 @@ struct fuse_conn {
>  	/** IDR for backing files ids */
>  	struct idr backing_files_map;
>  #endif
> +
> +	/** Only used if the connection enforces request timeouts */
> +	struct fuse_timeout timeout;
>  };
[..]
> @@ -749,6 +750,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
>  	fsparam_u32	("max_read",		OPT_MAX_READ),
>  	fsparam_u32	("blksize",		OPT_BLKSIZE),
>  	fsparam_string	("subtype",		OPT_SUBTYPE),
> +	fsparam_u16	("request_timeout",	OPT_REQUEST_TIMEOUT),
>  	{}
>  };
>  
> @@ -844,6 +846,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
>  		ctx->blksize = result.uint_32;
>  		break;
>  
> +	case OPT_REQUEST_TIMEOUT:
> +		ctx->req_timeout = result.uint_16;
> +		break;
> +

A quick question: so for this user-space should be updated
to request fuse-watchdog on particular connection?  Would
it make sense to have a way to simply enforce watchdog on
all connections?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
                     ` (2 preceding siblings ...)
  2024-12-04 13:14   ` Sergey Senozhatsky
@ 2024-12-04 13:47   ` Sergey Senozhatsky
  2024-12-05 23:29     ` Joanne Koong
  2024-12-06  0:37   ` Jeff Layton
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-04 13:47 UTC (permalink / raw)
  To: Joanne Koong
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
	laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa

On (24/11/14 11:13), Joanne Koong wrote:
>  
> +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> +{
> +	return jiffies > req->create_time + fc->timeout.req_timeout;
> +}

With jiffies we need to use time_after() and such, so we'd deal
with jiffies wrap-around.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-02 19:29                 ` Joanne Koong
  2024-12-03  4:31                   ` Sergey Senozhatsky
@ 2024-12-04 14:40                   ` Tomasz Figa
  2024-12-04 14:51                     ` Brian Geffon
  1 sibling, 1 reply; 39+ messages in thread
From: Tomasz Figa @ 2024-12-04 14:40 UTC (permalink / raw)
  To: Joanne Koong, Brian Geffon
  Cc: Bernd Schubert, Sergey Senozhatsky, Bernd Schubert,
	miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	josef@toxicpanda.com, jefflexu@linux.alibaba.com,
	laoar.shao@gmail.com, kernel-team@meta.com

On Tue, Dec 3, 2024 at 4:29 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Mon, Dec 2, 2024 at 6:43 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> > On 12/2/24 10:45, Tomasz Figa wrote:
> > > Hi everyone,
> > >
> > > On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
> > > <senozhatsky@chromium.org> wrote:
> > >>
> > >> Cc-ing Tomasz
> > >>
> > >> On (24/11/28 11:23), Bernd Schubert wrote:
> > >>>> Thanks for the pointers again, Bernd.
> > >>>>
> > >>>>> Miklos had asked for to abort the connection in v4
> > >>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
> > >>>>
> > >>>> OK, sounds reasonable. I'll try to give the series some testing in the
> > >>>> coming days.
> > >>>>
> > >>>> // I still would probably prefer "seconds" timeout granularity.
> > >>>> // Unless this also has been discussed already and Bernd has a link ;)
> > >>>
> > >>>
> > >>> The issue is that is currently iterating through 256 hash lists +
> > >>> pending + bg.
> > >>>
> > >>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
> > >>
> > >> Oh, I see.
> > >>
> > >>> Personally I would prefer a second list to avoid the check spike and latency
> > >>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
> > >>
> > >> That's good to know.  I like the idea of less CPU usage in general,
> > >> our devices a battery powered so everything counts, to some extent.
> > >>
> > >>> What is your opinion about that? I guess android and chromium have an
> > >>> interest low latencies and avoiding cpu spikes?
> > >>
> > >> Good question.
> > >>
> > >> Can't speak for android, in chromeos we probably will keep it at 1 minute,
> > >> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
> > >> use default value of 120 sec). There are setups that might use lower
> > >> values, or even re-define default value, e.g.:
> > >>
> > >> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
> > >>
> > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > >> and then the question is whether HUNG_TASK_PANIC is set.
> > >>
> > >> On the other hand, setups that set much lower timeout than
> > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> > >> just because watchdogs will run more often.
> > >>
> > >> Tomasz, any opinions?
> > >
> > > First of all, thanks everyone for looking into this.
>
> Hi Sergey and Tomasz,
>
> Sorry for the late reply - I was out the last couple of days. Thanks
> Bernd for weighing in and answering the questions!
>
> > >
> > > How about keeping a list of requests in the FIFO order (in other
> > > words: first entry is the first to timeout) and whenever the first
> > > entry is being removed from the list (aka the request actually
> > > completes), re-arming the timer to the timeout of the next request in
> > > the list? This way we don't really have any timer firing unless there
> > > is really a request that timed out.
>
> I think the issue with this is that we likely would end up wasting
> more cpu cycles. For a busy FUSE server, there could be hundreds
> (thousands?) of requests that happen within the span of
> FUSE_TIMEOUT_TIMER_FREQ seconds.
>
> While working on the patch, one thing I considered was disarming the
> timer in the timeout handler fuse_check_timeout() if no requests are
> on the list, in order to accomodate for "quiet periods" (eg if the
> FUSE server is inactive for a few minutes or hours) but ultimately
> decided against it because of the overhead it'd incur per request (eg
> check if the timer is disarmed, would most likely need to grab the
> fc->lock as well since timer rearming would need to be synchronized
> between background and non-background requests, etc.).
>
> All in all, imo I don't think having the timer trigger every 60
> seconds (what FUSE_TIMEOUT_TIMER_FREQ is set to) is too costly.
>
> >
> > Requests are in FIFO order on the list and only head is checked.
> > There are 256 hash lists per fuse device for requests currently
> > in user space, though.
> >
> > >
> > > (In fact, we could optimize it even further by opportunistically
> > > scheduling a timer slightly later and opportunistically handling timed
> > > out requests when other requests are being completed, but this would
> > > be optimizing for the slow path, so probably an overkill.)
> > >
> > > As for the length of the request timeout vs the hung task watchdog
> > > timeout, my opinion is that we should make sure that the hung task
> > > watchdog doesn't hit in any case, simply because a misbehaving
> > > userspace process must not be able to panic the kernel. In the
> > > blk-core, the blk_io_schedule() function [1] uses
> > > sysctl_hung_task_timeout_secs to determine the maximum length of a
> > > single uninterruptible sleep. I suppose we could use the same
> > > calculation to obtain our timeout number. What does everyone think?
> > >
> > > [1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232
> >
> > I think that is a good idea.
>
> Btw, just something to note, the fuse request timeout has an upper
> margin of error associated with it.
>
> Copying over from the commit message -
>
> "Please note that these timeouts are not 100% precise. The request may
> take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> timeout due to how it's internally implemented."
>
> For example, if a server sets the request timeout config to 10
> minutes, the server could be aborted after 11 minutes
> (FUSE_TIMEOUT_TIMER_FREQ is set to 60 seconds internally) instead of
> 10 minutes.
>

Let me add +Brian Geffon who also was thinking about the right timeout value.

>
> Thanks,
> Joanne
> >
> >
> > Thanks,
> > Bernd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-04 14:40                   ` Tomasz Figa
@ 2024-12-04 14:51                     ` Brian Geffon
  2024-12-04 15:09                       ` Bernd Schubert
  2024-12-05  3:23                       ` Sergey Senozhatsky
  0 siblings, 2 replies; 39+ messages in thread
From: Brian Geffon @ 2024-12-04 14:51 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Joanne Koong, Bernd Schubert, Sergey Senozhatsky, Bernd Schubert,
	miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	josef@toxicpanda.com, jefflexu@linux.alibaba.com,
	laoar.shao@gmail.com, kernel-team@meta.com

On Wed, Dec 4, 2024 at 9:40 AM Tomasz Figa <tfiga@chromium.org> wrote:
>
> On Tue, Dec 3, 2024 at 4:29 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Mon, Dec 2, 2024 at 6:43 AM Bernd Schubert
> > <bernd.schubert@fastmail.fm> wrote:
> > >
> > > On 12/2/24 10:45, Tomasz Figa wrote:
> > > > Hi everyone,
> > > >
> > > > On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
> > > > <senozhatsky@chromium.org> wrote:
> > > >>
> > > >> Cc-ing Tomasz
> > > >>
> > > >> On (24/11/28 11:23), Bernd Schubert wrote:
> > > >>>> Thanks for the pointers again, Bernd.
> > > >>>>
> > > >>>>> Miklos had asked for to abort the connection in v4
> > > >>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
> > > >>>>
> > > >>>> OK, sounds reasonable. I'll try to give the series some testing in the
> > > >>>> coming days.
> > > >>>>
> > > >>>> // I still would probably prefer "seconds" timeout granularity.
> > > >>>> // Unless this also has been discussed already and Bernd has a link ;)
> > > >>>
> > > >>>
> > > >>> The issue is that is currently iterating through 256 hash lists +
> > > >>> pending + bg.
> > > >>>
> > > >>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
> > > >>
> > > >> Oh, I see.
> > > >>
> > > >>> Personally I would prefer a second list to avoid the check spike and latency
> > > >>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
> > > >>
> > > >> That's good to know.  I like the idea of less CPU usage in general,
> > > >> our devices a battery powered so everything counts, to some extent.
> > > >>
> > > >>> What is your opinion about that? I guess android and chromium have an
> > > >>> interest low latencies and avoiding cpu spikes?
> > > >>
> > > >> Good question.
> > > >>
> > > >> Can't speak for android, in chromeos we probably will keep it at 1 minute,
> > > >> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
> > > >> use default value of 120 sec). There are setups that might use lower
> > > >> values, or even re-define default value, e.g.:
> > > >>
> > > >> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > > >> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > > >> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > > >> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > > >> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > > >> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > > >> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
> > > >>
> > > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > > >> and then the question is whether HUNG_TASK_PANIC is set.

In my opinion this is a good argument for having the hung task timeout
and a fuse timeout independent. The hung task timeout is for hung
kernel threads, in this situation we're potentially taking too long in
userspace but that doesn't necessarily mean the system is hung. I
think a loop which does an interruptible wait with a timeout of 1/2
the hung task timeout would make sense to ensure the hung task timeout
doesn't hit. There might be situations where we want a fuse timeout
which is larger than the hung task timeout, perhaps a file system
being read over a satellite internet connection?

> > > >>
> > > >> On the other hand, setups that set much lower timeout than
> > > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> > > >> just because watchdogs will run more often.
> > > >>
> > > >> Tomasz, any opinions?
> > > >
> > > > First of all, thanks everyone for looking into this.
> >
> > Hi Sergey and Tomasz,
> >
> > Sorry for the late reply - I was out the last couple of days. Thanks
> > Bernd for weighing in and answering the questions!
> >
> > > >
> > > > How about keeping a list of requests in the FIFO order (in other
> > > > words: first entry is the first to timeout) and whenever the first
> > > > entry is being removed from the list (aka the request actually
> > > > completes), re-arming the timer to the timeout of the next request in
> > > > the list? This way we don't really have any timer firing unless there
> > > > is really a request that timed out.
> >
> > I think the issue with this is that we likely would end up wasting
> > more cpu cycles. For a busy FUSE server, there could be hundreds
> > (thousands?) of requests that happen within the span of
> > FUSE_TIMEOUT_TIMER_FREQ seconds.
> >
> > While working on the patch, one thing I considered was disarming the
> > timer in the timeout handler fuse_check_timeout() if no requests are
> > on the list, in order to accomodate for "quiet periods" (eg if the
> > FUSE server is inactive for a few minutes or hours) but ultimately
> > decided against it because of the overhead it'd incur per request (eg
> > check if the timer is disarmed, would most likely need to grab the
> > fc->lock as well since timer rearming would need to be synchronized
> > between background and non-background requests, etc.).
> >
> > All in all, imo I don't think having the timer trigger every 60
> > seconds (what FUSE_TIMEOUT_TIMER_FREQ is set to) is too costly.
> >
> > >
> > > Requests are in FIFO order on the list and only head is checked.
> > > There are 256 hash lists per fuse device for requests currently
> > > in user space, though.
> > >
> > > >
> > > > (In fact, we could optimize it even further by opportunistically
> > > > scheduling a timer slightly later and opportunistically handling timed
> > > > out requests when other requests are being completed, but this would
> > > > be optimizing for the slow path, so probably an overkill.)
> > > >
> > > > As for the length of the request timeout vs the hung task watchdog
> > > > timeout, my opinion is that we should make sure that the hung task
> > > > watchdog doesn't hit in any case, simply because a misbehaving
> > > > userspace process must not be able to panic the kernel. In the
> > > > blk-core, the blk_io_schedule() function [1] uses
> > > > sysctl_hung_task_timeout_secs to determine the maximum length of a
> > > > single uninterruptible sleep. I suppose we could use the same
> > > > calculation to obtain our timeout number. What does everyone think?
> > > >
> > > > [1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232
> > >
> > > I think that is a good idea.
> >
> > Btw, just something to note, the fuse request timeout has an upper
> > margin of error associated with it.
> >
> > Copying over from the commit message -
> >
> > "Please note that these timeouts are not 100% precise. The request may
> > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> > timeout due to how it's internally implemented."
> >
> > For example, if a server sets the request timeout config to 10
> > minutes, the server could be aborted after 11 minutes
> > (FUSE_TIMEOUT_TIMER_FREQ is set to 60 seconds internally) instead of
> > 10 minutes.
> >
>
> Let me add +Brian Geffon who also was thinking about the right timeout value.
>
> >
> > Thanks,
> > Joanne
> > >
> > >
> > > Thanks,
> > > Bernd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-04 14:51                     ` Brian Geffon
@ 2024-12-04 15:09                       ` Bernd Schubert
  2024-12-05  3:23                       ` Sergey Senozhatsky
  1 sibling, 0 replies; 39+ messages in thread
From: Bernd Schubert @ 2024-12-04 15:09 UTC (permalink / raw)
  To: Brian Geffon, Tomasz Figa
  Cc: Joanne Koong, Sergey Senozhatsky, Bernd Schubert,
	miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	josef@toxicpanda.com, jefflexu@linux.alibaba.com,
	laoar.shao@gmail.com, kernel-team@meta.com



On 12/4/24 15:51, Brian Geffon wrote:
> On Wed, Dec 4, 2024 at 9:40 AM Tomasz Figa <tfiga@chromium.org> wrote:
>>
>> On Tue, Dec 3, 2024 at 4:29 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>>>
>>> On Mon, Dec 2, 2024 at 6:43 AM Bernd Schubert
>>> <bernd.schubert@fastmail.fm> wrote:
>>>>
>>>> On 12/2/24 10:45, Tomasz Figa wrote:
>>>>> Hi everyone,
>>>>>
>>>>> On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
>>>>> <senozhatsky@chromium.org> wrote:
>>>>>>
>>>>>> Cc-ing Tomasz
>>>>>>
>>>>>> On (24/11/28 11:23), Bernd Schubert wrote:
>>>>>>>> Thanks for the pointers again, Bernd.
>>>>>>>>
>>>>>>>>> Miklos had asked for to abort the connection in v4
>>>>>>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
>>>>>>>>
>>>>>>>> OK, sounds reasonable. I'll try to give the series some testing in the
>>>>>>>> coming days.
>>>>>>>>
>>>>>>>> // I still would probably prefer "seconds" timeout granularity.
>>>>>>>> // Unless this also has been discussed already and Bernd has a link ;)
>>>>>>>
>>>>>>>
>>>>>>> The issue is that is currently iterating through 256 hash lists +
>>>>>>> pending + bg.
>>>>>>>
>>>>>>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
>>>>>>
>>>>>> Oh, I see.
>>>>>>
>>>>>>> Personally I would prefer a second list to avoid the check spike and latency
>>>>>>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
>>>>>>
>>>>>> That's good to know.  I like the idea of less CPU usage in general,
>>>>>> our devices a battery powered so everything counts, to some extent.
>>>>>>
>>>>>>> What is your opinion about that? I guess android and chromium have an
>>>>>>> interest low latencies and avoiding cpu spikes?
>>>>>>
>>>>>> Good question.
>>>>>>
>>>>>> Can't speak for android, in chromeos we probably will keep it at 1 minute,
>>>>>> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
>>>>>> use default value of 120 sec). There are setups that might use lower
>>>>>> values, or even re-define default value, e.g.:
>>>>>>
>>>>>> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
>>>>>>
>>>>>> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
>>>>>> and then the question is whether HUNG_TASK_PANIC is set.
> 
> In my opinion this is a good argument for having the hung task timeout
> and a fuse timeout independent. The hung task timeout is for hung
> kernel threads, in this situation we're potentially taking too long in
> userspace but that doesn't necessarily mean the system is hung. I
> think a loop which does an interruptible wait with a timeout of 1/2
> the hung task timeout would make sense to ensure the hung task timeout
> doesn't hit. There might be situations where we want a fuse timeout
> which is larger than the hung task timeout, perhaps a file system
> being read over a satellite internet connection?


For a network file system the remote server also might just hang and
one might want to wait much longer than  1/2 hung task timeout for 
recovery.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-04 14:51                     ` Brian Geffon
  2024-12-04 15:09                       ` Bernd Schubert
@ 2024-12-05  3:23                       ` Sergey Senozhatsky
  2024-12-05 11:07                         ` Brian Geffon
  1 sibling, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-05  3:23 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Tomasz Figa, Joanne Koong, Bernd Schubert, Sergey Senozhatsky,
	Bernd Schubert, miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	josef@toxicpanda.com, jefflexu@linux.alibaba.com,
	laoar.shao@gmail.com, kernel-team@meta.com

On (24/12/04 09:51), Brian Geffon wrote:
> > > > >>
> > > > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > > > >> and then the question is whether HUNG_TASK_PANIC is set.
> 
> In my opinion this is a good argument for having the hung task timeout
> and a fuse timeout independent. The hung task timeout is for hung
> kernel threads

Sorry, but no, it's not only for kernel threads.

> in this situation we're potentially taking too long in
> userspace but that doesn't necessarily mean the system is hung.

And it's not for hung system.  It's for tasks that stuck unreasonably
long waiting for a particular event or resource.  And those tasks very
well can be user-space processes, being stuck in syscall waiting for
something.

> I think a loop which does an interruptible wait with a timeout of 1/2
> the hung task timeout would make sense to ensure the hung task timeout
> doesn't hit.

The point here is not to silent watchdog, we might as well just disable
it and call it a day.  The point here is that fuse can be used (and IS
in our particular case) as a network filesystem, and the problem can be
way outside of your system, so spinning in a wait loop doesn't fix any
problem on that remote system no matter how long we spin, and that's
what watchdog signals us.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-05  3:23                       ` Sergey Senozhatsky
@ 2024-12-05 11:07                         ` Brian Geffon
  0 siblings, 0 replies; 39+ messages in thread
From: Brian Geffon @ 2024-12-05 11:07 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Tomasz Figa, Joanne Koong, Bernd Schubert, Bernd Schubert,
	miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	josef@toxicpanda.com, jefflexu@linux.alibaba.com,
	laoar.shao@gmail.com, kernel-team@meta.com

On Wed, Dec 4, 2024 at 10:23 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/12/04 09:51), Brian Geffon wrote:
> > > > > >>
> > > > > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > > > > >> and then the question is whether HUNG_TASK_PANIC is set.
> >
> > In my opinion this is a good argument for having the hung task timeout
> > and a fuse timeout independent. The hung task timeout is for hung
> > kernel threads
>
> Sorry, but no, it's not only for kernel threads.
>
> > in this situation we're potentially taking too long in
> > userspace but that doesn't necessarily mean the system is hung.
>
> And it's not for hung system.  It's for tasks that stuck unreasonably
> long waiting for a particular event or resource.  And those tasks very
> well can be user-space processes, being stuck in syscall waiting for
> something.
>
> > I think a loop which does an interruptible wait with a timeout of 1/2
> > the hung task timeout would make sense to ensure the hung task timeout
> > doesn't hit.
>
> The point here is not to silent watchdog, we might as well just disable
> it and call it a day.  The point here is that fuse can be used (and IS
> in our particular case) as a network filesystem, and the problem can be
> way outside of your system, so spinning in a wait loop doesn't fix any
> problem on that remote system no matter how long we spin, and that's
> what watchdog signals us.

Sergey, to clarify again what I was suggesting, I was not suggesting
that no timeout is appropriate. I suggested that the hung task timeout
is not appropriate and instead we should be using a different timeout.
My suggestion about looping was simply to say: **once we've
transitioned to userspace to service the request the hung task timeout
should not apply and instead a different fuse timeout should.**

The loop is nothing more than to allow the task to break out of the
wait so it's not triggering the hung task timeout and frankly an
implementation detail that I should have probably omitted. There are
situations where a timeout less than the hung task timeout might be
appropriate and there are situations where a timeout longer than the
hung task timeout might be appropriate.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-04 13:14   ` Sergey Senozhatsky
@ 2024-12-05 23:10     ` Joanne Koong
  0 siblings, 0 replies; 39+ messages in thread
From: Joanne Koong @ 2024-12-05 23:10 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
	laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa

On Wed, Dec 4, 2024 at 5:14 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/11/14 11:13), Joanne Koong wrote:
> [..]
> > @@ -920,6 +935,9 @@ struct fuse_conn {
> >       /** IDR for backing files ids */
> >       struct idr backing_files_map;
> >  #endif
> > +
> > +     /** Only used if the connection enforces request timeouts */
> > +     struct fuse_timeout timeout;
> >  };
> [..]
> > @@ -749,6 +750,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
> >       fsparam_u32     ("max_read",            OPT_MAX_READ),
> >       fsparam_u32     ("blksize",             OPT_BLKSIZE),
> >       fsparam_string  ("subtype",             OPT_SUBTYPE),
> > +     fsparam_u16     ("request_timeout",     OPT_REQUEST_TIMEOUT),
> >       {}
> >  };
> >
> > @@ -844,6 +846,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
> >               ctx->blksize = result.uint_32;
> >               break;
> >
> > +     case OPT_REQUEST_TIMEOUT:
> > +             ctx->req_timeout = result.uint_16;
> > +             break;
> > +
>
> A quick question: so for this user-space should be updated
> to request fuse-watchdog on particular connection?  Would
> it make sense to have a way to simply enforce watchdog on
> all connections?

Hi Sergey,

The third patch
(https://lore.kernel.org/linux-fsdevel/20241114191332.669127-4-joannelkoong@gmail.com/)
in this patchset adds this through the sysctl interface. The sys admin
can set /proc/sys/fs/fuse/max_request_timeout and this will ensure
requests don't take longer than this value.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-04 13:47   ` Sergey Senozhatsky
@ 2024-12-05 23:29     ` Joanne Koong
  2024-12-06  3:26       ` Sergey Senozhatsky
  0 siblings, 1 reply; 39+ messages in thread
From: Joanne Koong @ 2024-12-05 23:29 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
	laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa

On Wed, Dec 4, 2024 at 5:47 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/11/14 11:13), Joanne Koong wrote:
> >
> > +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> > +{
> > +     return jiffies > req->create_time + fc->timeout.req_timeout;
> > +}
>
> With jiffies we need to use time_after() and such, so we'd deal
> with jiffies wrap-around.

Ohh I see, I guess this is because on 32-bit systems unsigned longs
are 4 bytes? Thanks for noting - I'll push out a new version with this
change.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-03 11:01   ` Sergey Senozhatsky
@ 2024-12-06  0:06     ` Joanne Koong
  2024-12-06  4:28       ` Sergey Senozhatsky
  0 siblings, 1 reply; 39+ messages in thread
From: Joanne Koong @ 2024-12-06  0:06 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
	laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa

On Tue, Dec 3, 2024 at 3:01 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/11/14 11:13), Joanne Koong wrote:
> [..]
> > +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> > +{
> > +     if (ctx->req_timeout) {
> > +             if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> > +                     fc->timeout.req_timeout = ULONG_MAX;
> > +             timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> > +             mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
>
> So I think this will require much bigger changes in the code.
> fuse_check_timeout() is executed from IRQ context and it takes
> the same locks that are acquired by preemptive task contexts.
> So all of those now need to disable local IRQs before they
> acquire: fc->bg_lock, fig->lock, fc->lock.  Otherwise we are
> in a deadlock scenario (in many places, unfortunately).
>
> A simple example:
>
> [   91.466408]   _raw_spin_lock+0x39/0x70
> [   91.466420]   fuse_simple_background+0x902/0x1130 [fuse]
> [   91.466453]   fuse_send_init+0x337/0x680 [fuse]
> [   91.466485]   fuse_fill_super+0x1c8/0x200 [fuse]
> [   91.466516]   get_tree_nodev+0xaf/0x140
> [   91.466527]   fuse_get_tree+0x27e/0x450 [fuse]
> [   91.466559]   vfs_get_tree+0x88/0x240
> [   91.466569]   path_mount+0xf26/0x1ed0
> [   91.466579]   __se_sys_mount+0x1c9/0x240
> [   91.466588]   do_syscall_64+0x6f/0xa0
> [   91.466598]   entry_SYSCALL_64_after_hwframe+0x73/0xdd
> [   91.466666]
>                other info that might help us debug this:
> [   91.466672]  Possible unsafe locking scenario:
>
> [   91.466679]        CPU0
> [   91.466684]        ----
> [   91.466689]   lock(&fiq->lock);
> [   91.466701]   <Interrupt>
> [   91.466707]     lock(&fiq->lock);
> [   91.466718]
>                 *** DEADLOCK ***
>
> [   91.466724] 4 locks held by runtime_probe/5043:
> [   91.466732]  #0: ffff888005812448 (sb_writers#3){.+.+}-{0:0}, at: filename_create+0xb2/0x320
> [   91.466762]  #1: ffff8881499ea3f0 (&type->i_mutex_dir_key#5/1){+.+.}-{3:3}, at: filename_create+0x14c/0x320
> [   91.466791]  #2: ffffffff9d864ce0 (rcu_read_lock){....}-{1:2}, at: security_sid_to_context_core+0xa4/0x4f0
> [   91.466817]  #3: ffff88815c009ec0 ((&fc->timeout.timer)){+.-.}-{0:0}, at: run_timer_softirq+0x702/0x1700
> [   91.466841]
>                stack backtrace:
> [   91.466850] CPU: 2 PID: 5043 Comm: runtime_probe Tainted: G     U             6.6.63-lockdep #1 f2b045305e587e03c4766ca818bb77742f953c87
> [   91.466864] Hardware name: HP Lantis/Lantis, BIOS Google_Lantis.13606.437.0 01/21/2022
> [   91.466872] Call Trace:
> [   91.466881]  <IRQ>
> [   91.466889]  dump_stack_lvl+0x6d/0xb0
> [   91.466904]  print_usage_bug+0x8a4/0xbb0
> [   91.466917]  mark_lock+0x13ca/0x1940
> [   91.466930]  __lock_acquire+0xc10/0x2670
> [   91.466944]  lock_acquire+0x119/0x3a0
> [   91.466955]  ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
> [   91.466992]  ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
> [   91.467025]  _raw_spin_lock+0x39/0x70
> [   91.467036]  ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
> [   91.467070]  fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
> [   91.467104]  ? run_timer_softirq+0x702/0x1700
> [   91.467114]  ? run_timer_softirq+0x702/0x1700
> [   91.467123]  ? __pfx_fuse_check_timeout+0x10/0x10 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
> [   91.467156]  run_timer_softirq+0x763/0x1700
> [   91.467172]  irq_exit_rcu+0x300/0x7d0
> [   91.467183]  ? sysvec_apic_timer_interrupt+0x56/0x90
> [   91.467195]  sysvec_apic_timer_interrupt+0x56/0x90
> [   91.467205]  </IRQ>
>
> Do we want to run fuse-watchdog as a preemptive task instead of
> IRQ context?  Simlar to the way the hung-task watchdog runs, for
> example.  Yes, it now may starve and not get scheduled in corner
> cases (under some crazy pressure), but at least we don't need to
> patch the entire fuse code to use irq_safe/irq_restore locking
> variants.

Interesting! Thanks for noting this.

It looks like the choices we have here then are to either:

* have this run in a kthread like hung-task watchdog, as you mentioned above
* have all fuse code use irq_safe/irq_restore for locks
* use a separate spinlock/list for request timeouts as per the
implementation in v7 [1], though will need to benchmark this to make
sure performance doesn't degrade from lock contention if lots of
requests are submitted in parallel

Having 1 extra kthread per server as overhead doesn't seem too bad to
me. There's already an upper margin of imprecision with the timer
implementation, so I don't see the kthread scheduling imprecision as a
big deal.

I'll restructure this to use a kthread in v10. If anyone has thoughts
on a better or more preferred approach though, would love to hear
them.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20241007184258.2837492-3-joannelkoong@gmail.com/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
                     ` (3 preceding siblings ...)
  2024-12-04 13:47   ` Sergey Senozhatsky
@ 2024-12-06  0:37   ` Jeff Layton
  2024-12-06 19:19     ` Joanne Koong
  2024-12-06 16:30   ` Etienne Martineau
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2024-12-06  0:37 UTC (permalink / raw)
  To: Joanne Koong, miklos, linux-fsdevel, Sergey Senozhatsky
  Cc: josef, bernd.schubert, jefflexu, laoar.shao, kernel-team,
	Bernd Schubert

On Thu, 2024-11-14 at 11:13 -0800, Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is deadlocked. Currently, there's no
> good way to detect if a server is stuck and needs to be killed manually.
> 
> This commit adds an option for enforcing a timeout (in minutes) for
> requests where if the timeout elapses without the server responding to
> the request, the connection will be automatically aborted.
> 

I haven't been keeping up with the earlier series, but I think I agree
with Sergey that this timeout would be better expressed in seconds.

Most filesystems that deal with timeouts (NFS, CIFS, etc.) specify them
as a number of seconds, and expressing this in minutes goes against
that convention. It also seems rather coarse-grained. I could easily
see a situation where 5 minutes is too short, but 6 minutes is too
long.

With that too, you probably wouldn't need patch #1. You could treat it
as a 32-bit integer and just clamp the value as necessary.

> Please note that these timeouts are not 100% precise. The request may
> take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> timeout due to how it's internally implemented.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
>  fs/fuse/dev.c    | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/fuse/fuse_i.h | 21 +++++++++++++
>  fs/fuse/inode.c  | 21 +++++++++++++
>  3 files changed, 122 insertions(+)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 29fc61a072ba..536aa4525e8f 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -45,6 +45,82 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
>  	return READ_ONCE(file->private_data);
>  }
>  
> +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> +{
> +	return jiffies > req->create_time + fc->timeout.req_timeout;
> +}
> +
> +/*
> + * Check if any requests aren't being completed by the specified request
> + * timeout. To do so, we:
> + * - check the fiq pending list
> + * - check the bg queue
> + * - check the fpq io and processing lists
> + *
> + * To make this fast, we only check against the head request on each list since
> + * these are generally queued in order of creation time (eg newer requests get
> + * queued to the tail). We might miss a few edge cases (eg requests transitioning
> + * between lists, re-sent requests at the head of the pending list having a
> + * later creation time than other requests on that list, etc.) but that is fine
> + * since if the request never gets fulfilled, it will eventually be caught.
> + */
> +void fuse_check_timeout(struct timer_list *timer)
> +{
> +	struct fuse_conn *fc = container_of(timer, struct fuse_conn, timeout.timer);
> +	struct fuse_iqueue *fiq = &fc->iq;
> +	struct fuse_req *req;
> +	struct fuse_dev *fud;
> +	struct fuse_pqueue *fpq;
> +	bool expired = false;
> +	int i;
> +
> +	spin_lock(&fiq->lock);
> +	req = list_first_entry_or_null(&fiq->pending, struct fuse_req, list);
> +	if (req)
> +		expired = request_expired(fc, req);
> +	spin_unlock(&fiq->lock);
> +	if (expired)
> +		goto abort_conn;
> +
> +	spin_lock(&fc->bg_lock);
> +	req = list_first_entry_or_null(&fc->bg_queue, struct fuse_req, list);
> +	if (req)
> +		expired = request_expired(fc, req);
> +	spin_unlock(&fc->bg_lock);
> +	if (expired)
> +		goto abort_conn;
> +
> +	spin_lock(&fc->lock);
> +	if (!fc->connected) {
> +		spin_unlock(&fc->lock);
> +		return;
> +	}
> +	list_for_each_entry(fud, &fc->devices, entry) {
> +		fpq = &fud->pq;
> +		spin_lock(&fpq->lock);
> +		req = list_first_entry_or_null(&fpq->io, struct fuse_req, list);
> +		if (req && request_expired(fc, req))
> +			goto fpq_abort;
> +
> +		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
> +			req = list_first_entry_or_null(&fpq->processing[i], struct fuse_req, list);
> +			if (req && request_expired(fc, req))
> +				goto fpq_abort;
> +		}
> +		spin_unlock(&fpq->lock);
> +	}
> +	spin_unlock(&fc->lock);
> +
> +	mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> +	return;
> +
> +fpq_abort:
> +	spin_unlock(&fpq->lock);
> +	spin_unlock(&fc->lock);
> +abort_conn:
> +	fuse_abort_conn(fc);
> +}
> +
>  static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
>  {
>  	INIT_LIST_HEAD(&req->list);
> @@ -53,6 +129,7 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
>  	refcount_set(&req->count, 1);
>  	__set_bit(FR_PENDING, &req->flags);
>  	req->fm = fm;
> +	req->create_time = jiffies;
>  }
>  
>  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> @@ -2308,6 +2385,9 @@ void fuse_abort_conn(struct fuse_conn *fc)
>  		spin_unlock(&fc->lock);
>  
>  		end_requests(&to_end);
> +
> +		if (fc->timeout.req_timeout)
> +			timer_delete(&fc->timeout.timer);
>  	} else {
>  		spin_unlock(&fc->lock);
>  	}
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index d35c37ccf9b5..9092201c4e0b 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -438,6 +438,9 @@ struct fuse_req {
>  
>  	/** fuse_mount this request belongs to */
>  	struct fuse_mount *fm;
> +
> +	/** When (in jiffies) the request was created */
> +	unsigned long create_time;
>  };
>  
>  struct fuse_iqueue;
> @@ -528,6 +531,16 @@ struct fuse_pqueue {
>  	struct list_head io;
>  };
>  
> +/* Frequency (in seconds) of request timeout checks, if opted into */
> +#define FUSE_TIMEOUT_TIMER_FREQ 60 * HZ
> +
> +struct fuse_timeout {
> +	struct timer_list timer;
> +
> +	/* Request timeout (in jiffies). 0 = no timeout */
> +	unsigned long req_timeout;
> +};
> +
>  /**
>   * Fuse device instance
>   */
> @@ -574,6 +587,8 @@ struct fuse_fs_context {
>  	enum fuse_dax_mode dax_mode;
>  	unsigned int max_read;
>  	unsigned int blksize;
> +	/*  Request timeout (in minutes). 0 = no timeout (infinite wait) */
> +	unsigned int req_timeout;
>  	const char *subtype;
>  
>  	/* DAX device, may be NULL */
> @@ -920,6 +935,9 @@ struct fuse_conn {
>  	/** IDR for backing files ids */
>  	struct idr backing_files_map;
>  #endif
> +
> +	/** Only used if the connection enforces request timeouts */
> +	struct fuse_timeout timeout;
>  };
>  
>  /*
> @@ -1181,6 +1199,9 @@ void fuse_request_end(struct fuse_req *req);
>  void fuse_abort_conn(struct fuse_conn *fc);
>  void fuse_wait_aborted(struct fuse_conn *fc);
>  
> +/* Check if any requests timed out */
> +void fuse_check_timeout(struct timer_list *timer);
> +
>  /**
>   * Invalidate inode attributes
>   */
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index f1779ff3f8d1..ee006f09cd04 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -735,6 +735,7 @@ enum {
>  	OPT_ALLOW_OTHER,
>  	OPT_MAX_READ,
>  	OPT_BLKSIZE,
> +	OPT_REQUEST_TIMEOUT,
>  	OPT_ERR
>  };
>  
> @@ -749,6 +750,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
>  	fsparam_u32	("max_read",		OPT_MAX_READ),
>  	fsparam_u32	("blksize",		OPT_BLKSIZE),
>  	fsparam_string	("subtype",		OPT_SUBTYPE),
> +	fsparam_u16	("request_timeout",	OPT_REQUEST_TIMEOUT),
>  	{}
>  };
>  
> @@ -844,6 +846,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
>  		ctx->blksize = result.uint_32;
>  		break;
>  
> +	case OPT_REQUEST_TIMEOUT:
> +		ctx->req_timeout = result.uint_16;
> +		break;
> +
>  	default:
>  		return -EINVAL;
>  	}
> @@ -973,6 +979,8 @@ void fuse_conn_put(struct fuse_conn *fc)
>  
>  		if (IS_ENABLED(CONFIG_FUSE_DAX))
>  			fuse_dax_conn_free(fc);
> +		if (fc->timeout.req_timeout)
> +			timer_shutdown_sync(&fc->timeout.timer);
>  		if (fiq->ops->release)
>  			fiq->ops->release(fiq);
>  		put_pid_ns(fc->pid_ns);
> @@ -1691,6 +1699,18 @@ int fuse_init_fs_context_submount(struct fs_context *fsc)
>  }
>  EXPORT_SYMBOL_GPL(fuse_init_fs_context_submount);
>  
> +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> +{
> +	if (ctx->req_timeout) {
> +		if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> +			fc->timeout.req_timeout = ULONG_MAX;
> +		timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> +		mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> +	} else {
> +		fc->timeout.req_timeout = 0;
> +	}
> +}
> +
>  int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  {
>  	struct fuse_dev *fud = NULL;
> @@ -1753,6 +1773,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  	fc->destroy = ctx->destroy;
>  	fc->no_control = ctx->no_control;
>  	fc->no_force_umount = ctx->no_force_umount;
> +	fuse_init_fc_timeout(fc, ctx);
>  
>  	err = -ENOMEM;
>  	root = fuse_get_root_inode(sb, ctx->rootmode);

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 3/3] fuse: add default_request_timeout and max_request_timeout sysctls
  2024-11-14 19:13 ` [PATCH RESEND v9 3/3] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
@ 2024-12-06  0:43   ` Jeff Layton
  2024-12-06 18:16     ` Joanne Koong
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2024-12-06  0:43 UTC (permalink / raw)
  To: Joanne Koong, miklos, linux-fsdevel
  Cc: josef, bernd.schubert, jefflexu, laoar.shao, kernel-team,
	Bernd Schubert

On Thu, 2024-11-14 at 11:13 -0800, Joanne Koong wrote:
> Introduce two new sysctls, "default_request_timeout" and
> "max_request_timeout". These control how long (in minutes) a server can
> take to reply to a request. If the server does not reply by the timeout,
> then the connection will be aborted.
> 

Is this patch really needed? You have a per-mount limit already, do we
need a global default and max?

Also, ditto here wrt seconds vs. minutes. If these *are* needed, then
having them expressed in seconds would be more intuitive for most
admins.


> "default_request_timeout" sets the default timeout if no timeout is
> specified by the fuse server on mount. 0 (default) indicates no default
> timeout should be enforced. If the server did specify a timeout, then
> default_request_timeout will be ignored.
> 
> "max_request_timeout" sets the max amount of time the server may take to
> reply to a request. 0 (default) indicates no maximum timeout. If
> max_request_timeout is set and the fuse server attempts to set a
> timeout greater than max_request_timeout, the system will use
> max_request_timeout as the timeout. Similarly, if default_request_timeout
> is greater than max_request_timeout, the system will use
> max_request_timeout as the timeout. If the server does not request a
> timeout and default_request_timeout is set to 0 but max_request_timeout
> is set, then the timeout will be max_request_timeout.
> 
> Please note that these timeouts are not 100% precise. The request may
> take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the set max timeout
> due to how it's internally implemented.
> 
> $ sysctl -a | grep fuse.default_request_timeout
> fs.fuse.default_request_timeout = 0
> 
> $ echo 65536 | sudo tee /proc/sys/fs/fuse/default_request_timeout
> tee: /proc/sys/fs/fuse/default_request_timeout: Invalid argument
> 
> $ echo 65535 | sudo tee /proc/sys/fs/fuse/default_request_timeout
> 65535
> 
> $ sysctl -a | grep fuse.default_request_timeout
> fs.fuse.default_request_timeout = 65535
> 
> $ echo 0 | sudo tee /proc/sys/fs/fuse/default_request_timeout
> 0
> 
> $ sysctl -a | grep fuse.default_request_timeout
> fs.fuse.default_request_timeout = 0
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
>  Documentation/admin-guide/sysctl/fs.rst | 27 +++++++++++++++++++++++++
>  fs/fuse/fuse_i.h                        | 10 +++++++++
>  fs/fuse/inode.c                         | 16 +++++++++++++--
>  fs/fuse/sysctl.c                        | 20 ++++++++++++++++++
>  4 files changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
> index fa25d7e718b3..790a34291467 100644
> --- a/Documentation/admin-guide/sysctl/fs.rst
> +++ b/Documentation/admin-guide/sysctl/fs.rst
> @@ -342,3 +342,30 @@ filesystems:
>  ``/proc/sys/fs/fuse/max_pages_limit`` is a read/write file for
>  setting/getting the maximum number of pages that can be used for servicing
>  requests in FUSE.
> +
> +``/proc/sys/fs/fuse/default_request_timeout`` is a read/write file for
> +setting/getting the default timeout (in minutes) for a fuse server to
> +reply to a kernel-issued request in the event where the server did not
> +specify a timeout at mount. If the server set a timeout,
> +then default_request_timeout will be ignored.  The default
> +"default_request_timeout" is set to 0. 0 indicates a no-op (eg
> +requests will not have a default request timeout set if no timeout was
> +specified by the server).
> +
> +``/proc/sys/fs/fuse/max_request_timeout`` is a read/write file for
> +setting/getting the maximum timeout (in minutes) for a fuse server to
> +reply to a kernel-issued request. A value greater than 0 automatically opts
> +the server into a timeout that will be at most "max_request_timeout", even if
> +the server did not specify a timeout and default_request_timeout is set to 0.
> +If max_request_timeout is greater than 0 and the server set a timeout greater
> +than max_request_timeout or default_request_timeout is set to a value greater
> +than max_request_timeout, the system will use max_request_timeout as the
> +timeout. 0 indicates a no-op (eg requests will not have an upper bound on the
> +timeout and if the server did not request a timeout and default_request_timeout
> +was not set, there will be no timeout).
> +
> +Please note that for the timeout options, if the server does not respond to
> +the request by the time the timeout elapses, then the connection to the fuse
> +server will be aborted. Please also note that the timeouts are not 100%
> +precise (eg you may set 10 minutes but the timeout may kick in after 11
> +minutes).
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 9092201c4e0b..e82ddff8d752 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -46,6 +46,16 @@
>  
>  /** Maximum of max_pages received in init_out */
>  extern unsigned int fuse_max_pages_limit;
> +/*
> + * Default timeout (in minutes) for the server to reply to a request
> + * before the connection is aborted, if no timeout was specified on mount.
> + */
> +extern unsigned int fuse_default_req_timeout;
> +/*
> + * Max timeout (in minutes) for the server to reply to a request before
> + * the connection is aborted.
> + */
> +extern unsigned int fuse_max_req_timeout;
>  
>  /** List of active connections */
>  extern struct list_head fuse_conn_list;
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index ee006f09cd04..1e7cc6509e42 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -36,6 +36,9 @@ DEFINE_MUTEX(fuse_mutex);
>  static int set_global_limit(const char *val, const struct kernel_param *kp);
>  
>  unsigned int fuse_max_pages_limit = 256;
> +/* default is no timeout */
> +unsigned int fuse_default_req_timeout = 0;
> +unsigned int fuse_max_req_timeout = 0;
>  
>  unsigned max_user_bgreq;
>  module_param_call(max_user_bgreq, set_global_limit, param_get_uint,
> @@ -1701,8 +1704,17 @@ EXPORT_SYMBOL_GPL(fuse_init_fs_context_submount);
>  
>  static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
>  {
> -	if (ctx->req_timeout) {
> -		if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> +	unsigned int timeout = ctx->req_timeout ?: fuse_default_req_timeout;
> +
> +	if (fuse_max_req_timeout) {
> +		if (!timeout)
> +			timeout = fuse_max_req_timeout;
> +		else
> +			timeout = min(timeout, fuse_max_req_timeout);
> +	}
> +
> +	if (timeout) {
> +		if (check_mul_overflow(timeout * 60, HZ, &fc->timeout.req_timeout))
>  			fc->timeout.req_timeout = ULONG_MAX;
>  		timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
>  		mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> diff --git a/fs/fuse/sysctl.c b/fs/fuse/sysctl.c
> index b272bb333005..6a9094e17950 100644
> --- a/fs/fuse/sysctl.c
> +++ b/fs/fuse/sysctl.c
> @@ -13,6 +13,8 @@ static struct ctl_table_header *fuse_table_header;
>  /* Bound by fuse_init_out max_pages, which is a u16 */
>  static unsigned int sysctl_fuse_max_pages_limit = 65535;
>  
> +static unsigned int sysctl_fuse_max_req_timeout_limit = U16_MAX;
> +
>  static struct ctl_table fuse_sysctl_table[] = {
>  	{
>  		.procname	= "max_pages_limit",
> @@ -23,6 +25,24 @@ static struct ctl_table fuse_sysctl_table[] = {
>  		.extra1		= SYSCTL_ONE,
>  		.extra2		= &sysctl_fuse_max_pages_limit,
>  	},
> +	{
> +		.procname	= "default_request_timeout",
> +		.data		= &fuse_default_req_timeout,
> +		.maxlen		= sizeof(fuse_default_req_timeout),
> +		.mode		= 0644,
> +		.proc_handler	= proc_douintvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= &sysctl_fuse_max_req_timeout_limit,
> +	},
> +	{
> +		.procname	= "max_request_timeout",
> +		.data		= &fuse_max_req_timeout,
> +		.maxlen		= sizeof(fuse_max_req_timeout),
> +		.mode		= 0644,
> +		.proc_handler	= proc_douintvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= &sysctl_fuse_max_req_timeout_limit,
> +	},
>  };
>  
>  int fuse_sysctl_register(void)

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-05 23:29     ` Joanne Koong
@ 2024-12-06  3:26       ` Sergey Senozhatsky
  0 siblings, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-06  3:26 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Sergey Senozhatsky, miklos, linux-fsdevel, josef, bernd.schubert,
	jefflexu, laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa

On (24/12/05 15:29), Joanne Koong wrote:
> On Wed, Dec 4, 2024 at 5:47 AM Sergey Senozhatsky
> <senozhatsky@chromium.org> wrote:
> >
> > On (24/11/14 11:13), Joanne Koong wrote:
> > >
> > > +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> > > +{
> > > +     return jiffies > req->create_time + fc->timeout.req_timeout;
> > > +}
> >
> > With jiffies we need to use time_after() and such, so we'd deal
> > with jiffies wrap-around.
> 
> Ohh I see, I guess this is because on 32-bit systems unsigned longs
> are 4 bytes?

Correct, IIRC on 32bit system jiffies wraparound every 47 or 49 days.
But I also recall that jiffies (at least in the past) were initialized
to -5 minutes so that they would wraparound during first 5 minutes
of uptime to reveal bugs.  I don't know if it's still the case though.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-06  0:06     ` Joanne Koong
@ 2024-12-06  4:28       ` Sergey Senozhatsky
  0 siblings, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-06  4:28 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Sergey Senozhatsky, miklos, linux-fsdevel, josef, bernd.schubert,
	jefflexu, laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa

On (24/12/05 16:06), Joanne Koong wrote:
> Interesting! Thanks for noting this.
> 
> It looks like the choices we have here then are to either:
> 
> * have this run in a kthread like hung-task watchdog, as you mentioned above

Works for me.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
                     ` (4 preceding siblings ...)
  2024-12-06  0:37   ` Jeff Layton
@ 2024-12-06 16:30   ` Etienne Martineau
  2024-12-06 19:05   ` Joanne Koong
  2024-12-06 20:12   ` Jeff Layton
  7 siblings, 0 replies; 39+ messages in thread
From: Etienne Martineau @ 2024-12-06 16:30 UTC (permalink / raw)
  To: joannelkoong, miklos, linux-fsdevel
  Cc: bernd.schubert, bschubert, jefflexu, josef, kernel-team,
	laoar.shao

From: Joanne Koong <joannelkoong@gmail.com>

> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is deadlocked. Currently, there's no
> good way to detect if a server is stuck and needs to be killed manually.

I tried this V9 patchset and realized that upon max_request_timeout the connection will be dropped irrespectively if the task is in D state or not. I guess this is expected behavior.
To me the concerning aspect is when tasks are going in D state because of the consequence when running with hung_task_timeout_secs and hung_task_panic=1.

> 
> This commit adds an option for enforcing a timeout (in minutes) for
> requests where if the timeout elapses without the server responding to
> the request, the connection will be automatically aborted.
> 
> Please note that these timeouts are not 100% precise. The request may
> take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> timeout due to how it's internally implemented.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
>  fs/fuse/dev.c    | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/fuse/fuse_i.h | 21 +++++++++++++
>  fs/fuse/inode.c  | 21 +++++++++++++
>  3 files changed, 122 insertions(+)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 29fc61a072ba..536aa4525e8f 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -45,6 +45,82 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
>  	return READ_ONCE(file->private_data);
>  }
>  
> +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> +{
> +	return jiffies > req->create_time + fc->timeout.req_timeout;
> +}
> +
> +/*
> + * Check if any requests aren't being completed by the specified request
> + * timeout. To do so, we:
> + * - check the fiq pending list
> + * - check the bg queue
> + * - check the fpq io and processing lists
> + *
> + * To make this fast, we only check against the head request on each list since
> + * these are generally queued in order of creation time (eg newer requests get
> + * queued to the tail). We might miss a few edge cases (eg requests transitioning
> + * between lists, re-sent requests at the head of the pending list having a
> + * later creation time than other requests on that list, etc.) but that is fine
> + * since if the request never gets fulfilled, it will eventually be caught.
> + */
> +void fuse_check_timeout(struct timer_list *timer)
> +{
> +	struct fuse_conn *fc = container_of(timer, struct fuse_conn, timeout.timer);
Here this timer may get queued and if echo 1 > /sys/fs/fuse/connections/'nn'/abort is done at more or less the same time over the same connection I'm wondering what will happen?
At least I think we may need timer_delete_sync() instead of timer_delete() in fuse_abort_conn() and potentially call it from the top of fuse_abort_conn() instead.

> +	struct fuse_iqueue *fiq = &fc->iq;
> +	struct fuse_req *req;
> +	struct fuse_dev *fud;
> +	struct fuse_pqueue *fpq;
> +	bool expired = false;
> +	int i;
> +
> +	spin_lock(&fiq->lock);
Do we need spin_lock_irq instead bcos this is irq context no?

>  
> +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> +{
> +	if (ctx->req_timeout) {
> +		if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> +			fc->timeout.req_timeout = ULONG_MAX;
> +		timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> +		mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> +	} else {
> +		fc->timeout.req_timeout = 0;
> +	}
> +}
> +
>  int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  {
>  	struct fuse_dev *fud = NULL;
> @@ -1753,6 +1773,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  	fc->destroy = ctx->destroy;
>  	fc->no_control = ctx->no_control;
>  	fc->no_force_umount = ctx->no_force_umount;
> +	fuse_init_fc_timeout(fc, ctx);

The max_request_timeout is latched in fc->timeout.req_timeout during fuse_fill_super_common() so any further modification are not taking into effect
Say:
 sudo bash -c 'echo 100 > /proc/sys/fs/fuse/max_request_timeout'
 sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt
 kill -STOP `pidof /usr/lib/openssh/sftp-server`
 ls ./mnt/
 ^C
  # Trying to change back the timeout doesn't have any effect
 sudo bash -c 'echo 10 > /proc/sys/fs/fuse/max_request_timeout'

Thanks


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 3/3] fuse: add default_request_timeout and max_request_timeout sysctls
  2024-12-06  0:43   ` Jeff Layton
@ 2024-12-06 18:16     ` Joanne Koong
  0 siblings, 0 replies; 39+ messages in thread
From: Joanne Koong @ 2024-12-06 18:16 UTC (permalink / raw)
  To: Jeff Layton
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
	laoar.shao, kernel-team, Bernd Schubert

On Thu, Dec 5, 2024 at 4:43 PM Jeff Layton <jlayton@poochiereds.net> wrote:
>
> On Thu, 2024-11-14 at 11:13 -0800, Joanne Koong wrote:
> > Introduce two new sysctls, "default_request_timeout" and
> > "max_request_timeout". These control how long (in minutes) a server can
> > take to reply to a request. If the server does not reply by the timeout,
> > then the connection will be aborted.
> >
>
> Is this patch really needed? You have a per-mount limit already, do we
> need a global default and max?

Yes, I do think this patch would be useful. Since fuse servers can run
unprivileged, I think giving system admins some way to enforce a
default and max limit that applies automatically to all fuse servers
would help protect against malicious or buggy servers on the system.

>
> Also, ditto here wrt seconds vs. minutes. If these *are* needed, then
> having them expressed in seconds would be more intuitive for most
> admins.

Noted. I'll change this to seconds in the next iteration of this patchset.
>
>
> > "default_request_timeout" sets the default timeout if no timeout is
> > specified by the fuse server on mount. 0 (default) indicates no default
> > timeout should be enforced. If the server did specify a timeout, then
> > default_request_timeout will be ignored.
> >
> > "max_request_timeout" sets the max amount of time the server may take to
> > reply to a request. 0 (default) indicates no maximum timeout. If
> > max_request_timeout is set and the fuse server attempts to set a
> > timeout greater than max_request_timeout, the system will use
> > max_request_timeout as the timeout. Similarly, if default_request_timeout
> > is greater than max_request_timeout, the system will use
> > max_request_timeout as the timeout. If the server does not request a
> > timeout and default_request_timeout is set to 0 but max_request_timeout
> > is set, then the timeout will be max_request_timeout.
> >
> > Please note that these timeouts are not 100% precise. The request may
> > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the set max timeout
> > due to how it's internally implemented.
> >
> > $ sysctl -a | grep fuse.default_request_timeout
> > fs.fuse.default_request_timeout = 0
> >
> > $ echo 65536 | sudo tee /proc/sys/fs/fuse/default_request_timeout
> > tee: /proc/sys/fs/fuse/default_request_timeout: Invalid argument
> >
> > $ echo 65535 | sudo tee /proc/sys/fs/fuse/default_request_timeout
> > 65535
> >
> > $ sysctl -a | grep fuse.default_request_timeout
> > fs.fuse.default_request_timeout = 65535
> >
> > $ echo 0 | sudo tee /proc/sys/fs/fuse/default_request_timeout
> > 0
> >
> > $ sysctl -a | grep fuse.default_request_timeout
> > fs.fuse.default_request_timeout = 0
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> > ---
> >  Documentation/admin-guide/sysctl/fs.rst | 27 +++++++++++++++++++++++++
> >  fs/fuse/fuse_i.h                        | 10 +++++++++
> >  fs/fuse/inode.c                         | 16 +++++++++++++--
> >  fs/fuse/sysctl.c                        | 20 ++++++++++++++++++
> >  4 files changed, 71 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
> > index fa25d7e718b3..790a34291467 100644
> > --- a/Documentation/admin-guide/sysctl/fs.rst
> > +++ b/Documentation/admin-guide/sysctl/fs.rst
> > @@ -342,3 +342,30 @@ filesystems:
> >  ``/proc/sys/fs/fuse/max_pages_limit`` is a read/write file for
> >  setting/getting the maximum number of pages that can be used for servicing
> >  requests in FUSE.
> > +
> > +``/proc/sys/fs/fuse/default_request_timeout`` is a read/write file for
> > +setting/getting the default timeout (in minutes) for a fuse server to
> > +reply to a kernel-issued request in the event where the server did not
> > +specify a timeout at mount. If the server set a timeout,
> > +then default_request_timeout will be ignored.  The default
> > +"default_request_timeout" is set to 0. 0 indicates a no-op (eg
> > +requests will not have a default request timeout set if no timeout was
> > +specified by the server).
> > +
> > +``/proc/sys/fs/fuse/max_request_timeout`` is a read/write file for
> > +setting/getting the maximum timeout (in minutes) for a fuse server to
> > +reply to a kernel-issued request. A value greater than 0 automatically opts
> > +the server into a timeout that will be at most "max_request_timeout", even if
> > +the server did not specify a timeout and default_request_timeout is set to 0.
> > +If max_request_timeout is greater than 0 and the server set a timeout greater
> > +than max_request_timeout or default_request_timeout is set to a value greater
> > +than max_request_timeout, the system will use max_request_timeout as the
> > +timeout. 0 indicates a no-op (eg requests will not have an upper bound on the
> > +timeout and if the server did not request a timeout and default_request_timeout
> > +was not set, there will be no timeout).
> > +
> > +Please note that for the timeout options, if the server does not respond to
> > +the request by the time the timeout elapses, then the connection to the fuse
> > +server will be aborted. Please also note that the timeouts are not 100%
> > +precise (eg you may set 10 minutes but the timeout may kick in after 11
> > +minutes).
> > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > index 9092201c4e0b..e82ddff8d752 100644
> > --- a/fs/fuse/fuse_i.h
> > +++ b/fs/fuse/fuse_i.h
> > @@ -46,6 +46,16 @@
> >
> >  /** Maximum of max_pages received in init_out */
> >  extern unsigned int fuse_max_pages_limit;
> > +/*
> > + * Default timeout (in minutes) for the server to reply to a request
> > + * before the connection is aborted, if no timeout was specified on mount.
> > + */
> > +extern unsigned int fuse_default_req_timeout;
> > +/*
> > + * Max timeout (in minutes) for the server to reply to a request before
> > + * the connection is aborted.
> > + */
> > +extern unsigned int fuse_max_req_timeout;
> >
> >  /** List of active connections */
> >  extern struct list_head fuse_conn_list;
> > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > index ee006f09cd04..1e7cc6509e42 100644
> > --- a/fs/fuse/inode.c
> > +++ b/fs/fuse/inode.c
> > @@ -36,6 +36,9 @@ DEFINE_MUTEX(fuse_mutex);
> >  static int set_global_limit(const char *val, const struct kernel_param *kp);
> >
> >  unsigned int fuse_max_pages_limit = 256;
> > +/* default is no timeout */
> > +unsigned int fuse_default_req_timeout = 0;
> > +unsigned int fuse_max_req_timeout = 0;
> >
> >  unsigned max_user_bgreq;
> >  module_param_call(max_user_bgreq, set_global_limit, param_get_uint,
> > @@ -1701,8 +1704,17 @@ EXPORT_SYMBOL_GPL(fuse_init_fs_context_submount);
> >
> >  static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> >  {
> > -     if (ctx->req_timeout) {
> > -             if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> > +     unsigned int timeout = ctx->req_timeout ?: fuse_default_req_timeout;
> > +
> > +     if (fuse_max_req_timeout) {
> > +             if (!timeout)
> > +                     timeout = fuse_max_req_timeout;
> > +             else
> > +                     timeout = min(timeout, fuse_max_req_timeout);
> > +     }
> > +
> > +     if (timeout) {
> > +             if (check_mul_overflow(timeout * 60, HZ, &fc->timeout.req_timeout))
> >                       fc->timeout.req_timeout = ULONG_MAX;
> >               timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> >               mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> > diff --git a/fs/fuse/sysctl.c b/fs/fuse/sysctl.c
> > index b272bb333005..6a9094e17950 100644
> > --- a/fs/fuse/sysctl.c
> > +++ b/fs/fuse/sysctl.c
> > @@ -13,6 +13,8 @@ static struct ctl_table_header *fuse_table_header;
> >  /* Bound by fuse_init_out max_pages, which is a u16 */
> >  static unsigned int sysctl_fuse_max_pages_limit = 65535;
> >
> > +static unsigned int sysctl_fuse_max_req_timeout_limit = U16_MAX;
> > +
> >  static struct ctl_table fuse_sysctl_table[] = {
> >       {
> >               .procname       = "max_pages_limit",
> > @@ -23,6 +25,24 @@ static struct ctl_table fuse_sysctl_table[] = {
> >               .extra1         = SYSCTL_ONE,
> >               .extra2         = &sysctl_fuse_max_pages_limit,
> >       },
> > +     {
> > +             .procname       = "default_request_timeout",
> > +             .data           = &fuse_default_req_timeout,
> > +             .maxlen         = sizeof(fuse_default_req_timeout),
> > +             .mode           = 0644,
> > +             .proc_handler   = proc_douintvec_minmax,
> > +             .extra1         = SYSCTL_ZERO,
> > +             .extra2         = &sysctl_fuse_max_req_timeout_limit,
> > +     },
> > +     {
> > +             .procname       = "max_request_timeout",
> > +             .data           = &fuse_max_req_timeout,
> > +             .maxlen         = sizeof(fuse_max_req_timeout),
> > +             .mode           = 0644,
> > +             .proc_handler   = proc_douintvec_minmax,
> > +             .extra1         = SYSCTL_ZERO,
> > +             .extra2         = &sysctl_fuse_max_req_timeout_limit,
> > +     },
> >  };
> >
> >  int fuse_sysctl_register(void)
>
> --
> Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
                     ` (5 preceding siblings ...)
  2024-12-06 16:30   ` Etienne Martineau
@ 2024-12-06 19:05   ` Joanne Koong
  2024-12-06 19:56     ` Etienne
  2024-12-06 20:12   ` Jeff Layton
  7 siblings, 1 reply; 39+ messages in thread
From: Joanne Koong @ 2024-12-06 19:05 UTC (permalink / raw)
  To: Etienne Martineau
  Cc: miklos, linux-fsdevel, bernd.schubert, bschubert, jefflexu, josef,
	kernel-team, laoar.shao

On Fri, Dec 6, 2024 at 8:31 AM Etienne Martineau <etmartin4313@gmail.com> wrote:
>
> From: Joanne Koong <joannelkoong@gmail.com>
>
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is deadlocked. Currently, there's no
> > good way to detect if a server is stuck and needs to be killed manually.
>
> I tried this V9 patchset and realized that upon max_request_timeout the connection will be dropped irrespectively if the task is in D state or not. I guess this is expected behavior.

Yes, the connection will be dropped regardless of what state the task is in.

> To me the concerning aspect is when tasks are going in D state because of the consequence when running with hung_task_timeout_secs and hung_task_panic=1.

Could you elaborate on this a bit more? When running with
hung_task_timeout_secs and hung_task_panic=1, how does it cause the
task to go into D state?

>
> >
> > This commit adds an option for enforcing a timeout (in minutes) for
> > requests where if the timeout elapses without the server responding to
> > the request, the connection will be automatically aborted.
> >
> > Please note that these timeouts are not 100% precise. The request may
> > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> > timeout due to how it's internally implemented.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> > ---
> >  fs/fuse/dev.c    | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/fuse/fuse_i.h | 21 +++++++++++++
> >  fs/fuse/inode.c  | 21 +++++++++++++
> >  3 files changed, 122 insertions(+)
> >
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 29fc61a072ba..536aa4525e8f 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -45,6 +45,82 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
> >       return READ_ONCE(file->private_data);
> >  }
> >
> > +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> > +{
> > +     return jiffies > req->create_time + fc->timeout.req_timeout;
> > +}
> > +
> > +/*
> > + * Check if any requests aren't being completed by the specified request
> > + * timeout. To do so, we:
> > + * - check the fiq pending list
> > + * - check the bg queue
> > + * - check the fpq io and processing lists
> > + *
> > + * To make this fast, we only check against the head request on each list since
> > + * these are generally queued in order of creation time (eg newer requests get
> > + * queued to the tail). We might miss a few edge cases (eg requests transitioning
> > + * between lists, re-sent requests at the head of the pending list having a
> > + * later creation time than other requests on that list, etc.) but that is fine
> > + * since if the request never gets fulfilled, it will eventually be caught.
> > + */
> > +void fuse_check_timeout(struct timer_list *timer)
> > +{
> > +     struct fuse_conn *fc = container_of(timer, struct fuse_conn, timeout.timer);
> Here this timer may get queued and if echo 1 > /sys/fs/fuse/connections/'nn'/abort is done at more or less the same time over the same connection I'm wondering what will happen?
> At least I think we may need timer_delete_sync() instead of timer_delete() in fuse_abort_conn() and potentially call it from the top of fuse_abort_conn() instead.

I don't think this is an issue because there's still a reference on
the "struct fuse_conn" when fuse_abort_conn() is called. The fuse_conn
struct is freed in fuse_conn_put() when the last refcount is dropped,
and in fuse_conn_put() there's this line

if (fc->timeout.req_timeout)
  timer_shutdown_sync(&fc->timeout.timer);

that guarantees the timer is not queued / callback of timer is not
running / cannot be rearmed.

>
> > +     struct fuse_iqueue *fiq = &fc->iq;
> > +     struct fuse_req *req;
> > +     struct fuse_dev *fud;
> > +     struct fuse_pqueue *fpq;
> > +     bool expired = false;
> > +     int i;
> > +
> > +     spin_lock(&fiq->lock);
> Do we need spin_lock_irq instead bcos this is irq context no?

Yeah, I missed that spin_lock doesn't disable interrupts. Sergey noted
this as well in [1] and for the next iteration of this patchset, I'm
planning to use use a kthread per server connection, similar to what
the hung task watchdog does.

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1bwYjHGGNsPwjsd5Kp3iL6ua_hmyN3kFZo1b8OVV9bOpw@mail.gmail.com/

>
> >
> > +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> > +{
> > +     if (ctx->req_timeout) {
> > +             if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> > +                     fc->timeout.req_timeout = ULONG_MAX;
> > +             timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> > +             mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> > +     } else {
> > +             fc->timeout.req_timeout = 0;
> > +     }
> > +}
> > +
> >  int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> >  {
> >       struct fuse_dev *fud = NULL;
> > @@ -1753,6 +1773,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> >       fc->destroy = ctx->destroy;
> >       fc->no_control = ctx->no_control;
> >       fc->no_force_umount = ctx->no_force_umount;
> > +     fuse_init_fc_timeout(fc, ctx);
>
> The max_request_timeout is latched in fc->timeout.req_timeout during fuse_fill_super_common() so any further modification are not taking into effect
> Say:
>  sudo bash -c 'echo 100 > /proc/sys/fs/fuse/max_request_timeout'
>  sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt
>  kill -STOP `pidof /usr/lib/openssh/sftp-server`
>  ls ./mnt/
>  ^C
>   # Trying to change back the timeout doesn't have any effect
>  sudo bash -c 'echo 10 > /proc/sys/fs/fuse/max_request_timeout'
>

This is expected behavior. The timeout that's set at mount time will
be the timeout for the duration of the connection.


Thanks,
Joanne

> Thanks
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-06  0:37   ` Jeff Layton
@ 2024-12-06 19:19     ` Joanne Koong
  2024-12-06 20:06       ` Jeff Layton
  0 siblings, 1 reply; 39+ messages in thread
From: Joanne Koong @ 2024-12-06 19:19 UTC (permalink / raw)
  To: Jeff Layton
  Cc: miklos, linux-fsdevel, Sergey Senozhatsky, josef, bernd.schubert,
	jefflexu, laoar.shao, kernel-team, Bernd Schubert

On Thu, Dec 5, 2024 at 4:37 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2024-11-14 at 11:13 -0800, Joanne Koong wrote:
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is deadlocked. Currently, there's no
> > good way to detect if a server is stuck and needs to be killed manually.
> >
> > This commit adds an option for enforcing a timeout (in minutes) for
> > requests where if the timeout elapses without the server responding to
> > the request, the connection will be automatically aborted.
> >
>
> I haven't been keeping up with the earlier series, but I think I agree
> with Sergey that this timeout would be better expressed in seconds.
>
> Most filesystems that deal with timeouts (NFS, CIFS, etc.) specify them
> as a number of seconds, and expressing this in minutes goes against
> that convention. It also seems rather coarse-grained. I could easily
> see a situation where 5 minutes is too short, but 6 minutes is too
> long.

Sounds good, I'll change the timeout to seconds. The reason it was set
in minutes is because the timeouts have an upper margin of error
(right now, up to 1 minute) and I didn't want to give a misleading
illusion of precision.

It sounds like it'd be useful if the timer is run more frequently (eg
instead of every 1 minute, maybe running this every 15 or 30 secs) as
well. I'll make this change in the next version.

>
> With that too, you probably wouldn't need patch #1. You could treat it
> as a 32-bit integer and just clamp the value as necessary.
>
> > Please note that these timeouts are not 100% precise. The request may
> > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> > timeout due to how it's internally implemented.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> > ---
> >  fs/fuse/dev.c    | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/fuse/fuse_i.h | 21 +++++++++++++
> >  fs/fuse/inode.c  | 21 +++++++++++++
> >  3 files changed, 122 insertions(+)
> >
...
>
> --
> Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-06 19:05   ` Joanne Koong
@ 2024-12-06 19:56     ` Etienne
  2024-12-06 21:52       ` Joanne Koong
  0 siblings, 1 reply; 39+ messages in thread
From: Etienne @ 2024-12-06 19:56 UTC (permalink / raw)
  To: Joanne Koong
  Cc: miklos, linux-fsdevel, bernd.schubert, bschubert, jefflexu, josef,
	kernel-team, laoar.shao

> > I tried this V9 patchset and realized that upon max_request_timeout the connection will be dropped irrespectively if the task is in D state or not. I guess this is expected behavior.
>
> Yes, the connection will be dropped regardless of what state the task is in.
Thanks for confirmation

>
> > To me the concerning aspect is when tasks are going in D state because of the consequence when running with hung_task_timeout_secs and hung_task_panic=1.
>
> Could you elaborate on this a bit more? When running with
> hung_task_timeout_secs and hung_task_panic=1, how does it cause the
> task to go into D state?

Sorry for the confusion. It doesn't cause tasks to go in D state.
What I meant is that I've been looking for a way to terminate tasks
stuck in D state because we have hung_task_panic=1 and this is causing
bad consequences when they trigger the hung task timer.

> > Here this timer may get queued and if echo 1 > /sys/fs/fuse/connections/'nn'/abort is done at more or less the same time over the same connection I'm wondering what will happen?
> > At least I think we may need timer_delete_sync() instead of timer_delete() in fuse_abort_conn() and potentially call it from the top of fuse_abort_conn() instead.
>
> I don't think this is an issue because there's still a reference on
> the "struct fuse_conn" when fuse_abort_conn() is called. The fuse_conn
> struct is freed in fuse_conn_put() when the last refcount is dropped,
> and in fuse_conn_put() there's this line
>
> if (fc->timeout.req_timeout)
>   timer_shutdown_sync(&fc->timeout.timer);
>
> that guarantees the timer is not queued / callback of timer is not
> running / cannot be rearmed.

Got it. I missed that last part in fuse_conn_put()
Thanks

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-06 19:19     ` Joanne Koong
@ 2024-12-06 20:06       ` Jeff Layton
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2024-12-06 20:06 UTC (permalink / raw)
  To: Joanne Koong
  Cc: miklos, linux-fsdevel, Sergey Senozhatsky, josef, bernd.schubert,
	jefflexu, laoar.shao, kernel-team, Bernd Schubert

On Fri, 2024-12-06 at 11:19 -0800, Joanne Koong wrote:
> On Thu, Dec 5, 2024 at 4:37 PM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Thu, 2024-11-14 at 11:13 -0800, Joanne Koong wrote:
> > > There are situations where fuse servers can become unresponsive or
> > > stuck, for example if the server is deadlocked. Currently, there's no
> > > good way to detect if a server is stuck and needs to be killed manually.
> > > 
> > > This commit adds an option for enforcing a timeout (in minutes) for
> > > requests where if the timeout elapses without the server responding to
> > > the request, the connection will be automatically aborted.
> > > 
> > 
> > I haven't been keeping up with the earlier series, but I think I agree
> > with Sergey that this timeout would be better expressed in seconds.
> > 
> > Most filesystems that deal with timeouts (NFS, CIFS, etc.) specify them
> > as a number of seconds, and expressing this in minutes goes against
> > that convention. It also seems rather coarse-grained. I could easily
> > see a situation where 5 minutes is too short, but 6 minutes is too
> > long.
> 
> Sounds good, I'll change the timeout to seconds. The reason it was set
> in minutes is because the timeouts have an upper margin of error
> (right now, up to 1 minute) and I didn't want to give a misleading
> illusion of precision.
> 

Understood. That is a concern, but I think you can mitigate it by
documenting that the timeout just makes it eligible to be reaped,
rather than a hard timeout guarantee. That would also allow you to
change the implementation in the future to reap more (or less)
aggressively, without breaking the interface.

> It sounds like it'd be useful if the timer is run more frequently (eg
> instead of every 1 minute, maybe running this every 15 or 30 secs) as
> well. I'll make this change in the next version.
> 

Yeah, 1 minute seems like quite a long time. I don't know FUSE well
enough to get a feel for how heavyweight fuse_check_timeout() is, but
it might be good to check more often, especially if you don't expect it
to take long in the common case.


> > 
> > With that too, you probably wouldn't need patch #1. You could treat it
> > as a 32-bit integer and just clamp the value as necessary.
> > 
> > > Please note that these timeouts are not 100% precise. The request may
> > > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> > > timeout due to how it's internally implemented.
> > > 
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> > > ---
> > >  fs/fuse/dev.c    | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/fuse/fuse_i.h | 21 +++++++++++++
> > >  fs/fuse/inode.c  | 21 +++++++++++++
> > >  3 files changed, 122 insertions(+)
> > > 
> ...
> > 
> > --
> > Jeff Layton <jlayton@kernel.org>

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
                     ` (6 preceding siblings ...)
  2024-12-06 19:05   ` Joanne Koong
@ 2024-12-06 20:12   ` Jeff Layton
  2024-12-06 21:54     ` Joanne Koong
  7 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2024-12-06 20:12 UTC (permalink / raw)
  To: Joanne Koong, miklos, linux-fsdevel
  Cc: josef, bernd.schubert, jefflexu, laoar.shao, kernel-team,
	Bernd Schubert

On Thu, 2024-11-14 at 11:13 -0800, Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is deadlocked. Currently, there's no
> good way to detect if a server is stuck and needs to be killed manually.
> 
> This commit adds an option for enforcing a timeout (in minutes) for
> requests where if the timeout elapses without the server responding to
> the request, the connection will be automatically aborted.
> 
> Please note that these timeouts are not 100% precise. The request may
> take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> timeout due to how it's internally implemented.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
>  fs/fuse/dev.c    | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/fuse/fuse_i.h | 21 +++++++++++++
>  fs/fuse/inode.c  | 21 +++++++++++++
>  3 files changed, 122 insertions(+)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 29fc61a072ba..536aa4525e8f 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -45,6 +45,82 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
>  	return READ_ONCE(file->private_data);
>  }
>  
> +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> +{
> +	return jiffies > req->create_time + fc->timeout.req_timeout;
> +}
> +
> +/*
> + * Check if any requests aren't being completed by the specified request
> + * timeout. To do so, we:
> + * - check the fiq pending list
> + * - check the bg queue
> + * - check the fpq io and processing lists
> + *
> + * To make this fast, we only check against the head request on each list since
> + * these are generally queued in order of creation time (eg newer requests get
> + * queued to the tail). We might miss a few edge cases (eg requests transitioning
> + * between lists, re-sent requests at the head of the pending list having a
> + * later creation time than other requests on that list, etc.) but that is fine
> + * since if the request never gets fulfilled, it will eventually be caught.
> + */
> +void fuse_check_timeout(struct timer_list *timer)
> +{
> +	struct fuse_conn *fc = container_of(timer, struct fuse_conn, timeout.timer);
> +	struct fuse_iqueue *fiq = &fc->iq;
> +	struct fuse_req *req;
> +	struct fuse_dev *fud;
> +	struct fuse_pqueue *fpq;
> +	bool expired = false;
> +	int i;
> +
> +	spin_lock(&fiq->lock);
> +	req = list_first_entry_or_null(&fiq->pending, struct fuse_req, list);
> +	if (req)
> +		expired = request_expired(fc, req);
> +	spin_unlock(&fiq->lock);
> +	if (expired)
> +		goto abort_conn;
> +
> +	spin_lock(&fc->bg_lock);
> +	req = list_first_entry_or_null(&fc->bg_queue, struct fuse_req, list);
> +	if (req)
> +		expired = request_expired(fc, req);
> +	spin_unlock(&fc->bg_lock);
> +	if (expired)
> +		goto abort_conn;
> +
> +	spin_lock(&fc->lock);
> +	if (!fc->connected) {
> +		spin_unlock(&fc->lock);
> +		return;
> +	}
> +	list_for_each_entry(fud, &fc->devices, entry) {
> +		fpq = &fud->pq;
> +		spin_lock(&fpq->lock);
> +		req = list_first_entry_or_null(&fpq->io, struct fuse_req, list);
> +		if (req && request_expired(fc, req))
> +			goto fpq_abort;
> +
> +		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
> +			req = list_first_entry_or_null(&fpq->processing[i], struct fuse_req, list);
> +			if (req && request_expired(fc, req))
> +				goto fpq_abort;
> +		}
> +		spin_unlock(&fpq->lock);
> +	}
> +	spin_unlock(&fc->lock);
> +
> +	mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> +	return;
> +
> +fpq_abort:
> +	spin_unlock(&fpq->lock);
> +	spin_unlock(&fc->lock);
> +abort_conn:
> +	fuse_abort_conn(fc);
> +}
> +
>  static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
>  {
>  	INIT_LIST_HEAD(&req->list);
> @@ -53,6 +129,7 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
>  	refcount_set(&req->count, 1);
>  	__set_bit(FR_PENDING, &req->flags);
>  	req->fm = fm;
> +	req->create_time = jiffies;
>  }
>  
>  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> @@ -2308,6 +2385,9 @@ void fuse_abort_conn(struct fuse_conn *fc)
>  		spin_unlock(&fc->lock);
>  
>  		end_requests(&to_end);
> +
> +		if (fc->timeout.req_timeout)
> +			timer_delete(&fc->timeout.timer);
>  	} else {
>  		spin_unlock(&fc->lock);
>  	}
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index d35c37ccf9b5..9092201c4e0b 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -438,6 +438,9 @@ struct fuse_req {
>  
>  	/** fuse_mount this request belongs to */
>  	struct fuse_mount *fm;
> +
> +	/** When (in jiffies) the request was created */
> +	unsigned long create_time;
>  };
>  
>  struct fuse_iqueue;
> @@ -528,6 +531,16 @@ struct fuse_pqueue {
>  	struct list_head io;
>  };
>  
> +/* Frequency (in seconds) of request timeout checks, if opted into */
> +#define FUSE_TIMEOUT_TIMER_FREQ 60 * HZ
> +
> +struct fuse_timeout {
> +	struct timer_list timer;
> +
> +	/* Request timeout (in jiffies). 0 = no timeout */
> +	unsigned long req_timeout;
> +};
> +
>  /**
>   * Fuse device instance
>   */
> @@ -574,6 +587,8 @@ struct fuse_fs_context {
>  	enum fuse_dax_mode dax_mode;
>  	unsigned int max_read;
>  	unsigned int blksize;
> +	/*  Request timeout (in minutes). 0 = no timeout (infinite wait) */
> +	unsigned int req_timeout;
>  	const char *subtype;
>  
>  	/* DAX device, may be NULL */
> @@ -920,6 +935,9 @@ struct fuse_conn {
>  	/** IDR for backing files ids */
>  	struct idr backing_files_map;
>  #endif
> +
> +	/** Only used if the connection enforces request timeouts */
> +	struct fuse_timeout timeout;
>  };
>  
>  /*
> @@ -1181,6 +1199,9 @@ void fuse_request_end(struct fuse_req *req);
>  void fuse_abort_conn(struct fuse_conn *fc);
>  void fuse_wait_aborted(struct fuse_conn *fc);
>  
> +/* Check if any requests timed out */
> +void fuse_check_timeout(struct timer_list *timer);
> +
>  /**
>   * Invalidate inode attributes
>   */
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index f1779ff3f8d1..ee006f09cd04 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -735,6 +735,7 @@ enum {
>  	OPT_ALLOW_OTHER,
>  	OPT_MAX_READ,
>  	OPT_BLKSIZE,
> +	OPT_REQUEST_TIMEOUT,
>  	OPT_ERR
>  };
>  
> @@ -749,6 +750,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
>  	fsparam_u32	("max_read",		OPT_MAX_READ),
>  	fsparam_u32	("blksize",		OPT_BLKSIZE),
>  	fsparam_string	("subtype",		OPT_SUBTYPE),
> +	fsparam_u16	("request_timeout",	OPT_REQUEST_TIMEOUT),
>  	{}
>  };
>  
> @@ -844,6 +846,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
>  		ctx->blksize = result.uint_32;
>  		break;
>  
> +	case OPT_REQUEST_TIMEOUT:
> +		ctx->req_timeout = result.uint_16;
> +		break;
> +
>  	default:
>  		return -EINVAL;
>  	}
> @@ -973,6 +979,8 @@ void fuse_conn_put(struct fuse_conn *fc)
>  
>  		if (IS_ENABLED(CONFIG_FUSE_DAX))
>  			fuse_dax_conn_free(fc);
> +		if (fc->timeout.req_timeout)
> +			timer_shutdown_sync(&fc->timeout.timer);
>  		if (fiq->ops->release)
>  			fiq->ops->release(fiq);
>  		put_pid_ns(fc->pid_ns);
> @@ -1691,6 +1699,18 @@ int fuse_init_fs_context_submount(struct fs_context *fsc)
>  }
>  EXPORT_SYMBOL_GPL(fuse_init_fs_context_submount);
>  
> +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> +{
> +	if (ctx->req_timeout) {
> +		if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> +			fc->timeout.req_timeout = ULONG_MAX;
> +		timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> +		mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> +	} else {
> +		fc->timeout.req_timeout = 0;
> +	}
> +}
> +


Does fuse_check_timeout need to run in IRQ context? It doesn't seem
like it does. Have you considered setting up a recurring delayed
workqueue job instead? That would run in process context, which might
make the locking in that function less hairy.


>  int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  {
>  	struct fuse_dev *fud = NULL;
> @@ -1753,6 +1773,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  	fc->destroy = ctx->destroy;
>  	fc->no_control = ctx->no_control;
>  	fc->no_force_umount = ctx->no_force_umount;
> +	fuse_init_fc_timeout(fc, ctx);
>  
>  	err = -ENOMEM;
>  	root = fuse_get_root_inode(sb, ctx->rootmode);

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-06 19:56     ` Etienne
@ 2024-12-06 21:52       ` Joanne Koong
  0 siblings, 0 replies; 39+ messages in thread
From: Joanne Koong @ 2024-12-06 21:52 UTC (permalink / raw)
  To: Etienne
  Cc: miklos, linux-fsdevel, bernd.schubert, bschubert, jefflexu, josef,
	kernel-team, laoar.shao

On Fri, Dec 6, 2024 at 11:56 AM Etienne <etmartin4313@gmail.com> wrote:
>
> > > I tried this V9 patchset and realized that upon max_request_timeout the connection will be dropped irrespectively if the task is in D state or not. I guess this is expected behavior.
> >
> > Yes, the connection will be dropped regardless of what state the task is in.
> Thanks for confirmation
>
> >
> > > To me the concerning aspect is when tasks are going in D state because of the consequence when running with hung_task_timeout_secs and hung_task_panic=1.
> >
> > Could you elaborate on this a bit more? When running with
> > hung_task_timeout_secs and hung_task_panic=1, how does it cause the
> > task to go into D state?
>
> Sorry for the confusion. It doesn't cause tasks to go in D state.
> What I meant is that I've been looking for a way to terminate tasks
> stuck in D state because we have hung_task_panic=1 and this is causing
> bad consequences when they trigger the hung task timer.

Gotcha, thanks for clarifying!
>
> > > Here this timer may get queued and if echo 1 > /sys/fs/fuse/connections/'nn'/abort is done at more or less the same time over the same connection I'm wondering what will happen?
> > > At least I think we may need timer_delete_sync() instead of timer_delete() in fuse_abort_conn() and potentially call it from the top of fuse_abort_conn() instead.
> >
> > I don't think this is an issue because there's still a reference on
> > the "struct fuse_conn" when fuse_abort_conn() is called. The fuse_conn
> > struct is freed in fuse_conn_put() when the last refcount is dropped,
> > and in fuse_conn_put() there's this line
> >
> > if (fc->timeout.req_timeout)
> >   timer_shutdown_sync(&fc->timeout.timer);
> >
> > that guarantees the timer is not queued / callback of timer is not
> > running / cannot be rearmed.
>
> Got it. I missed that last part in fuse_conn_put()
> Thanks

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
  2024-12-06 20:12   ` Jeff Layton
@ 2024-12-06 21:54     ` Joanne Koong
  0 siblings, 0 replies; 39+ messages in thread
From: Joanne Koong @ 2024-12-06 21:54 UTC (permalink / raw)
  To: Jeff Layton
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
	laoar.shao, kernel-team, Bernd Schubert

On Fri, Dec 6, 2024 at 12:12 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2024-11-14 at 11:13 -0800, Joanne Koong wrote:
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is deadlocked. Currently, there's no
> > good way to detect if a server is stuck and needs to be killed manually.
> >
> > This commit adds an option for enforcing a timeout (in minutes) for
> > requests where if the timeout elapses without the server responding to
> > the request, the connection will be automatically aborted.
> >
> > Please note that these timeouts are not 100% precise. The request may
> > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> > timeout due to how it's internally implemented.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> > ---
> >  fs/fuse/dev.c    | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/fuse/fuse_i.h | 21 +++++++++++++
> >  fs/fuse/inode.c  | 21 +++++++++++++
> >  3 files changed, 122 insertions(+)
> >
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 29fc61a072ba..536aa4525e8f 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> >               return -EINVAL;
> >       }
> > @@ -973,6 +979,8 @@ void fuse_conn_put(struct fuse_conn *fc)
> >
> >               if (IS_ENABLED(CONFIG_FUSE_DAX))
> >                       fuse_dax_conn_free(fc);
> > +             if (fc->timeout.req_timeout)
> > +                     timer_shutdown_sync(&fc->timeout.timer);
> >               if (fiq->ops->release)
> >                       fiq->ops->release(fiq);
> >               put_pid_ns(fc->pid_ns);
> > @@ -1691,6 +1699,18 @@ int fuse_init_fs_context_submount(struct fs_context *fsc)
> >  }
> >  EXPORT_SYMBOL_GPL(fuse_init_fs_context_submount);
> >
> > +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> > +{
> > +     if (ctx->req_timeout) {
> > +             if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> > +                     fc->timeout.req_timeout = ULONG_MAX;
> > +             timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> > +             mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> > +     } else {
> > +             fc->timeout.req_timeout = 0;
> > +     }
> > +}
> > +
>
>
> Does fuse_check_timeout need to run in IRQ context? It doesn't seem
> like it does. Have you considered setting up a recurring delayed
> workqueue job instead? That would run in process context, which might
> make the locking in that function less hairy.
>

Great idea, I'll use a recurring delayed workqueue job instead of a
kthread, since it's more lightweight.


Thanks,
Joanne
>
> >  int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> >  {
> >       struct fuse_dev *fud = NULL;
> > @@ -1753,6 +1773,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> >       fc->destroy = ctx->destroy;
> >       fc->no_control = ctx->no_control;
> >       fc->no_force_umount = ctx->no_force_umount;
> > +     fuse_init_fc_timeout(fc, ctx);
> >
> >       err = -ENOMEM;
> >       root = fuse_get_root_inode(sb, ctx->rootmode);
>
> --
> Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2024-12-06 21:54 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-14 19:13 [PATCH RESEND v9 0/3] fuse: add kernel-enforced request timeout option Joanne Koong
2024-11-14 19:13 ` [PATCH RESEND v9 1/3] fs_parser: add fsparam_u16 helper Joanne Koong
2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
2024-11-28 10:44   ` Sergey Senozhatsky
2024-11-28 11:00     ` Bernd Schubert
2024-11-28 11:09       ` Sergey Senozhatsky
2024-11-28 11:23         ` Bernd Schubert
2024-11-28 11:54           ` Sergey Senozhatsky
2024-12-02  9:45             ` Tomasz Figa
2024-12-02 14:43               ` Bernd Schubert
2024-12-02 19:29                 ` Joanne Koong
2024-12-03  4:31                   ` Sergey Senozhatsky
2024-12-03  5:11                     ` Joanne Koong
2024-12-04 14:40                   ` Tomasz Figa
2024-12-04 14:51                     ` Brian Geffon
2024-12-04 15:09                       ` Bernd Schubert
2024-12-05  3:23                       ` Sergey Senozhatsky
2024-12-05 11:07                         ` Brian Geffon
2024-12-03 11:01   ` Sergey Senozhatsky
2024-12-06  0:06     ` Joanne Koong
2024-12-06  4:28       ` Sergey Senozhatsky
2024-12-04 13:14   ` Sergey Senozhatsky
2024-12-05 23:10     ` Joanne Koong
2024-12-04 13:47   ` Sergey Senozhatsky
2024-12-05 23:29     ` Joanne Koong
2024-12-06  3:26       ` Sergey Senozhatsky
2024-12-06  0:37   ` Jeff Layton
2024-12-06 19:19     ` Joanne Koong
2024-12-06 20:06       ` Jeff Layton
2024-12-06 16:30   ` Etienne Martineau
2024-12-06 19:05   ` Joanne Koong
2024-12-06 19:56     ` Etienne
2024-12-06 21:52       ` Joanne Koong
2024-12-06 20:12   ` Jeff Layton
2024-12-06 21:54     ` Joanne Koong
2024-11-14 19:13 ` [PATCH RESEND v9 3/3] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
2024-12-06  0:43   ` Jeff Layton
2024-12-06 18:16     ` Joanne Koong
2024-11-21 22:02 ` [PATCH RESEND v9 0/3] fuse: add kernel-enforced request timeout option Josef Bacik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox