linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v2 0/2] fuse: add timeout option for requests
@ 2024-07-30  0:23 Joanne Koong
  2024-07-30  0:23 ` [PATCH v2 1/2] fuse: add optional kernel-enforced timeout " Joanne Koong
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Joanne Koong @ 2024-07-30  0:23 UTC (permalink / raw)
  To: miklos, linux-fsdevel; +Cc: josef, bernd.schubert, laoar.shao, kernel-team

There are situations where fuse servers can become unresponsive or take
too long to reply to a request. Currently there is no upper bound on
how long a request may take, which may be frustrating to users who get
stuck waiting for a request to complete.

This patchset adds a timeout option for requests and two dynamically
configurable fuse sysctls "default_request_timeout" and "max_request_timeout"
for controlling/enforcing timeout behavior system-wide.

Existing fuse servers will not be affected unless they explicitly opt into the
timeout.

v1: https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
Changes from v1:
- Add timeout for background requests
- Handle resend race condition
- Add sysctls

Joanne Koong (2):
  fuse: add optional kernel-enforced timeout for requests
  fuse: add default_request_timeout and max_request_timeout sysctls

 Documentation/admin-guide/sysctl/fs.rst |  17 +++
 fs/fuse/Makefile                        |   2 +-
 fs/fuse/dev.c                           | 187 +++++++++++++++++++++++-
 fs/fuse/fuse_i.h                        |  30 ++++
 fs/fuse/inode.c                         |  24 +++
 fs/fuse/sysctl.c                        |  42 ++++++
 6 files changed, 293 insertions(+), 9 deletions(-)
 create mode 100644 fs/fuse/sysctl.c

-- 
2.43.0


* [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-07-30  0:23 [PATCH v2 0/2] fuse: add timeout option for requests Joanne Koong
@ 2024-07-30  0:23 ` Joanne Koong
  2024-08-04 22:46   ` Bernd Schubert
                     ` (2 more replies)
  2024-07-30  0:23 ` [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
  2024-07-30  5:59 ` [PATCH v2 0/2] fuse: add timeout option for requests Yafang Shao
  2 siblings, 3 replies; 34+ messages in thread
From: Joanne Koong @ 2024-07-30  0:23 UTC (permalink / raw)
  To: miklos, linux-fsdevel; +Cc: josef, bernd.schubert, laoar.shao, kernel-team

There are situations where fuse servers can become unresponsive or take
too long to reply to a request. Currently there is no upper bound on
how long a request may take, which may be frustrating to users who get
stuck waiting for a request to complete.

This commit adds a timeout option (in seconds) for requests. If the
timeout elapses before the server replies to the request, the request
will fail with -ETIME.

There are 3 possibilities for a request that times out:
a) The request times out before the request has been sent to userspace
b) The request times out after the request has been sent to userspace
and before it receives a reply from the server
c) The request times out after the request has been sent to userspace
and the server replies while the kernel is timing out the request

While a request timeout is being handled, there may be other handlers
running at the same time if:
a) the kernel is forwarding the request to the server
b) the kernel is processing the server's reply to the request
c) the request is being re-sent
d) the connection is aborting
e) the device is getting released

Proper synchronization must be added to ensure that the request is
handled correctly in all of these cases. To this end, a new
FR_FINISHING bit is added to the request flags. It is set atomically by
either the timeout handler (see fuse_request_timeout()), which is invoked
after the request timeout elapses, or the request reply handler
(see fuse_dev_do_write()), whichever gets there first. If the reply handler
and the timeout handler are executing simultaneously and the reply handler
sets FR_FINISHING first, then the request will be handled as if the
timeout did not elapse. If the timeout handler sets FR_FINISHING first,
then the request will fail with -ETIME and the request will be cleaned up.
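
As a rough illustration of the FR_FINISHING arbitration described above,
here is a minimal userspace sketch using C11 atomics. It is not the kernel
code: struct model_req, MODEL_FR_FINISHING and the model_*() helpers are
hypothetical names, and atomic_fetch_or() stands in for the kernel's
test_and_set_bit(). Whichever handler sets the bit first decides the
outcome; the other one backs off.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the FR_FINISHING request flag. */
enum { MODEL_FR_FINISHING = 1u << 0 };

struct model_req {
	atomic_uint flags;
	int error;	/* 0 on success, negative errno otherwise */
};

static void model_req_init(struct model_req *req)
{
	atomic_init(&req->flags, 0);
	req->error = 0;
}

/* Returns true if the caller set the bit first and now owns completion. */
static bool model_claim_finishing(struct model_req *req)
{
	unsigned int old = atomic_fetch_or(&req->flags, MODEL_FR_FINISHING);

	return !(old & MODEL_FR_FINISHING);
}

/* Timeout path: fails the request only if it claimed the bit first. */
static void model_timeout_handler(struct model_req *req)
{
	if (!model_claim_finishing(req))
		return;		/* the reply handler is already finishing it */
	req->error = -ETIME;
}

/* Reply path: stores the server's result only if it claimed the bit first. */
static void model_reply_handler(struct model_req *req, int server_error)
{
	if (!model_claim_finishing(req))
		return;		/* the timeout handler already failed it */
	req->error = server_error;
}
```

Calling model_reply_handler() and then model_timeout_handler() on the same
request leaves error at the server's value; the opposite order leaves it
at -ETIME.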

Currently, this is the refcount lifecycle of a request:

Synchronous request is created:
fuse_simple_request -> allocates request, sets refcount to 1
  __fuse_request_send -> acquires refcount
    queues request and waits for reply...
fuse_simple_request -> drops refcount

Background request is created:
fuse_simple_background -> allocates request, sets refcount to 1

Request is replied to:
fuse_dev_do_write
  fuse_request_end -> drops refcount on request

Proper acquires on the request reference must be added to ensure that the
timeout handler does not drop the last refcount on the request while
other handlers may be operating on the request. Please note that the
timeout handler may get invoked at any phase of the request's
lifetime (e.g., before the request has been forwarded to userspace).

It is always guaranteed that there is a refcount on the request when the
timeout handler is executing. The timeout handler will either be
deactivated by the reply/abort/release handlers or, if it is
concurrently executing on another CPU, the reply/abort/release
handlers will wait for it to finish executing before they drop the
final refcount on the request.
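
To make the refcount reasoning above concrete, here is a small userspace
model of the synchronous-request lifecycle with one extra reference held by
a handler while it works on the request. This is only a sketch: struct
rc_req and the rc_*() helpers are hypothetical and merely mimic the
get/put pattern; they are not the fuse implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a refcounted request, not kernel code. */
struct rc_req {
	atomic_int count;
	bool freed;	/* stands in for the request being freed */
};

static void rc_init(struct rc_req *req)
{
	atomic_init(&req->count, 1);	/* allocation sets the refcount to 1 */
	req->freed = false;
}

static void rc_get(struct rc_req *req)
{
	atomic_fetch_add(&req->count, 1);
}

/* Returns true if this call dropped the final reference. */
static bool rc_put(struct rc_req *req)
{
	if (atomic_fetch_sub(&req->count, 1) != 1)
		return false;
	req->freed = true;
	return true;
}

/*
 * Walk a synchronous request through the lifecycle listed above, with one
 * extra reference pinned by a handler. Returns true if the request was
 * freed only by the very last put.
 */
static bool rc_sync_lifecycle(void)
{
	struct rc_req req;
	bool early;

	rc_init(&req);			/* fuse_simple_request allocates: 1 */
	rc_get(&req);			/* __fuse_request_send acquires: 2 */
	rc_get(&req);			/* handler pins the request: 3 */
	early = rc_put(&req);		/* fuse_request_end drops: 2 */
	early |= rc_put(&req);		/* handler unpins: 1 */
	return !early && rc_put(&req);	/* fuse_simple_request drops: 0 */
}
```

The point of the extra pin is that an intermediate put can never be the one
that frees the request while another handler is still using it.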

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
 fs/fuse/fuse_i.h |  14 ++++
 fs/fuse/inode.c  |   7 ++
 3 files changed, 200 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9eb191b5c4de..9992bc5f4469 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
 
 static struct kmem_cache *fuse_req_cachep;
 
+static void fuse_request_timeout(struct timer_list *timer);
+
 static struct fuse_dev *fuse_get_dev(struct file *file)
 {
 	/*
@@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
 	refcount_set(&req->count, 1);
 	__set_bit(FR_PENDING, &req->flags);
 	req->fm = fm;
+	if (fm->fc->req_timeout)
+		timer_setup(&req->timer, fuse_request_timeout, 0);
 }
 
 static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
@@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
  * the 'end' callback is called if given, else the reference to the
  * request is released
  */
-void fuse_request_end(struct fuse_req *req)
+static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
 {
 	struct fuse_mount *fm = req->fm;
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_iqueue *fiq = &fc->iq;
 
+	if (from_timer_callback)
+		req->out.h.error = -ETIME;
+
 	if (test_and_set_bit(FR_FINISHED, &req->flags))
 		goto put_request;
 
@@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
 		list_del_init(&req->intr_entry);
 		spin_unlock(&fiq->lock);
 	}
-	WARN_ON(test_bit(FR_PENDING, &req->flags));
-	WARN_ON(test_bit(FR_SENT, &req->flags));
 	if (test_bit(FR_BACKGROUND, &req->flags)) {
 		spin_lock(&fc->bg_lock);
 		clear_bit(FR_BACKGROUND, &req->flags);
@@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
 		wake_up(&req->waitq);
 	}
 
+	if (!from_timer_callback && req->timer.function)
+		timer_delete_sync(&req->timer);
+
 	if (test_bit(FR_ASYNC, &req->flags))
 		req->args->end(fm, req->args, req->out.h.error);
 put_request:
 	fuse_put_request(req);
 }
+
+void fuse_request_end(struct fuse_req *req)
+{
+	WARN_ON(test_bit(FR_PENDING, &req->flags));
+	WARN_ON(test_bit(FR_SENT, &req->flags));
+
+	do_fuse_request_end(req, false);
+}
 EXPORT_SYMBOL_GPL(fuse_request_end);
 
+static void timeout_inflight_req(struct fuse_req *req)
+{
+	struct fuse_conn *fc = req->fm->fc;
+	struct fuse_iqueue *fiq = &fc->iq;
+	struct fuse_pqueue *fpq;
+
+	spin_lock(&fiq->lock);
+	fpq = req->fpq;
+	spin_unlock(&fiq->lock);
+
+	/*
+	 * If fpq has not been set yet, then the request is aborting (which
+	 * clears FR_PENDING flag) before dev_do_read (which sets req->fpq)
+	 * has been called. Let the abort handler handle this request.
+	 */
+	if (!fpq)
+		return;
+
+	spin_lock(&fpq->lock);
+	if (!fpq->connected || req->out.h.error == -ECONNABORTED) {
+		/*
+		 * Connection is being aborted or the fuse_dev is being released.
+		 * The abort / release will clean up the request
+		 */
+		spin_unlock(&fpq->lock);
+		return;
+	}
+
+	if (!test_bit(FR_PRIVATE, &req->flags))
+		list_del_init(&req->list);
+
+	spin_unlock(&fpq->lock);
+
+	do_fuse_request_end(req, true);
+}
+
+static void timeout_pending_req(struct fuse_req *req)
+{
+	struct fuse_conn *fc = req->fm->fc;
+	struct fuse_iqueue *fiq = &fc->iq;
+	bool background = test_bit(FR_BACKGROUND, &req->flags);
+
+	if (background)
+		spin_lock(&fc->bg_lock);
+	spin_lock(&fiq->lock);
+
+	if (!test_bit(FR_PENDING, &req->flags)) {
+		spin_unlock(&fiq->lock);
+		if (background)
+			spin_unlock(&fc->bg_lock);
+		timeout_inflight_req(req);
+		return;
+	}
+
+	if (!test_bit(FR_PRIVATE, &req->flags))
+		list_del_init(&req->list);
+
+	spin_unlock(&fiq->lock);
+	if (background)
+		spin_unlock(&fc->bg_lock);
+
+	do_fuse_request_end(req, true);
+}
+
+static void fuse_request_timeout(struct timer_list *timer)
+{
+	struct fuse_req *req = container_of(timer, struct fuse_req, timer);
+
+	/*
+	 * Request reply is being finished by the kernel right now.
+	 * No need to time out the request.
+	 */
+	if (test_and_set_bit(FR_FINISHING, &req->flags))
+		return;
+
+	if (test_bit(FR_PENDING, &req->flags))
+		timeout_pending_req(req);
+	else
+		timeout_inflight_req(req);
+}
+
 static int queue_interrupt(struct fuse_req *req)
 {
 	struct fuse_iqueue *fiq = &req->fm->fc->iq;
@@ -409,7 +506,8 @@ static void request_wait_answer(struct fuse_req *req)
 
 static void __fuse_request_send(struct fuse_req *req)
 {
-	struct fuse_iqueue *fiq = &req->fm->fc->iq;
+	struct fuse_conn *fc = req->fm->fc;
+	struct fuse_iqueue *fiq = &fc->iq;
 
 	BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
 	spin_lock(&fiq->lock);
@@ -421,6 +519,10 @@ static void __fuse_request_send(struct fuse_req *req)
 		/* acquire extra reference, since request is still needed
 		   after fuse_request_end() */
 		__fuse_get_request(req);
+		if (req->timer.function) {
+			req->timer.expires = jiffies + fc->req_timeout;
+			add_timer(&req->timer);
+		}
 		queue_request_and_unlock(fiq, req);
 
 		request_wait_answer(req);
@@ -539,6 +641,10 @@ static bool fuse_request_queue_background(struct fuse_req *req)
 		if (fc->num_background == fc->max_background)
 			fc->blocked = 1;
 		list_add_tail(&req->list, &fc->bg_queue);
+		if (req->timer.function) {
+			req->timer.expires = jiffies + fc->req_timeout;
+			add_timer(&req->timer);
+		}
 		flush_bg_queue(fc);
 		queued = true;
 	}
@@ -1268,6 +1374,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	req = list_entry(fiq->pending.next, struct fuse_req, list);
 	clear_bit(FR_PENDING, &req->flags);
 	list_del_init(&req->list);
+	/* Acquire a reference in case the timeout handler starts executing */
+	__fuse_get_request(req);
+	req->fpq = fpq;
 	spin_unlock(&fiq->lock);
 
 	args = req->args;
@@ -1280,6 +1389,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 		if (args->opcode == FUSE_SETXATTR)
 			req->out.h.error = -E2BIG;
 		fuse_request_end(req);
+		fuse_put_request(req);
 		goto restart;
 	}
 	spin_lock(&fpq->lock);
@@ -1316,13 +1426,33 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	}
 	hash = fuse_req_hash(req->in.h.unique);
 	list_move_tail(&req->list, &fpq->processing[hash]);
-	__fuse_get_request(req);
 	set_bit(FR_SENT, &req->flags);
 	spin_unlock(&fpq->lock);
 	/* matches barrier in request_wait_answer() */
 	smp_mb__after_atomic();
 	if (test_bit(FR_INTERRUPTED, &req->flags))
 		queue_interrupt(req);
+
+	/*
+	 * Check if the timeout handler is running / ran. If it did, we need to
+	 * remove the request from any lists in case the timeout handler finished
+	 * before dev_do_read moved the request to the processing list.
+	 *
+	 * Check FR_SENT to distinguish whether the timeout or the write handler
+	 * is finishing the request. However, there can be the case where the
+	 * timeout handler and resend handler are running concurrently, so we
+	 * need to also check the FR_PENDING bit.
+	 */
+	if (test_bit(FR_FINISHING, &req->flags) &&
+	    (test_bit(FR_SENT, &req->flags) || test_bit(FR_PENDING, &req->flags))) {
+		spin_lock(&fpq->lock);
+		if (!test_bit(FR_PRIVATE, &req->flags))
+			list_del_init(&req->list);
+		spin_unlock(&fpq->lock);
+		fuse_put_request(req);
+		return -ETIME;
+	}
+
 	fuse_put_request(req);
 
 	return reqsize;
@@ -1332,6 +1462,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 		list_del_init(&req->list);
 	spin_unlock(&fpq->lock);
 	fuse_request_end(req);
+	fuse_put_request(req);
 	return err;
 
  err_unlock:
@@ -1806,8 +1937,25 @@ static void fuse_resend(struct fuse_conn *fc)
 		struct fuse_pqueue *fpq = &fud->pq;
 
 		spin_lock(&fpq->lock);
-		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++)
+		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
+			list_for_each_entry(req, &fpq->processing[i], list) {
+				/*
+				 * We must acquire a reference here in case the timeout
+				 * handler is running at the same time. Else the
+				 * request might get freed out from under us
+				 */
+				__fuse_get_request(req);
+
+				/*
+				 * While we have an acquired reference on the request,
+				 * the request must remain on the list so that we
+				 * can release the reference on it
+				 */
+				set_bit(FR_PRIVATE, &req->flags);
+			}
+
 			list_splice_tail_init(&fpq->processing[i], &to_queue);
+		}
 		spin_unlock(&fpq->lock);
 	}
 	spin_unlock(&fc->lock);
@@ -1820,6 +1968,12 @@ static void fuse_resend(struct fuse_conn *fc)
 	}
 
 	spin_lock(&fiq->lock);
+	list_for_each_entry_safe(req, next, &to_queue, list) {
+		if (test_bit(FR_FINISHING, &req->flags))
+			list_del_init(&req->list);
+		clear_bit(FR_PRIVATE, &req->flags);
+		fuse_put_request(req);
+	}
 	/* iq and pq requests are both oldest to newest */
 	list_splice(&to_queue, &fiq->pending);
 	fiq->ops->wake_pending_and_unlock(fiq);
@@ -1951,9 +2105,10 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 		goto copy_finish;
 	}
 
+	__fuse_get_request(req);
+
 	/* Is it an interrupt reply ID? */
 	if (oh.unique & FUSE_INT_REQ_BIT) {
-		__fuse_get_request(req);
 		spin_unlock(&fpq->lock);
 
 		err = 0;
@@ -1969,6 +2124,13 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 		goto copy_finish;
 	}
 
+	if (test_and_set_bit(FR_FINISHING, &req->flags)) {
+		/* timeout handler is already finishing the request */
+		spin_unlock(&fpq->lock);
+		fuse_put_request(req);
+		goto copy_finish;
+	}
+
 	clear_bit(FR_SENT, &req->flags);
 	list_move(&req->list, &fpq->io);
 	req->out.h = oh;
@@ -1995,6 +2157,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	spin_unlock(&fpq->lock);
 
 	fuse_request_end(req);
+	fuse_put_request(req);
 out:
 	return err ? err : nbytes;
 
@@ -2260,13 +2423,21 @@ int fuse_dev_release(struct inode *inode, struct file *file)
 	if (fud) {
 		struct fuse_conn *fc = fud->fc;
 		struct fuse_pqueue *fpq = &fud->pq;
+		struct fuse_req *req;
 		LIST_HEAD(to_end);
 		unsigned int i;
 
 		spin_lock(&fpq->lock);
 		WARN_ON(!list_empty(&fpq->io));
-		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++)
+		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
+			/*
+			 * Set the req error here so that the timeout
+			 * handler knows it's being released
+			 */
+			list_for_each_entry(req, &fpq->processing[i], list)
+				req->out.h.error = -ECONNABORTED;
 			list_splice_init(&fpq->processing[i], &to_end);
+		}
 		spin_unlock(&fpq->lock);
 
 		end_requests(&to_end);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f23919610313..2b616c5977b4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -375,6 +375,8 @@ struct fuse_io_priv {
  * FR_FINISHED:		request is finished
  * FR_PRIVATE:		request is on private list
  * FR_ASYNC:		request is asynchronous
+ * FR_FINISHING:	request is being finished, by either the timeout handler
+ *			or the reply handler
  */
 enum fuse_req_flag {
 	FR_ISREPLY,
@@ -389,6 +391,7 @@ enum fuse_req_flag {
 	FR_FINISHED,
 	FR_PRIVATE,
 	FR_ASYNC,
+	FR_FINISHING,
 };
 
 /**
@@ -435,6 +438,12 @@ struct fuse_req {
 
 	/** fuse_mount this request belongs to */
 	struct fuse_mount *fm;
+
+	/** processing queue this request has been added to */
+	struct fuse_pqueue *fpq;
+
+	/** optional timer for request replies, if timeout is enabled */
+	struct timer_list timer;
 };
 
 struct fuse_iqueue;
@@ -574,6 +583,8 @@ struct fuse_fs_context {
 	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
+	/* Request timeout (in seconds). 0 = no timeout (infinite wait) */
+	unsigned int req_timeout;
 	const char *subtype;
 
 	/* DAX device, may be NULL */
@@ -633,6 +644,9 @@ struct fuse_conn {
 	/** Constrain ->max_pages to this value during feature negotiation */
 	unsigned int max_pages_limit;
 
+	/* Request timeout (in jiffies). 0 = no timeout (infinite wait) */
+	unsigned long req_timeout;
+
 	/** Input queue */
 	struct fuse_iqueue iq;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 99e44ea7d875..9e69006fc026 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -733,6 +733,7 @@ enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_REQUEST_TIMEOUT,
 	OPT_ERR
 };
 
@@ -747,6 +748,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
 	fsparam_u32	("max_read",		OPT_MAX_READ),
 	fsparam_u32	("blksize",		OPT_BLKSIZE),
 	fsparam_string	("subtype",		OPT_SUBTYPE),
+	fsparam_u32	("request_timeout",	OPT_REQUEST_TIMEOUT),
 	{}
 };
 
@@ -830,6 +832,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
 		ctx->blksize = result.uint_32;
 		break;
 
+	case OPT_REQUEST_TIMEOUT:
+		ctx->req_timeout = result.uint_32;
+		break;
+
 	default:
 		return -EINVAL;
 	}
@@ -1724,6 +1730,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->group_id = ctx->group_id;
 	fc->legacy_opts_show = ctx->legacy_opts_show;
 	fc->max_read = max_t(unsigned int, 4096, ctx->max_read);
+	fc->req_timeout = ctx->req_timeout * HZ;
 	fc->destroy = ctx->destroy;
 	fc->no_control = ctx->no_control;
 	fc->no_force_umount = ctx->no_force_umount;
-- 
2.43.0


* [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls
  2024-07-30  0:23 [PATCH v2 0/2] fuse: add timeout option for requests Joanne Koong
  2024-07-30  0:23 ` [PATCH v2 1/2] fuse: add optional kernel-enforced timeout " Joanne Koong
@ 2024-07-30  0:23 ` Joanne Koong
  2024-07-30  7:49   ` kernel test robot
                     ` (2 more replies)
  2024-07-30  5:59 ` [PATCH v2 0/2] fuse: add timeout option for requests Yafang Shao
  2 siblings, 3 replies; 34+ messages in thread
From: Joanne Koong @ 2024-07-30  0:23 UTC (permalink / raw)
  To: miklos, linux-fsdevel; +Cc: josef, bernd.schubert, laoar.shao, kernel-team

Introduce two new sysctls, "default_request_timeout" and
"max_request_timeout". These control timeouts on replies by the
server to kernel-issued fuse requests.

"default_request_timeout" sets a timeout if no timeout is specified by
the fuse server on mount. 0 (default) indicates no timeout should be enforced.

"max_request_timeout" sets a maximum timeout for fuse requests. If the
fuse server attempts to set a timeout greater than max_request_timeout,
the system will use max_request_timeout instead. Similarly, if the
default timeout is greater than the max request timeout, the system will
use the max request timeout. If a maximum is set and neither the server
nor default_request_timeout specifies a timeout, max_request_timeout is
used as the timeout. 0 (default) indicates no maximum should be enforced.
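
The selection rule above can be sketched as a standalone C function. It
mirrors the logic this patch adds to fuse_fill_super_common(), but works in
seconds and leaves out the conversion to jiffies; the function name
effective_req_timeout() is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * All values are in seconds; 0 means "not set" for the inputs and
 * "no timeout" for the result.
 */
static uint32_t effective_req_timeout(uint32_t mount_timeout,
				      uint32_t default_timeout,
				      uint32_t max_timeout)
{
	/* The mount option wins over the system-wide default. */
	uint32_t t = mount_timeout ? mount_timeout : default_timeout;

	if (!max_timeout)
		return t;		/* no cap configured */
	if (!t)
		return max_timeout;	/* nothing requested: the cap applies */
	return t < max_timeout ? t : max_timeout;
}
```

For example, with default_request_timeout=10 and max_request_timeout=20, a
server that mounts without a timeout gets 10 seconds, and one that asks for
30 seconds gets 20.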

$ sysctl -a | grep fuse
fs.fuse.default_request_timeout = 0
fs.fuse.max_request_timeout = 0

$ echo 0x100000000 | sudo tee /proc/sys/fs/fuse/default_request_timeout
tee: /proc/sys/fs/fuse/default_request_timeout: Invalid argument

$ echo 0xFFFFFFFF | sudo tee /proc/sys/fs/fuse/default_request_timeout
0xFFFFFFFF

$ sysctl -a | grep fuse
fs.fuse.default_request_timeout = 4294967295
fs.fuse.max_request_timeout = 0
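
The transcript above shows 0xFFFFFFFF being accepted and 0x100000000 being
rejected, i.e. the value has to fit in an unsigned 32-bit integer. A
userspace sketch of that kind of range check follows. parse_u32() is a
hypothetical helper written for illustration only; it is not how
proc_douintvec is implemented, and it assumes the input string has already
had any trailing newline stripped.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 0 and stores the value in *out, or -EINVAL if s is not a u32. */
static int parse_u32(const char *s, uint32_t *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(s, &end, 0);	/* base 0: decimal or 0x-prefixed hex */
	if (errno || end == s || *end != '\0' || v > UINT32_MAX)
		return -EINVAL;

	*out = (uint32_t)v;
	return 0;
}
```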

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 Documentation/admin-guide/sysctl/fs.rst | 17 ++++++++++
 fs/fuse/Makefile                        |  2 +-
 fs/fuse/fuse_i.h                        | 16 ++++++++++
 fs/fuse/inode.c                         | 19 ++++++++++-
 fs/fuse/sysctl.c                        | 42 +++++++++++++++++++++++++
 5 files changed, 94 insertions(+), 2 deletions(-)
 create mode 100644 fs/fuse/sysctl.c

diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 47499a1742bd..44fd495f69b4 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -332,3 +332,20 @@ Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
 on a 64-bit one.
 The current default value for ``max_user_watches`` is 4% of the
 available low memory, divided by the "watch" cost in bytes.
+
+5. /proc/sys/fs/fuse - Configuration options for FUSE filesystems
+=====================================================================
+
+This directory contains the following configuration options for FUSE
+filesystems:
+
+``/proc/sys/fs/fuse/default_request_timeout`` is a read/write file for
+setting/getting the default timeout (in seconds) for a fuse server to
+reply to a kernel-issued request in the event where the server did not
+specify a timeout at mount. 0 indicates no timeout.
+
+``/proc/sys/fs/fuse/max_request_timeout`` is a read/write file for
+setting/getting the maximum timeout (in seconds) for a fuse server to
+reply to a kernel-issued request. If the server attempts to set a
+timeout greater than max_request_timeout, the system will use
+max_request_timeout as the timeout. 0 indicates no timeout.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 6e0228c6d0cb..cd4ef3e08ebf 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_FUSE_FS) += fuse.o
 obj-$(CONFIG_CUSE) += cuse.o
 obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
 
-fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
+fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o sysctl.o
 fuse-y += iomode.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
 fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 2b616c5977b4..1a3a3b8a83ae 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -47,6 +47,14 @@
 /** Number of dentries for each connection in the control filesystem */
 #define FUSE_CTL_NUM_DENTRIES 5
 
+/*
+ * Default timeout (in seconds) for the server to reply to a request
+ * if no timeout was specified on mount
+ */
+extern u32 fuse_default_req_timeout;
+/** Max timeout (in seconds) for the server to reply to a request */
+extern u32 fuse_max_req_timeout;
+
 /** List of active connections */
 extern struct list_head fuse_conn_list;
 
@@ -1486,4 +1494,12 @@ ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe,
 				      size_t len, unsigned int flags);
 ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma);
 
+#ifdef CONFIG_SYSCTL
+extern int fuse_sysctl_register(void);
+extern void fuse_sysctl_unregister(void);
+#else
+#define fuse_sysctl_register()		(0)
+#define fuse_sysctl_unregister()	do { } while (0)
+#endif /* CONFIG_SYSCTL */
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 9e69006fc026..cf333448f2d3 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -35,6 +35,10 @@ DEFINE_MUTEX(fuse_mutex);
 
 static int set_global_limit(const char *val, const struct kernel_param *kp);
 
+/* default is no timeout */
+u32 fuse_default_req_timeout = 0;
+u32 fuse_max_req_timeout = 0;
+
 unsigned max_user_bgreq;
 module_param_call(max_user_bgreq, set_global_limit, param_get_uint,
 		  &max_user_bgreq, 0644);
@@ -1678,6 +1682,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	struct fuse_conn *fc = fm->fc;
 	struct inode *root;
 	struct dentry *root_dentry;
+	u32 req_timeout;
 	int err;
 
 	err = -EINVAL;
@@ -1730,10 +1735,16 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->group_id = ctx->group_id;
 	fc->legacy_opts_show = ctx->legacy_opts_show;
 	fc->max_read = max_t(unsigned int, 4096, ctx->max_read);
-	fc->req_timeout = ctx->req_timeout * HZ;
 	fc->destroy = ctx->destroy;
 	fc->no_control = ctx->no_control;
 	fc->no_force_umount = ctx->no_force_umount;
+	req_timeout = ctx->req_timeout ?: fuse_default_req_timeout;
+	if (!fuse_max_req_timeout)
+		fc->req_timeout = req_timeout * HZ;
+	else if (!req_timeout)
+		fc->req_timeout = fuse_max_req_timeout * HZ;
+	else
+		fc->req_timeout = min(req_timeout, fuse_max_req_timeout) * HZ;
 
 	err = -ENOMEM;
 	root = fuse_get_root_inode(sb, ctx->rootmode);
@@ -2046,8 +2057,14 @@ static int __init fuse_fs_init(void)
 	if (err)
 		goto out3;
 
+	err = fuse_sysctl_register();
+	if (err)
+		goto out4;
+
 	return 0;
 
+ out4:
+	unregister_filesystem(&fuse_fs_type);
  out3:
 	unregister_fuseblk();
  out2:
diff --git a/fs/fuse/sysctl.c b/fs/fuse/sysctl.c
new file mode 100644
index 000000000000..c87bb0ecbfa9
--- /dev/null
+++ b/fs/fuse/sysctl.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+* linux/fs/fuse/sysctl.c
+*
+* Sysctl interface to fuse parameters
+*/
+#include <linux/sysctl.h>
+
+#include "fuse_i.h"
+
+static struct ctl_table_header *fuse_table_header;
+
+static struct ctl_table fuse_sysctl_table[] = {
+	{
+		.procname	= "default_request_timeout",
+		.data		= &fuse_default_req_timeout,
+		.maxlen		= sizeof(fuse_default_req_timeout),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec,
+	},
+	{
+		.procname	= "max_request_timeout",
+		.data		= &fuse_max_req_timeout,
+		.maxlen		= sizeof(fuse_max_req_timeout),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec,
+	},
+};
+
+int fuse_sysctl_register(void)
+{
+	fuse_table_header = register_sysctl("fs/fuse", fuse_sysctl_table);
+	if (!fuse_table_header)
+		return -ENOMEM;
+	return 0;
+}
+
+void fuse_sysctl_unregister(void)
+{
+	unregister_sysctl_table(fuse_table_header);
+	fuse_table_header = NULL;
+}
-- 
2.43.0


* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-07-30  0:23 [PATCH v2 0/2] fuse: add timeout option for requests Joanne Koong
  2024-07-30  0:23 ` [PATCH v2 1/2] fuse: add optional kernel-enforced timeout " Joanne Koong
  2024-07-30  0:23 ` [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
@ 2024-07-30  5:59 ` Yafang Shao
  2024-07-30 18:16   ` Joanne Koong
  2 siblings, 1 reply; 34+ messages in thread
From: Yafang Shao @ 2024-07-30  5:59 UTC (permalink / raw)
  To: Joanne Koong; +Cc: miklos, linux-fsdevel, josef, bernd.schubert, kernel-team

On Tue, Jul 30, 2024 at 8:28 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> There are situations where fuse servers can become unresponsive or take
> too long to reply to a request. Currently there is no upper bound on
> how long a request may take, which may be frustrating to users who get
> stuck waiting for a request to complete.
>
> This patchset adds a timeout option for requests and two dynamically
> configurable fuse sysctls "default_request_timeout" and "max_request_timeout"
> for controlling/enforcing timeout behavior system-wide.
>
> Existing fuse servers will not be affected unless they explicitly opt into the
> timeout.
>
> v1: https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
> Changes from v1:
> - Add timeout for background requests
> - Handle resend race condition
> - Add sysctls
>
> Joanne Koong (2):
>   fuse: add optional kernel-enforced timeout for requests
>   fuse: add default_request_timeout and max_request_timeout sysctls
>
>  Documentation/admin-guide/sysctl/fs.rst |  17 +++
>  fs/fuse/Makefile                        |   2 +-
>  fs/fuse/dev.c                           | 187 +++++++++++++++++++++++-
>  fs/fuse/fuse_i.h                        |  30 ++++
>  fs/fuse/inode.c                         |  24 +++
>  fs/fuse/sysctl.c                        |  42 ++++++
>  6 files changed, 293 insertions(+), 9 deletions(-)
>  create mode 100644 fs/fuse/sysctl.c
>
> --
> 2.43.0
>

Hello Joanne,

Thanks for your update.

I have tested your patches using my test case, which is similar to the
hello-fuse [0] example, with an additional change as follows:

@@ -125,6 +125,8 @@ static int hello_read(const char *path, char *buf, size_t size, off_t offset,
        } else
                size = 0;

+       // To trigger the timeout
+       sleep(60);
        return size;
 }

[0] https://github.com/libfuse/libfuse/blob/master/example/hello.c

However, it triggered a crash with the following setup:

1. Set FUSE timeout:
  sysctl -w fs.fuse.default_request_timeout=10
  sysctl -w fs.fuse.max_request_timeout=20

2. Start FUSE daemon:
  ./hello /tmp/fuse

3. Read from FUSE:
  cat /tmp/fuse/hello

4. Kill the process within 10 seconds (to avoid the timeout being triggered).
   Then the crash will be triggered.

The crash details are as follows:

[  270.729966] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 6.10.0+ #30
[  270.731658] RIP: 0010:__run_timers+0x27e/0x360
[  270.732129] Code: 07 48 c7 43 08 00 00 00 00 48 85 c0 74 78 4d 8b 2f 4c 89 6b 08 0f 1f 44 00 00 49 8b 45 00 49 8b 55 08 48 89 02 48 85 c0 74 04 <48> 89 50 08 4d 8b 65 18 49 c7 45 08 00 00 00 00 48 b8 22 01 00 00
[  270.733815] RSP: 0018:ffff9c1c80d00ed8 EFLAGS: 00010086
[  270.734347] RAX: dead000000000122 RBX: ffff8bfb7f7613c0 RCX: 0000000000000001
[  270.735037] RDX: ffff9c1c80d00ef8 RSI: 0000000000000000 RDI: ffff8bfb7f7613e8
[  270.735723] RBP: ffff9c1c80d00f70 R08: 00000000000000b3 R09: ffff8bfb7f761430
[  270.736439] R10: ffffffffb0e060c0 R11: 00000000000000b1 R12: 0000000000000001
[  270.737133] R13: ffff8bbd6591a0a0 R14: 00000000ffff8c00 R15: ffff9c1c80d00ef8
[  270.737834] FS:  0000000000000000(0000) GS:ffff8bfb7f740000(0000) knlGS:0000000000000000
[  270.738603] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  270.739178] CR2: 00007fac75f44778 CR3: 00000001200c2006 CR4: 0000000000370ef0
[  270.739880] Call Trace:
[  270.740211]  <IRQ>
[  270.740512]  ? show_regs+0x69/0x80
[  270.740934]  ? die_addr+0x38/0x90
[  270.741340]  ? exc_general_protection+0x236/0x490
[  270.741874]  ? asm_exc_general_protection+0x27/0x30
[  270.742416]  ? __run_timers+0x27e/0x360
[  270.742872]  ? __run_timers+0x1b4/0x360
[  270.743318]  ? kvm_sched_clock_read+0x11/0x20
[  270.743821]  ? sched_clock_noinstr+0x9/0x10
[  270.744298]  ? sched_clock+0x10/0x30
[  270.744716]  ? sched_clock_cpu+0x10/0x190
[  270.745190]  run_timer_softirq+0x3a/0x60
[  270.745647]  handle_softirqs+0x118/0x350
[  270.746106]  irq_exit_rcu+0x60/0x80
[  270.746527]  sysvec_apic_timer_interrupt+0x7f/0x90
[  270.747067]  </IRQ>
[  270.747374]  <TASK>
[  270.747669]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  270.748255] RIP: 0010:default_idle+0xb/0x20
[  270.748724] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90
[  270.750507] RSP: 0018:ffff9c1c801e7e18 EFLAGS: 00000246
[  270.751075] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000028c5f01d9e8e
[  270.751811] RDX: 0000000000000001 RSI: ffffffffb112e200 RDI: ffff8bfb7f77c8e0
[  270.752529] RBP: ffff9c1c801e7e20 R08: 0000003f08a215b6 R09: 0000000000000001
[  270.753298] R10: ffffffffb0e56080 R11: 0000000000000001 R12: 0000000000000001
[  270.754023] R13: ffffffffb112e200 R14: ffffffffb112e280 R15: 0000000000000001
[  270.754742]  ? ct_kernel_exit.constprop.0+0x79/0x90
[  270.755285]  ? arch_cpu_idle+0x9/0x10
[  270.755707]  default_enter_idle+0x22/0x2f
[  270.756175]  cpuidle_enter_state+0x88/0x430
[  270.756648]  cpuidle_enter+0x34/0x50
[  270.757075]  call_cpuidle+0x22/0x50
[  270.757492]  cpuidle_idle_call+0xd2/0x120
[  270.757960]  do_idle+0x77/0xd0
[  270.758347]  cpu_startup_entry+0x2c/0x30
[  270.758804]  start_secondary+0x117/0x140
[  270.759260]  common_startup_64+0x13e/0x141
[  270.759721]  </TASK>
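
Editorial aside on the register dump above: RAX holds dead000000000122,
which matches the value the kernel writes into the next pointer of a list
entry it has unlinked (LIST_POISON2 on x86_64, assuming the usual poison
delta of 0xdead000000000000). The faulting bytes "<48> 89 50 08" decode to
a store through RAX+8, so one plausible reading is that __run_timers walked
a timer entry that had already been detached. The sketch below only
reproduces the arithmetic. The demo_* names are made up for illustration,
and the constants mirror include/linux/poison.h as far as I know, so treat
them as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86_64 values; every demo_/DEMO_ name here is hypothetical. */
#define DEMO_POISON_POINTER_DELTA 0xdead000000000000ULL
#define DEMO_LIST_POISON1 (0x100ULL + DEMO_POISON_POINTER_DELTA)
#define DEMO_LIST_POISON2 (0x122ULL + DEMO_POISON_POINTER_DELTA)

/* True if addr equals one of the two list poison values. */
static bool demo_is_list_poison(uint64_t addr)
{
	return addr == DEMO_LIST_POISON1 || addr == DEMO_LIST_POISON2;
}

/*
 * True if addr is canonical under 48-bit virtual addressing, i.e. bits
 * 63..47 are all zeros or all ones. Touching a non-canonical address
 * raises a general protection fault rather than a page fault.
 */
static bool demo_is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 uppermost bits */

	return top == 0 || top == 0x1ffff;
}
```

With the RAX value from the dump, demo_is_list_poison() is true and
demo_is_canonical_48() is false for RAX+8, which is consistent with the
exc_general_protection frame in the trace.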

Please feel free to reach out if you are unable to reproduce the issue.

--
Regards
Yafang

* Re: [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls
  2024-07-30  0:23 ` [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
@ 2024-07-30  7:49   ` kernel test robot
  2024-07-30  9:14   ` kernel test robot
  2024-08-05  7:38   ` Jingbo Xu
  2 siblings, 0 replies; 34+ messages in thread
From: kernel test robot @ 2024-07-30  7:49 UTC (permalink / raw)
  To: Joanne Koong, miklos, linux-fsdevel
  Cc: oe-kbuild-all, josef, bernd.schubert, laoar.shao, kernel-team

Hi Joanne,

kernel test robot noticed the following build errors:

[auto build test ERROR on mszeredi-fuse/for-next]
[also build test ERROR on linus/master v6.11-rc1 next-20240730]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Joanne-Koong/fuse-add-optional-kernel-enforced-timeout-for-requests/20240730-085106
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next
patch link:    https://lore.kernel.org/r/20240730002348.3431931-3-joannelkoong%40gmail.com
patch subject: [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls
config: i386-buildonly-randconfig-001-20240730 (https://download.01.org/0day-ci/archive/20240730/202407301513.fphdIYEE-lkp@intel.com/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240730/202407301513.fphdIYEE-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407301513.fphdIYEE-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

>> fs/fuse/sysctl.c:30:30: error: macro "fuse_sysctl_register" passed 1 arguments, but takes just 0
      30 | int fuse_sysctl_register(void)
         |                              ^
   In file included from fs/fuse/sysctl.c:9:
   fs/fuse/fuse_i.h:1501: note: macro "fuse_sysctl_register" defined here
    1501 | #define fuse_sysctl_register()          (0)
         | 
>> fs/fuse/sysctl.c:31:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
      31 | {
         | ^
>> fs/fuse/sysctl.c:38:33: error: macro "fuse_sysctl_unregister" passed 1 arguments, but takes just 0
      38 | void fuse_sysctl_unregister(void)
         |                                 ^
   fs/fuse/fuse_i.h:1502: note: macro "fuse_sysctl_unregister" defined here
    1502 | #define fuse_sysctl_unregister()        do { } while (0)
         | 
   fs/fuse/sysctl.c:39:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
      39 | {
         | ^
>> fs/fuse/sysctl.c:13:25: warning: 'fuse_sysctl_table' defined but not used [-Wunused-variable]
      13 | static struct ctl_table fuse_sysctl_table[] = {
         |                         ^~~~~~~~~~~~~~~~~
>> fs/fuse/sysctl.c:11:33: warning: 'fuse_table_header' defined but not used [-Wunused-variable]
      11 | static struct ctl_table_header *fuse_table_header;
         |                                 ^~~~~~~~~~~~~~~~~


vim +/fuse_sysctl_register +30 fs/fuse/sysctl.c

    10	
  > 11	static struct ctl_table_header *fuse_table_header;
    12	
  > 13	static struct ctl_table fuse_sysctl_table[] = {
    14		{
    15			.procname	= "default_request_timeout",
    16			.data		= &fuse_default_req_timeout,
    17			.maxlen		= sizeof(fuse_default_req_timeout),
    18			.mode		= 0644,
    19			.proc_handler	= proc_douintvec,
    20		},
    21		{
    22			.procname	= "max_request_timeout",
    23			.data		= &fuse_max_req_timeout,
    24			.maxlen		= sizeof(fuse_max_req_timeout),
    25			.mode		= 0644,
    26			.proc_handler	= proc_douintvec,
    27		},
    28	};
    29	
  > 30	int fuse_sysctl_register(void)
  > 31	{
    32		fuse_table_header = register_sysctl("fs/fuse", fuse_sysctl_table);
    33		if (!fuse_table_header)
    34			return -ENOMEM;
    35		return 0;
    36	}
    37	
  > 38	void fuse_sysctl_unregister(void)
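
Editorial note on the errors above: fuse_i.h evidently took its stub
branch, where fuse_sysctl_register() and fuse_sysctl_unregister() are
function-like macros with no parameters, while fs/fuse/sysctl.c was still
compiled, so the definitions on lines 30 and 38 were parsed as macro
invocations. Two common ways out are to build sysctl.c only when the
option is enabled, or to turn the stubs into static inline functions and
guard the real definitions. A minimal sketch of the second approach
follows. DEMO_CONFIG_SYSCTL and the demo_* names are hypothetical
stand-ins, not the identifiers used in the patch.

```c
/*
 * DEMO_CONFIG_SYSCTL is a hypothetical stand-in for the kernel config
 * option. Leave it undefined to exercise the stub branch.
 */
/* #define DEMO_CONFIG_SYSCTL 1 */

#ifdef DEMO_CONFIG_SYSCTL
/* Header side: real prototypes when the feature is built in. */
int demo_sysctl_register(void);
void demo_sysctl_unregister(void);
#else
/*
 * Header side: static inline stubs instead of function-like macros.
 * Callers compile unchanged and the calls are still type-checked.
 */
static inline int demo_sysctl_register(void)
{
	return 0;
}

static inline void demo_sysctl_unregister(void)
{
}
#endif

#ifdef DEMO_CONFIG_SYSCTL
/* Source side: real definitions are compiled only when enabled. */
static int demo_registered;

int demo_sysctl_register(void)
{
	demo_registered = 1;
	return 0;
}

void demo_sysctl_unregister(void)
{
	demo_registered = 0;
}
#endif
```

Either way, callers keep writing demo_sysctl_register() and
demo_sysctl_unregister() without any #ifdef of their own.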

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls
  2024-07-30  0:23 ` [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
  2024-07-30  7:49   ` kernel test robot
@ 2024-07-30  9:14   ` kernel test robot
  2024-08-05  7:38   ` Jingbo Xu
  2 siblings, 0 replies; 34+ messages in thread
From: kernel test robot @ 2024-07-30  9:14 UTC (permalink / raw)
  To: Joanne Koong, miklos, linux-fsdevel
  Cc: llvm, oe-kbuild-all, josef, bernd.schubert, laoar.shao,
	kernel-team

Hi Joanne,

kernel test robot noticed the following build errors:

[auto build test ERROR on mszeredi-fuse/for-next]
[also build test ERROR on linus/master v6.11-rc1 next-20240730]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Joanne-Koong/fuse-add-optional-kernel-enforced-timeout-for-requests/20240730-085106
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next
patch link:    https://lore.kernel.org/r/20240730002348.3431931-3-joannelkoong%40gmail.com
patch subject: [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls
config: i386-buildonly-randconfig-006-20240730 (https://download.01.org/0day-ci/archive/20240730/202407301619.Mja5LwBe-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240730/202407301619.Mja5LwBe-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407301619.Mja5LwBe-lkp@intel.com/

All errors (new ones prefixed by >>):

>> fs/fuse/sysctl.c:30:26: error: too many arguments provided to function-like macro invocation
      30 | int fuse_sysctl_register(void)
         |                          ^
   fs/fuse/fuse_i.h:1501:9: note: macro 'fuse_sysctl_register' defined here
    1501 | #define fuse_sysctl_register()          (0)
         |         ^
>> fs/fuse/sysctl.c:30:25: error: expected ';' after top level declarator
      30 | int fuse_sysctl_register(void)
         |                         ^
         |                         ;
   fs/fuse/sysctl.c:38:29: error: too many arguments provided to function-like macro invocation
      38 | void fuse_sysctl_unregister(void)
         |                             ^
   fs/fuse/fuse_i.h:1502:9: note: macro 'fuse_sysctl_unregister' defined here
    1502 | #define fuse_sysctl_unregister()        do { } while (0)
         |         ^
>> fs/fuse/sysctl.c:38:6: error: variable has incomplete type 'void'
      38 | void fuse_sysctl_unregister(void)
         |      ^
   fs/fuse/sysctl.c:38:28: error: expected ';' after top level declarator
      38 | void fuse_sysctl_unregister(void)
         |                            ^
         |                            ;
   5 errors generated.


vim +30 fs/fuse/sysctl.c

    29	
  > 30	int fuse_sysctl_register(void)
    31	{
    32		fuse_table_header = register_sysctl("fs/fuse", fuse_sysctl_table);
    33		if (!fuse_table_header)
    34			return -ENOMEM;
    35		return 0;
    36	}
    37	
  > 38	void fuse_sysctl_unregister(void)

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-07-30  5:59 ` [PATCH v2 0/2] fuse: add timeout option for requests Yafang Shao
@ 2024-07-30 18:16   ` Joanne Koong
  2024-07-31  2:13     ` Yafang Shao
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2024-07-30 18:16 UTC (permalink / raw)
  To: Yafang Shao; +Cc: miklos, linux-fsdevel, josef, bernd.schubert, kernel-team

On Mon, Jul 29, 2024 at 11:00 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Jul 30, 2024 at 8:28 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > There are situations where fuse servers can become unresponsive or take
> > too long to reply to a request. Currently there is no upper bound on
> > how long a request may take, which may be frustrating to users who get
> > stuck waiting for a request to complete.
> >
> > This patchset adds a timeout option for requests and two dynamically
> > configurable fuse sysctls "default_request_timeout" and "max_request_timeout"
> > for controlling/enforcing timeout behavior system-wide.
> >
> > Existing fuse servers will not be affected unless they explicitly opt into the
> > timeout.
> >
> > v1: https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
> > Changes from v1:
> > - Add timeout for background requests
> > - Handle resend race condition
> > - Add sysctls
> >
> > Joanne Koong (2):
> >   fuse: add optional kernel-enforced timeout for requests
> >   fuse: add default_request_timeout and max_request_timeout sysctls
> >
> >  Documentation/admin-guide/sysctl/fs.rst |  17 +++
> >  fs/fuse/Makefile                        |   2 +-
> >  fs/fuse/dev.c                           | 187 +++++++++++++++++++++++-
> >  fs/fuse/fuse_i.h                        |  30 ++++
> >  fs/fuse/inode.c                         |  24 +++
> >  fs/fuse/sysctl.c                        |  42 ++++++
> >  6 files changed, 293 insertions(+), 9 deletions(-)
> >  create mode 100644 fs/fuse/sysctl.c
> >
> > --
> > 2.43.0
> >
>
> Hello Joanne,
>
> Thanks for your update.
>
> I have tested your patches using my test case, which is similar to the
> hello-fuse [0] example, with an additional change as follows:
>
> @@ -125,6 +125,8 @@ static int hello_read(const char *path, char *buf,
> size_t size, off_t offset,
>         } else
>                 size = 0;
>
> +       // TO trigger timeout
> +       sleep(60);
>         return size;
>  }
>
> [0] https://github.com/libfuse/libfuse/blob/master/example/hello.c
>
> However, it triggered a crash with the following setup:
>
> 1. Set FUSE timeout:
>   sysctl -w fs.fuse.default_request_timeout=10
>   sysctl -w fs.fuse.max_request_timeout=20
>
> 2. Start FUSE daemon:
>   ./hello /tmp/fuse
>
> 3. Read from FUSE:
>   cat /tmp/fuse/hello
>
> 4. Kill the process within 10 seconds (to avoid the timeout being triggered).
>    Then the crash will be triggered.

Hi Yafang,

Thanks for trying this out on your use case!

How consistently are you able to repro this? I tried reproing using
your instructions above but I'm not able to get the crash.

From the crash logs you provided below, it looks like what's happening
is that if the process gets killed, the timer isn't getting deleted.
I'll look more into what happens in fuse when a process is killed and
get back to you on this.

Thanks,
Joanne

>
> The crash details are as follows:
>
> [  270.729966] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 6.10.0+ #30
> [  270.731658] RIP: 0010:__run_timers+0x27e/0x360
> [  270.732129] Code: 07 48 c7 43 08 00 00 00 00 48 85 c0 74 78 4d 8b 2f 4c 89 6b 08 0f 1f 44 00 00 49 8b 45 00 49 8b 55 08 48 89 02 48 85 c0 74 04 <48> 89 50 08 4d 8b 65 18 49 c7 45 08 00 00 00 00 48 b8 22 01 00 00
> [  270.733815] RSP: 0018:ffff9c1c80d00ed8 EFLAGS: 00010086
> [  270.734347] RAX: dead000000000122 RBX: ffff8bfb7f7613c0 RCX: 0000000000000001
> [  270.735037] RDX: ffff9c1c80d00ef8 RSI: 0000000000000000 RDI: ffff8bfb7f7613e8
> [  270.735723] RBP: ffff9c1c80d00f70 R08: 00000000000000b3 R09: ffff8bfb7f761430
> [  270.736439] R10: ffffffffb0e060c0 R11: 00000000000000b1 R12: 0000000000000001
> [  270.737133] R13: ffff8bbd6591a0a0 R14: 00000000ffff8c00 R15: ffff9c1c80d00ef8
> [  270.737834] FS:  0000000000000000(0000) GS:ffff8bfb7f740000(0000) knlGS:0000000000000000
> [  270.738603] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  270.739178] CR2: 00007fac75f44778 CR3: 00000001200c2006 CR4: 0000000000370ef0
> [  270.739880] Call Trace:
> [  270.740211]  <IRQ>
> [  270.740512]  ? show_regs+0x69/0x80
> [  270.740934]  ? die_addr+0x38/0x90
> [  270.741340]  ? exc_general_protection+0x236/0x490
> [  270.741874]  ? asm_exc_general_protection+0x27/0x30
> [  270.742416]  ? __run_timers+0x27e/0x360
> [  270.742872]  ? __run_timers+0x1b4/0x360
> [  270.743318]  ? kvm_sched_clock_read+0x11/0x20
> [  270.743821]  ? sched_clock_noinstr+0x9/0x10
> [  270.744298]  ? sched_clock+0x10/0x30
> [  270.744716]  ? sched_clock_cpu+0x10/0x190
> [  270.745190]  run_timer_softirq+0x3a/0x60
> [  270.745647]  handle_softirqs+0x118/0x350
> [  270.746106]  irq_exit_rcu+0x60/0x80
> [  270.746527]  sysvec_apic_timer_interrupt+0x7f/0x90
> [  270.747067]  </IRQ>
> [  270.747374]  <TASK>
> [  270.747669]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
> [  270.748255] RIP: 0010:default_idle+0xb/0x20
> [  270.748724] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90
> [  270.750507] RSP: 0018:ffff9c1c801e7e18 EFLAGS: 00000246
> [  270.751075] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000028c5f01d9e8e
> [  270.751811] RDX: 0000000000000001 RSI: ffffffffb112e200 RDI: ffff8bfb7f77c8e0
> [  270.752529] RBP: ffff9c1c801e7e20 R08: 0000003f08a215b6 R09: 0000000000000001
> [  270.753298] R10: ffffffffb0e56080 R11: 0000000000000001 R12: 0000000000000001
> [  270.754023] R13: ffffffffb112e200 R14: ffffffffb112e280 R15: 0000000000000001
> [  270.754742]  ? ct_kernel_exit.constprop.0+0x79/0x90
> [  270.755285]  ? arch_cpu_idle+0x9/0x10
> [  270.755707]  default_enter_idle+0x22/0x2f
> [  270.756175]  cpuidle_enter_state+0x88/0x430
> [  270.756648]  cpuidle_enter+0x34/0x50
> [  270.757075]  call_cpuidle+0x22/0x50
> [  270.757492]  cpuidle_idle_call+0xd2/0x120
> [  270.757960]  do_idle+0x77/0xd0
> [  270.758347]  cpu_startup_entry+0x2c/0x30
> [  270.758804]  start_secondary+0x117/0x140
> [  270.759260]  common_startup_64+0x13e/0x141
> [  270.759721]  </TASK>
>
> Please feel free to reach out if you are unable to reproduce the issue.
>
> --
> Regards
> Yafang

* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-07-30 18:16   ` Joanne Koong
@ 2024-07-31  2:13     ` Yafang Shao
  2024-07-31 17:52       ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Yafang Shao @ 2024-07-31  2:13 UTC (permalink / raw)
  To: Joanne Koong; +Cc: miklos, linux-fsdevel, josef, bernd.schubert, kernel-team

On Wed, Jul 31, 2024 at 2:16 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Mon, Jul 29, 2024 at 11:00 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Tue, Jul 30, 2024 at 8:28 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > There are situations where fuse servers can become unresponsive or take
> > > too long to reply to a request. Currently there is no upper bound on
> > > how long a request may take, which may be frustrating to users who get
> > > stuck waiting for a request to complete.
> > >
> > > This patchset adds a timeout option for requests and two dynamically
> > > configurable fuse sysctls "default_request_timeout" and "max_request_timeout"
> > > for controlling/enforcing timeout behavior system-wide.
> > >
> > > Existing fuse servers will not be affected unless they explicitly opt into the
> > > timeout.
> > >
> > > v1: https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
> > > Changes from v1:
> > > - Add timeout for background requests
> > > - Handle resend race condition
> > > - Add sysctls
> > >
> > > Joanne Koong (2):
> > >   fuse: add optional kernel-enforced timeout for requests
> > >   fuse: add default_request_timeout and max_request_timeout sysctls
> > >
> > >  Documentation/admin-guide/sysctl/fs.rst |  17 +++
> > >  fs/fuse/Makefile                        |   2 +-
> > >  fs/fuse/dev.c                           | 187 +++++++++++++++++++++++-
> > >  fs/fuse/fuse_i.h                        |  30 ++++
> > >  fs/fuse/inode.c                         |  24 +++
> > >  fs/fuse/sysctl.c                        |  42 ++++++
> > >  6 files changed, 293 insertions(+), 9 deletions(-)
> > >  create mode 100644 fs/fuse/sysctl.c
> > >
> > > --
> > > 2.43.0
> > >
> >
> > Hello Joanne,
> >
> > Thanks for your update.
> >
> > I have tested your patches using my test case, which is similar to the
> > hello-fuse [0] example, with an additional change as follows:
> >
> > @@ -125,6 +125,8 @@ static int hello_read(const char *path, char *buf,
> > size_t size, off_t offset,
> >         } else
> >                 size = 0;
> >
> > +       // TO trigger timeout
> > +       sleep(60);
> >         return size;
> >  }
> >
> > [0] https://github.com/libfuse/libfuse/blob/master/example/hello.c
> >
> > However, it triggered a crash with the following setup:
> >
> > 1. Set FUSE timeout:
> >   sysctl -w fs.fuse.default_request_timeout=10
> >   sysctl -w fs.fuse.max_request_timeout=20
> >
> > 2. Start FUSE daemon:
> >   ./hello /tmp/fuse
> >
> > 3. Read from FUSE:
> >   cat /tmp/fuse/hello
> >
> > 4. Kill the process within 10 seconds (to avoid the timeout being triggered).
> >    Then the crash will be triggered.
>
> Hi Yafang,
>
> Thanks for trying this out on your use case!
>
> How consistently are you able to repro this?

It triggers the crash every time.

> I tried reproing using
> your instructions above but I'm not able to get the crash.

Please note that it is the `cat /tmp/fuse/hello` process that was
killed, not the fuse daemon.
The crash seems to occur when the fuse daemon wakes up after
sleep(60). Please ensure that the fuse daemon can be woken up.

>
> From the crash logs you provided below, it looks like what's happening
> is that if the process gets killed, the timer isn't getting deleted.
> I'll look more into what happens in fuse when a process is killed and
> get back to you on this.

Thanks

--
Regards
Yafang

* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-07-31  2:13     ` Yafang Shao
@ 2024-07-31 17:52       ` Joanne Koong
  2024-07-31 18:46         ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2024-07-31 17:52 UTC (permalink / raw)
  To: Yafang Shao; +Cc: miklos, linux-fsdevel, josef, bernd.schubert, kernel-team

On Tue, Jul 30, 2024 at 7:14 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Wed, Jul 31, 2024 at 2:16 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Mon, Jul 29, 2024 at 11:00 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Tue, Jul 30, 2024 at 8:28 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > There are situations where fuse servers can become unresponsive or take
> > > > too long to reply to a request. Currently there is no upper bound on
> > > > how long a request may take, which may be frustrating to users who get
> > > > stuck waiting for a request to complete.
> > > >
> > > > This patchset adds a timeout option for requests and two dynamically
> > > > configurable fuse sysctls "default_request_timeout" and "max_request_timeout"
> > > > for controlling/enforcing timeout behavior system-wide.
> > > >
> > > > Existing fuse servers will not be affected unless they explicitly opt into the
> > > > timeout.
> > > >
> > > > v1: https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
> > > > Changes from v1:
> > > > - Add timeout for background requests
> > > > - Handle resend race condition
> > > > - Add sysctls
> > > >
> > > > Joanne Koong (2):
> > > >   fuse: add optional kernel-enforced timeout for requests
> > > >   fuse: add default_request_timeout and max_request_timeout sysctls
> > > >
> > > >  Documentation/admin-guide/sysctl/fs.rst |  17 +++
> > > >  fs/fuse/Makefile                        |   2 +-
> > > >  fs/fuse/dev.c                           | 187 +++++++++++++++++++++++-
> > > >  fs/fuse/fuse_i.h                        |  30 ++++
> > > >  fs/fuse/inode.c                         |  24 +++
> > > >  fs/fuse/sysctl.c                        |  42 ++++++
> > > >  6 files changed, 293 insertions(+), 9 deletions(-)
> > > >  create mode 100644 fs/fuse/sysctl.c
> > > >
> > > > --
> > > > 2.43.0
> > > >
> > >
> > > Hello Joanne,
> > >
> > > Thanks for your update.
> > >
> > > I have tested your patches using my test case, which is similar to the
> > > hello-fuse [0] example, with an additional change as follows:
> > >
> > > @@ -125,6 +125,8 @@ static int hello_read(const char *path, char *buf,
> > > size_t size, off_t offset,
> > >         } else
> > >                 size = 0;
> > >
> > > +       // TO trigger timeout
> > > +       sleep(60);
> > >         return size;
> > >  }
> > >
> > > [0] https://github.com/libfuse/libfuse/blob/master/example/hello.c
> > >
> > > However, it triggered a crash with the following setup:
> > >
> > > 1. Set FUSE timeout:
> > >   sysctl -w fs.fuse.default_request_timeout=10
> > >   sysctl -w fs.fuse.max_request_timeout=20
> > >
> > > 2. Start FUSE daemon:
> > >   ./hello /tmp/fuse
> > >
> > > 3. Read from FUSE:
> > >   cat /tmp/fuse/hello
> > >
> > > 4. Kill the process within 10 seconds (to avoid the timeout being triggered).
> > >    Then the crash will be triggered.
> >
> > Hi Yafang,
> >
> > Thanks for trying this out on your use case!
> >
> > How consistently are you able to repro this?
>
> It triggers the crash every time.
>
> > I tried reproing using
> > your instructions above but I'm not able to get the crash.
>
> Please note that it is the `cat /tmp/fuse/hello` process that was
> killed, not the fuse daemon.
> The crash seems to occur when the fuse daemon wakes up after
> sleep(60). Please ensure that the fuse daemon can be woken up.
>

I'm still not able to trigger the crash by killing the `cat
/tmp/fuse/hello` process. This is how I'm reproducing it:

1) Add sleep to test code in
https://github.com/libfuse/libfuse/blob/master/example/hello.c
@@ -125,6 +126,9 @@ static int hello_read(const char *path, char *buf,
size_t size, off_t offset,
        } else
                size = 0;

+       sleep(60);
+       printf("hello_read woke up from sleep\n");
+
        return size;
 }

2)  Set fuse timeout to 10 seconds
sysctl -w fs.fuse.default_request_timeout=10

3) Start fuse daemon
./example/hello /tmp/fuse

4) Read from fuse
cat /tmp/fuse/hello

5) Get pid of cat process
top -b | grep cat

6) Kill cat process (within 10 seconds)
 sudo kill -9 <cat-pid>

7) Wait 60 seconds for fuse's read request to complete

From what it sounds like, this is exactly what you are doing as well?

I added some kernel-side logs and I'm seeing that the read request
times out after ~10 seconds and is handled successfully by the timeout
handler.

On the fuse daemon side, these are the logs I'm seeing from the above repro:
./example/hello /tmp/fuse -f -d

FUSE library version: 3.17.0
nullpath_ok: 0
unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
INIT: 7.40
flags=0x73fffffb
max_readahead=0x00020000
   INIT: 7.40
   flags=0x4040f039
   max_readahead=0x00020000
   max_write=0x00100000
   max_background=0
   congestion_threshold=0
   time_gran=1
   unique: 2, success, outsize: 80
unique: 4, opcode: LOOKUP (1), nodeid: 1, insize: 46, pid: 673
LOOKUP /hello
getattr[NULL] /hello
   NODEID: 2
   unique: 4, success, outsize: 144
unique: 6, opcode: OPEN (14), nodeid: 2, insize: 48, pid: 673
open flags: 0x8000 /hello
   open[0] flags: 0x8000 /hello
   unique: 6, success, outsize: 32
unique: 8, opcode: READ (15), nodeid: 2, insize: 80, pid: 673
read[0] 4096 bytes from 0 flags: 0x8000
unique: 10, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 673
   unique: 10, error: -38 (Function not implemented), outsize: 16
unique: 11, opcode: INTERRUPT (36), nodeid: 0, insize: 48, pid: 0
FUSE_INTERRUPT: reply to kernel to disable interrupt
   unique: 11, error: -38 (Function not implemented), outsize: 16

unique: 12, opcode: RELEASE (18), nodeid: 2, insize: 64, pid: 0
   unique: 12, success, outsize: 16

hello_read woke up from sleep
   read[0] 13 bytes from 0
   unique: 8, success, outsize: 29


Are these the debug logs you are seeing from the daemon side as well?

Thanks,
Joanne
> >
> > From the crash logs you provided below, it looks like what's happening
> > is that if the process gets killed, the timer isn't getting deleted.
> > I'll look more into what happens in fuse when a process is killed and
> > get back to you on this.
>
> Thanks
>
> --
> Regards
> Yafang

* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-07-31 17:52       ` Joanne Koong
@ 2024-07-31 18:46         ` Joanne Koong
  2024-08-01  2:47           ` Yafang Shao
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2024-07-31 18:46 UTC (permalink / raw)
  To: Yafang Shao; +Cc: miklos, linux-fsdevel, josef, bernd.schubert, kernel-team

On Wed, Jul 31, 2024 at 10:52 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Tue, Jul 30, 2024 at 7:14 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Wed, Jul 31, 2024 at 2:16 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > On Mon, Jul 29, 2024 at 11:00 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > On Tue, Jul 30, 2024 at 8:28 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > >
> > > > > There are situations where fuse servers can become unresponsive or take
> > > > > too long to reply to a request. Currently there is no upper bound on
> > > > > how long a request may take, which may be frustrating to users who get
> > > > > stuck waiting for a request to complete.
> > > > >
> > > > > This patchset adds a timeout option for requests and two dynamically
> > > > > configurable fuse sysctls "default_request_timeout" and "max_request_timeout"
> > > > > for controlling/enforcing timeout behavior system-wide.
> > > > >
> > > > > Existing fuse servers will not be affected unless they explicitly opt into the
> > > > > timeout.
> > > > >
> > > > > v1: https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
> > > > > Changes from v1:
> > > > > - Add timeout for background requests
> > > > > - Handle resend race condition
> > > > > - Add sysctls
> > > > >
> > > > > Joanne Koong (2):
> > > > >   fuse: add optional kernel-enforced timeout for requests
> > > > >   fuse: add default_request_timeout and max_request_timeout sysctls
> > > > >
> > > > >  Documentation/admin-guide/sysctl/fs.rst |  17 +++
> > > > >  fs/fuse/Makefile                        |   2 +-
> > > > >  fs/fuse/dev.c                           | 187 +++++++++++++++++++++++-
> > > > >  fs/fuse/fuse_i.h                        |  30 ++++
> > > > >  fs/fuse/inode.c                         |  24 +++
> > > > >  fs/fuse/sysctl.c                        |  42 ++++++
> > > > >  6 files changed, 293 insertions(+), 9 deletions(-)
> > > > >  create mode 100644 fs/fuse/sysctl.c
> > > > >
> > > > > --
> > > > > 2.43.0
> > > > >
> > > >
> > > > Hello Joanne,
> > > >
> > > > Thanks for your update.
> > > >
> > > > I have tested your patches using my test case, which is similar to the
> > > > hello-fuse [0] example, with an additional change as follows:
> > > >
> > > > @@ -125,6 +125,8 @@ static int hello_read(const char *path, char *buf,
> > > > size_t size, off_t offset,
> > > >         } else
> > > >                 size = 0;
> > > >
> > > > +       // TO trigger timeout
> > > > +       sleep(60);
> > > >         return size;
> > > >  }
> > > >
> > > > [0] https://github.com/libfuse/libfuse/blob/master/example/hello.c
> > > >
> > > > However, it triggered a crash with the following setup:
> > > >
> > > > 1. Set FUSE timeout:
> > > >   sysctl -w fs.fuse.default_request_timeout=10
> > > >   sysctl -w fs.fuse.max_request_timeout=20
> > > >
> > > > 2. Start FUSE daemon:
> > > >   ./hello /tmp/fuse
> > > >
> > > > 3. Read from FUSE:
> > > >   cat /tmp/fuse/hello
> > > >
> > > > 4. Kill the process within 10 seconds (to avoid the timeout being triggered).
> > > >    Then the crash will be triggered.
> > >
> > > Hi Yafang,
> > >
> > > Thanks for trying this out on your use case!
> > >
> > > How consistently are you able to repro this?
> >
> > It triggers the crash every time.
> >
> > > I tried reproing using
> > > your instructions above but I'm not able to get the crash.
> >
> > Please note that it is the `cat /tmp/fuse/hello` process that was
> > killed, not the fuse daemon.
> > The crash seems to occur when the fuse daemon wakes up after
> > sleep(60). Please ensure that the fuse daemon can be woken up.
> >
>
> I'm still not able to trigger the crash by killing the `cat
> /tmp/fuse/hello` process. This is how I'm reproducing it:
>
> 1) Add sleep to test code in
> https://github.com/libfuse/libfuse/blob/master/example/hello.c
> @@ -125,6 +126,9 @@ static int hello_read(const char *path, char *buf,
> size_t size, off_t offset,
>         } else
>                 size = 0;
>
> +       sleep(60);
> +       printf("hello_read woke up from sleep\n");
> +
>         return size;
>  }
>
> 2)  Set fuse timeout to 10 seconds
> sysctl -w fs.fuse.default_request_timeout=10
>
> 3) Start fuse daemon
> ./example/hello /tmp/fuse
>
> 4) Read from fuse
> cat /tmp/fuse/hello
>
> 5) Get pid of cat process
> top -b | grep cat
>
> 6) Kill cat process (within 10 seconds)
>  sudo kill -9 <cat-pid>
>
> 7) Wait 60 seconds for fuse's read request to complete
>
> From what it sounds like, this is exactly what you are doing as well?
>
> I added some kernel-side logs and I'm seeing that the read request
> times out after ~10 seconds and is handled successfully by the timeout
> handler.
>
> On the fuse daemon side, these are the logs I'm seeing from the above repro:
> ./example/hello /tmp/fuse -f -d
>
> FUSE library version: 3.17.0
> nullpath_ok: 0
> unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
> INIT: 7.40
> flags=0x73fffffb
> max_readahead=0x00020000
>    INIT: 7.40
>    flags=0x4040f039
>    max_readahead=0x00020000
>    max_write=0x00100000
>    max_background=0
>    congestion_threshold=0
>    time_gran=1
>    unique: 2, success, outsize: 80
> unique: 4, opcode: LOOKUP (1), nodeid: 1, insize: 46, pid: 673
> LOOKUP /hello
> getattr[NULL] /hello
>    NODEID: 2
>    unique: 4, success, outsize: 144
> unique: 6, opcode: OPEN (14), nodeid: 2, insize: 48, pid: 673
> open flags: 0x8000 /hello
>    open[0] flags: 0x8000 /hello
>    unique: 6, success, outsize: 32
> unique: 8, opcode: READ (15), nodeid: 2, insize: 80, pid: 673
> read[0] 4096 bytes from 0 flags: 0x8000
> unique: 10, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 673
>    unique: 10, error: -38 (Function not implemented), outsize: 16
> unique: 11, opcode: INTERRUPT (36), nodeid: 0, insize: 48, pid: 0
> FUSE_INTERRUPT: reply to kernel to disable interrupt
>    unique: 11, error: -38 (Function not implemented), outsize: 16
>
> unique: 12, opcode: RELEASE (18), nodeid: 2, insize: 64, pid: 0
>    unique: 12, success, outsize: 16
>
> hello_read woke up from sleep
>    read[0] 13 bytes from 0
>    unique: 8, success, outsize: 29
>
>
> Are these the debug logs you are seeing from the daemon side as well?
>
> Thanks,
> Joanne
> > >
> > > From the crash logs you provided below, it looks like what's happening
> > > is that if the process gets killed, the timer isn't getting deleted.

When I looked at this log previously, I thought you were repro-ing by
killing the fuse daemon process, not the cat process. Killing the cat
process is not expected to delete the request's timer (the timers do
get deleted if the daemon itself is killed).

> > > I'll look more into what happens in fuse when a process is killed and
> > > get back to you on this.

This is the flow of what is happening on the kernel side (verified by
local printks):

`cat /tmp/fuse/hello`:
Issues a FUSE_READ background request (via fuse_send_readpages(),
fm->fc->async_read). This request has a 10-second timeout on it.

The cat process is killed:
This does not clean up the request. The request is still on the fpq
processing list.

Timeout on request expires:
The timeout handler runs and properly cleans up / frees the request.

Fuse daemon wakes from sleep and replies to the request:
In dev_do_write(), the kernel can no longer find this request (since
it timed out and was removed from the fpq processing list), so the
write returns -ENOENT.
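
The four steps above can be modelled in a few lines of userspace C. This
is only a sketch of the request lifecycle, not the fuse code: sim_req,
sim_send(), sim_timeout() and sim_reply() are made-up names, and the
fixed-size array is an assumed stand-in for the fpq processing list.

```c
/* Userspace sketch only; none of these names exist in fs/fuse. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_MAX_REQS 8

struct sim_req {
	uint64_t unique;	/* request id, like "unique: 8" in the daemon log */
	bool on_list;		/* still on the simulated processing list? */
	int error;		/* result the waiter would see */
};

/* Hypothetical stand-in for the fpq processing list. */
struct sim_req processing[SIM_MAX_REQS];

/* Put a request on the list; returns NULL if the table is full. */
struct sim_req *sim_send(uint64_t unique)
{
	for (size_t i = 0; i < SIM_MAX_REQS; i++) {
		if (!processing[i].on_list) {
			processing[i] = (struct sim_req){
				.unique = unique, .on_list = true, .error = 0,
			};
			return &processing[i];
		}
	}
	return NULL;
}

/* Look up a request that is still on the list. */
struct sim_req *sim_find(uint64_t unique)
{
	for (size_t i = 0; i < SIM_MAX_REQS; i++) {
		if (processing[i].on_list && processing[i].unique == unique)
			return &processing[i];
	}
	return NULL;
}

/* Timer expiry: take the request off the list and end it with -ETIME. */
void sim_timeout(struct sim_req *req)
{
	assert(req != NULL);
	if (!req->on_list)
		return;		/* a reply already finished this request */
	req->on_list = false;
	req->error = -ETIME;
}

/* Daemon reply: 0 on success, -ENOENT if the request is no longer listed. */
int sim_reply(uint64_t unique)
{
	struct sim_req *req = sim_find(unique);

	if (!req)
		return -ENOENT;
	req->on_list = false;
	req->error = 0;
	return 0;
}
```

For the repro in this thread, sim_send(8) followed by sim_timeout() and
then sim_reply(8) leaves the request with -ETIME and gives the late reply
-ENOENT, matching the flow described above. Killing the reader has no
counterpart in the sketch because, per that flow, it does not touch the
request.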

> >
> > Thanks
> >
> > --
> > Regards
> > Yafang


* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-07-31 18:46         ` Joanne Koong
@ 2024-08-01  2:47           ` Yafang Shao
  2024-08-02 19:05             ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Yafang Shao @ 2024-08-01  2:47 UTC (permalink / raw)
  To: Joanne Koong; +Cc: miklos, linux-fsdevel, josef, bernd.schubert, kernel-team

On Thu, Aug 1, 2024 at 2:46 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> This is the flow of what is happening on the kernel side (verified by
> local printks) -
>
> `cat /tmp/fuse/hello`:
> Issues a FUSE_READ background request (via fuse_send_readpages(),
> fm->fc->async_read). This request will have a timeout of 10 seconds on
> it
>
> The cat process is killed:
> This does not clean up the request. The request is still on the fpq
> processing list.
>
> Timeout on request expires:
> The timeout handler runs and properly cleans up / frees the request.
>
> Fuse daemon wakes from sleep and replies to the request:
> In dev_do_write(), the kernel won't be able to find this request
> (since it timed out and was removed from the fpq processing list) and
> return with -ENOENT

Thank you for your explanation.
I will check whether there are any issues with my test environment.

--
Regards
Yafang


* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-08-01  2:47           ` Yafang Shao
@ 2024-08-02 19:05             ` Joanne Koong
  2024-08-04  7:46               ` Yafang Shao
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2024-08-02 19:05 UTC (permalink / raw)
  To: Yafang Shao; +Cc: miklos, linux-fsdevel, josef, bernd.schubert, kernel-team

On Wed, Jul 31, 2024 at 7:47 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> Thank you for your explanation.
> I will verify if there are any issues with my test environment.
>
Hi Yafang,

Would you mind adding these printks to your kernel when you run the
repro and pasting what they show?

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -287,6 +287,9 @@ static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
        struct fuse_conn *fc = fm->fc;
        struct fuse_iqueue *fiq = &fc->iq;

+       printk("do_fuse_request_end: req=%p, from_timer=%d, req->timer.func=%d\n",
+              req, from_timer_callback, req->timer.function != NULL);
+
        if (from_timer_callback)
                req->out.h.error = -ETIME;

@@ -415,6 +418,8 @@ static void fuse_request_timeout(struct timer_list *timer)
 {
        struct fuse_req *req = container_of(timer, struct fuse_req, timer);

+       printk("fuse_request_timeout: req=%p\n", req);
+
        /*
         * Request reply is being finished by the kernel right now.
         * No need to time out the request.
@@ -612,6 +617,7 @@ ssize_t fuse_simple_request(struct fuse_mount *fm, struct fuse_args *args)

        if (!args->noreply)
                __set_bit(FR_ISREPLY, &req->flags);
+       printk("fuse_simple_request: req=%p, op=%u\n", req, args->opcode);
        __fuse_request_send(req);
        ret = req->out.h.error;
        if (!ret && args->out_argvar) {
@@ -673,6 +679,7 @@ int fuse_simple_background(struct fuse_mount *fm, struct fuse_args *args,

        fuse_args_to_req(req, args);

+       printk("fuse_background_request: req=%p, op=%u\n", req, args->opcode);
        if (!fuse_request_queue_background(req)) {
                fuse_put_request(req);


When I run it on my side, I see

[   68.117740] fuse_background_request: req=00000000874e2f14, op=26
[   68.131440] do_fuse_request_end: req=00000000874e2f14, from_timer=0, req->timer.func=1
[   71.558538] fuse_simple_request: req=00000000cf643ace, op=1
[   71.559651] do_fuse_request_end: req=00000000cf643ace, from_timer=0, req->timer.func=1
[   71.561044] fuse_simple_request: req=00000000f2c001f0, op=14
[   71.562524] do_fuse_request_end: req=00000000f2c001f0, from_timer=0, req->timer.func=1
[   71.563820] fuse_background_request: req=00000000584f2cc3, op=15
[   78.580035] fuse_simple_request: req=00000000ecbee970, op=25
[   78.582614] do_fuse_request_end: req=00000000ecbee970, from_timer=0, req->timer.func=1
[   81.624722] fuse_request_timeout: req=00000000584f2cc3
[   81.625443] do_fuse_request_end: req=00000000584f2cc3, from_timer=1, req->timer.func=1
[   81.626377] fuse_background_request: req=00000000b2d792ed, op=18
[   81.627623] do_fuse_request_end: req=00000000b2d792ed, from_timer=0, req->timer.func=1

I'm seeing only one timer fire, on the read request (opcode=15), and
do_fuse_request_end is not called on that request before the timer
fires.
I'm curious to compare this against the logs on your end.
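
If it helps with the comparison, a small userspace checker along these
lines could flag the suspicious orderings automatically. It is only a
sketch: feed_line() and its helpers are made-up names, and it assumes one
log entry per line in the printk format from the diff above (unwrapped,
as dmesg would print it). It reports a line when the timer fires on a
request that has already ended, or when a request is ended twice.

```c
/* Userspace sketch; the function names here are hypothetical. */
#include <stdio.h>
#include <string.h>

#define MAX_TRACKED 64

struct tracked {
	unsigned long long req;	/* hashed pointer value printed by %p */
	int ended;		/* has do_fuse_request_end been seen? */
};

static struct tracked table[MAX_TRACKED];
static size_t ntracked;

/* Find the entry for req, optionally creating it. */
static struct tracked *lookup(unsigned long long req, int create)
{
	for (size_t i = 0; i < ntracked; i++) {
		if (table[i].req == req)
			return &table[i];
	}
	if (!create || ntracked == MAX_TRACKED)
		return NULL;
	table[ntracked].req = req;
	table[ntracked].ended = 0;
	return &table[ntracked++];
}

/* Extract the req= value that follows tag; returns 1 on success. */
static int parse_req(const char *line, const char *tag,
		     unsigned long long *req)
{
	const char *p = strstr(line, tag);

	if (!p)
		return 0;
	return sscanf(p + strlen(tag), " req=%llx", req) == 1;
}

/*
 * Feed one log line. Returns 1 if the line shows the timer firing on an
 * already-ended request, or a second end of the same request; else 0.
 */
int feed_line(const char *line)
{
	unsigned long long req;
	struct tracked *t;

	if (parse_req(line, "fuse_simple_request:", &req) ||
	    parse_req(line, "fuse_background_request:", &req)) {
		/* The same address may be reused by a new request. */
		t = lookup(req, 1);
		if (t)
			t->ended = 0;
		return 0;
	}
	if (parse_req(line, "fuse_request_timeout:", &req)) {
		t = lookup(req, 0);
		return t && t->ended;
	}
	if (parse_req(line, "do_fuse_request_end:", &req)) {
		t = lookup(req, 1);
		if (!t)
			return 0;
		if (t->ended)
			return 1;
		t->ended = 1;
	}
	return 0;
}
```

Fed the log from my side line by line, feed_line() should return 0 for
every entry, since the timer fires before the only do_fuse_request_end on
the read request.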

Thanks!!

> --
> Regards
> Yafang


* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-08-02 19:05             ` Joanne Koong
@ 2024-08-04  7:46               ` Yafang Shao
  2024-08-05  5:05                 ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Yafang Shao @ 2024-08-04  7:46 UTC (permalink / raw)
  To: Joanne Koong; +Cc: miklos, linux-fsdevel, josef, bernd.schubert, kernel-team

On Sat, Aug 3, 2024 at 3:05 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Hi Yafang,
>
> Would you mind adding these printks to your kernel when you run the
> repro and pasting what they show?
>
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -287,6 +287,9 @@ static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
>         struct fuse_conn *fc = fm->fc;
>         struct fuse_iqueue *fiq = &fc->iq;
>
> +       printk("do_fuse_request_end: req=%p, from_timer=%d, req->timer.func=%d\n",
> +              req, from_timer_callback, req->timer.function != NULL);
> +
>         if (from_timer_callback)
>                 req->out.h.error = -ETIME;
>
> @@ -415,6 +418,8 @@ static void fuse_request_timeout(struct timer_list *timer)
>  {
>         struct fuse_req *req = container_of(timer, struct fuse_req, timer);
>
> +       printk("fuse_request_timeout: req=%p\n", req);
> +
>         /*
>          * Request reply is being finished by the kernel right now.
>          * No need to time out the request.
> @@ -612,6 +617,7 @@ ssize_t fuse_simple_request(struct fuse_mount *fm, struct fuse_args *args)
>
>         if (!args->noreply)
>                 __set_bit(FR_ISREPLY, &req->flags);
> +       printk("fuse_simple_request: req=%p, op=%u\n", req, args->opcode);
>         __fuse_request_send(req);
>         ret = req->out.h.error;
>         if (!ret && args->out_argvar) {
> @@ -673,6 +679,7 @@ int fuse_simple_background(struct fuse_mount *fm, struct fuse_args *args,
>
>         fuse_args_to_req(req, args);
>
> +       printk("fuse_background_request: req=%p, op=%u\n", req, args->opcode);
>         if (!fuse_request_queue_background(req)) {
>                 fuse_put_request(req);
>
>
> When I run it on my side, I see
>
> [   68.117740] fuse_background_request: req=00000000874e2f14, op=26
> [   68.131440] do_fuse_request_end: req=00000000874e2f14, from_timer=0, req->timer.func=1
> [   71.558538] fuse_simple_request: req=00000000cf643ace, op=1
> [   71.559651] do_fuse_request_end: req=00000000cf643ace, from_timer=0, req->timer.func=1
> [   71.561044] fuse_simple_request: req=00000000f2c001f0, op=14
> [   71.562524] do_fuse_request_end: req=00000000f2c001f0, from_timer=0, req->timer.func=1
> [   71.563820] fuse_background_request: req=00000000584f2cc3, op=15
> [   78.580035] fuse_simple_request: req=00000000ecbee970, op=25
> [   78.582614] do_fuse_request_end: req=00000000ecbee970, from_timer=0, req->timer.func=1
> [   81.624722] fuse_request_timeout: req=00000000584f2cc3
> [   81.625443] do_fuse_request_end: req=00000000584f2cc3, from_timer=1, req->timer.func=1
> [   81.626377] fuse_background_request: req=00000000b2d792ed, op=18
> [   81.627623] do_fuse_request_end: req=00000000b2d792ed, from_timer=0, req->timer.func=1
>
> I'm seeing only one timer get called, on the read request (opcode=15),
> and I'm not seeing do_fuse_request_end having been called on that
> request before the timer is invoked.
> I'm curious to compare this against the logs on your end.

The log on my side is as follows:

[  283.329421] fuse_background_request: req=000000002b4f82d4, op=26
[  283.330043] do_fuse_request_end: req=000000002b4f82d4, from_timer=0, req->timer.func=0
[  287.889844] fuse_simple_request: req=00000000865e85bf, op=3
[  287.889914] do_fuse_request_end: req=00000000865e85bf, from_timer=0, req->timer.func=0
[  287.889933] fuse_simple_request: req=00000000865e85bf, op=22
[  287.889994] do_fuse_request_end: req=00000000865e85bf, from_timer=0, req->timer.func=0
[  287.890096] fuse_simple_request: req=00000000865e85bf, op=27
[  287.890130] do_fuse_request_end: req=00000000865e85bf, from_timer=0, req->timer.func=0
[  287.890142] fuse_simple_request: req=00000000865e85bf, op=28
[  287.890167] do_fuse_request_end: req=00000000865e85bf, from_timer=0, req->timer.func=0
[  287.890178] fuse_simple_request: req=00000000865e85bf, op=1
[  287.890191] do_fuse_request_end: req=00000000865e85bf, from_timer=0, req->timer.func=0
[  287.890209] fuse_simple_request: req=00000000865e85bf, op=28
[  287.890216] do_fuse_request_end: req=00000000865e85bf, from_timer=0, req->timer.func=0
[  287.890222] fuse_background_request: req=00000000865e85bf, op=29
[  287.890230] do_fuse_request_end: req=00000000865e85bf, from_timer=0, req->timer.func=0
[  312.311752] fuse_background_request: req=00000000a8da8b44, op=26
[  312.312249] do_fuse_request_end: req=00000000a8da8b44, from_timer=0, req->timer.func=1
[  317.368786] fuse_simple_request: req=00000000bc4817dd, op=1
[  317.368871] do_fuse_request_end: req=00000000bc4817dd, from_timer=0, req->timer.func=1
[  317.368910] fuse_simple_request: req=00000000bc4817dd, op=14
[  317.368942] do_fuse_request_end: req=00000000bc4817dd, from_timer=0, req->timer.func=1
[  317.368967] fuse_simple_request: req=00000000bc4817dd, op=15
[  327.855189] fuse_request_timeout: req=00000000bc4817dd
[  327.855195] do_fuse_request_end: req=00000000bc4817dd, from_timer=1, req->timer.func=1
[  327.855218] fuse_simple_request: req=00000000c34cc363, op=15
[  327.855328] fuse_simple_request: req=00000000c34cc363, op=25
[  327.855401] do_fuse_request_end: req=00000000c34cc363, from_timer=0, req->timer.func=1
[  327.855496] fuse_background_request: req=00000000c34cc363, op=18
[  327.855508] do_fuse_request_end: req=00000000c34cc363, from_timer=0, req->timer.func=1
[  338.095136] Oops: general protection fault, probably for non-canonical address 0xdead00000000012a: 0000 [#1] PREEMPT SMP NOPTI
[  338.096415] CPU: 58 PID: 0 Comm: swapper/58 Kdump: loaded Not tainted 6.10.0+ #8
[  338.098219] RIP: 0010:__run_timers+0x27e/0x360
[  338.098686] Code: 07 48 c7 43 08 00 00 00 00 48 85 c0 74 78 4d 8b
2f 4c 89 6b 08 0f 1f 44 00 00 49 8b 45 00 49 8b 55 08 48 89 02 48 85
c0 74 04 <48> 89 50 08 4d 8b 65 18 49 c7 45 08 00 00 00 00 48 b8 22 01
00 00
[  338.100381] RSP: 0018:ffffb4ef808bced8 EFLAGS: 00010086
[  338.100907] RAX: dead000000000122 RBX: ffff9827ffca13c0 RCX: 0000000000000001
[  338.101623] RDX: ffffb4ef808bcef8 RSI: 0000000000000000 RDI: ffff9827ffca13e8
[  338.102333] RBP: ffffb4ef808bcf70 R08: 000000000000008b R09: ffff9827ffca1430
[  338.103020] R10: ffffffff93e060c0 R11: 0000000000000089 R12: 0000000000000001
[  338.103726] R13: ffff97e9dc06a0a0 R14: 0000000100009200 R15: ffffb4ef808bcef8
[  338.104439] FS:  0000000000000000(0000) GS:ffff9827ffc80000(0000)
knlGS:0000000000000000
[  338.105229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  338.105795] CR2: 000000c002f99340 CR3: 0000000148254001 CR4: 0000000000370ef0
[  338.106502] Call Trace:
[  338.106836]  <IRQ>
[  338.107175]  ? show_regs+0x69/0x80
[  338.107603]  ? die_addr+0x38/0x90
[  338.108005]  ? exc_general_protection+0x236/0x490
[  338.108557]  ? asm_exc_general_protection+0x27/0x30
[  338.109095]  ? __run_timers+0x27e/0x360
[  338.109563]  ? __run_timers+0x1b4/0x360
[  338.110009]  ? kvm_sched_clock_read+0x11/0x20
[  338.110528]  ? sched_clock_noinstr+0x9/0x10
[  338.111002]  ? sched_clock+0x10/0x30
[  338.111447]  ? sched_clock_cpu+0x10/0x190
[  338.111914]  run_timer_softirq+0x3a/0x60
[  338.112406]  handle_softirqs+0x118/0x350
[  338.112859]  irq_exit_rcu+0x60/0x80
[  338.113295]  sysvec_apic_timer_interrupt+0x7f/0x90
[  338.113823]  </IRQ>
[  338.114147]  <TASK>
[  338.114447]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  338.115002] RIP: 0010:default_idle+0xb/0x20
[  338.115498] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33
00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
00 90
[  338.117337] RSP: 0018:ffffb4ef8028fe18 EFLAGS: 00000246
[  338.117894] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0001b48ebb3a1032
[  338.118673] RDX: 0000000000000001 RSI: ffffffff9412e060 RDI: ffff9827ffcbc8e0
[  338.119415] RBP: ffffb4ef8028fe20 R08: 0000004eb7fb01b4 R09: 0000000000000001
[  338.120151] R10: ffffffff93e56080 R11: 0000000000000001 R12: 0000000000000001
[  338.120872] R13: ffffffff9412e060 R14: ffffffff9412e0e0 R15: 0000000000000001
[  338.121615]  ? ct_kernel_exit.constprop.0+0x79/0x90
[  338.122171]  ? arch_cpu_idle+0x9/0x10
[  338.122602]  default_enter_idle+0x22/0x2f
[  338.123064]  cpuidle_enter_state+0x88/0x430
[  338.123556]  cpuidle_enter+0x34/0x50
[  338.123978]  call_cpuidle+0x22/0x50
[  338.124449]  cpuidle_idle_call+0xd2/0x120
[  338.124909]  do_idle+0x77/0xd0
[  338.125313]  cpu_startup_entry+0x2c/0x30
[  338.125763]  start_secondary+0x117/0x140
[  338.126240]  common_startup_64+0x13e/0x141
[  338.126711]  </TASK>

In addition to hello-fuse, there is another FUSE daemon, lxcfs,
running on my test server. After disabling lxcfs, the system no longer
panics, but there are still error logs:

[  285.804534] fuse_background_request: req=0000000063502a93, op=26
[  285.805041] do_fuse_request_end: req=0000000063502a93, from_timer=0, req->timer.func=1
[  290.967412] fuse_simple_request: req=000000003f362e4b, op=1
[  290.967480] do_fuse_request_end: req=000000003f362e4b, from_timer=0, req->timer.func=1
[  290.967517] fuse_simple_request: req=000000003f362e4b, op=14
[  290.967585] do_fuse_request_end: req=000000003f362e4b, from_timer=0, req->timer.func=1
[  290.967655] fuse_simple_request: req=000000003f362e4b, op=15
[  300.996023] fuse_request_timeout: req=000000003f362e4b
[  300.996030] do_fuse_request_end: req=000000003f362e4b, from_timer=1, req->timer.func=1
[  300.996066] fuse_simple_request: req=00000000b4182f02, op=15
[  300.996180] fuse_simple_request: req=000000003f362e4b, op=25
[  300.996185] ==================================================================
[  300.996980] BUG: KFENCE: use-after-free write in enqueue_timer+0x24/0xb0

[  300.997788] Use-after-free write at 0x0000000022312cb7 (in kfence-#156):
[  300.998476]  enqueue_timer+0x24/0xb0
[  300.998479]  __mod_timer+0x23b/0x360
[  300.998481]  add_timer+0x20/0x30
[  300.998483]  fuse_simple_request+0x1bc/0x2f0 [fuse]
[  300.998506]  fuse_flush+0x1ac/0x1f0 [fuse]
[  300.998511]  filp_flush+0x39/0x90
[  300.998517]  filp_close+0x15/0x30
[  300.998519]  put_files_struct+0x77/0xe0
[  300.998522]  exit_files+0x47/0x60
[  300.998524]  do_exit+0x262/0x480
[  300.998528]  do_group_exit+0x34/0x90
[  300.998531]  get_signal+0x92f/0x980
[  300.998534]  arch_do_signal_or_restart+0x2a/0x100
[  300.998537]  syscall_exit_to_user_mode+0xe3/0x1a0
[  300.998541]  do_syscall_64+0x71/0x170
[  300.998545]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  300.998759] kfence-#156: 0x00000000b4182f02-0x0000000084fc5c46, size=200, cache=ip4-frags

[  300.998761] allocated by task 15064 on cpu 26 at 300.996061s:
[  300.998766]  fuse_request_alloc+0x21/0xb0 [fuse]
[  300.998771]  fuse_get_req+0xde/0x270 [fuse]
[  300.998775]  fuse_simple_request+0x33/0x2f0 [fuse]
[  300.998779]  fuse_do_readpage+0x15e/0x200 [fuse]
[  300.998783]  fuse_read_folio+0x29/0x60 [fuse]
[  300.998787]  filemap_read_folio+0x3b/0xe0
[  300.998791]  filemap_update_page+0x236/0x2d0
[  300.998792]  filemap_get_pages+0x225/0x390
[  300.998794]  filemap_read+0xed/0x3a0
[  300.998796]  generic_file_read_iter+0xb8/0x100
[  300.998798]  fuse_file_read_iter+0xd8/0x150 [fuse]
[  300.998804]  vfs_read+0x25e/0x340
[  300.998806]  ksys_read+0x67/0xf0
[  300.998808]  __x64_sys_read+0x19/0x20
[  300.998810]  x64_sys_call+0x1709/0x20b0
[  300.998813]  do_syscall_64+0x65/0x170
[  300.998815]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  300.998817] freed by task 15064 on cpu 26 at 300.996084s:
[  300.998822]  fuse_put_request+0x89/0xf0 [fuse]
[  300.998826]  fuse_simple_request+0xe1/0x2f0 [fuse]
[  300.998830]  fuse_do_readpage+0x15e/0x200 [fuse]
[  300.998835]  fuse_read_folio+0x29/0x60 [fuse]
[  300.998839]  filemap_read_folio+0x3b/0xe0
[  300.998840]  filemap_update_page+0x236/0x2d0
[  300.998842]  filemap_get_pages+0x225/0x390
[  300.998844]  filemap_read+0xed/0x3a0
[  300.998846]  generic_file_read_iter+0xb8/0x100
[  300.998848]  fuse_file_read_iter+0xd8/0x150 [fuse]
[  300.998852]  vfs_read+0x25e/0x340
[  300.998854]  ksys_read+0x67/0xf0
[  300.998856]  __x64_sys_read+0x19/0x20
[  300.998857]  x64_sys_call+0x1709/0x20b0
[  300.998859]  do_syscall_64+0x65/0x170
[  300.998860]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  300.999115] CPU: 26 PID: 15064 Comm: cat Kdump: loaded Not tainted 6.10.0+ #8
[  301.000803] ==================================================================
[  301.001695] do_fuse_request_end: req=000000003f362e4b, from_timer=0, req->timer.func=1
[  301.001723] fuse_background_request: req=000000003f362e4b, op=18
[  301.001767] do_fuse_request_end: req=000000003f362e4b, from_timer=0, req->timer.func=1
[  311.235964] fuse_request_timeout: req=00000000b4182f02
[  311.235969] ------------[ cut here ]------------
[  311.235970] list_del corruption, ffff9a8072d3a000->next is LIST_POISON1 (dead000000000100)
[  311.235982] WARNING: CPU: 26 PID: 0 at lib/list_debug.c:56 __list_del_entry_valid_or_report+0x8a/0xf0
[  311.236036] CPU: 26 PID: 0 Comm: swapper/26 Kdump: loaded Tainted: G    B              6.10.0+ #8
[  311.236040] RIP: 0010:__list_del_entry_valid_or_report+0x8a/0xf0
[  311.236043] Code: 31 c0 5d c3 cc cc cc cc 48 c7 c7 60 7a 5e b0 e8
cc ea a4 ff 0f 0b 31 c0 5d c3 cc cc cc cc 48 c7 c7 88 7a 5e b0 e8 b6
ea a4 ff <0f> 0b 31 c0 5d c3 cc cc cc cc 48 89 ca 48 c7 c7 c0 7a 5e b0
e8 9d
[  311.236045] RSP: 0018:ffffb6364056ce60 EFLAGS: 00010282
[  311.236047] RAX: 0000000000000000 RBX: ffff9a8072d3a0a0 RCX: 0000000000000027
[  311.236048] RDX: ffff9a807f4a0848 RSI: 0000000000000001 RDI: ffff9a807f4a0840
[  311.236049] RBP: ffffb6364056ce60 R08: 0000000000000000 R09: ffffb6364056cce0
[  311.236050] R10: ffffb6364056ccd8 R11: ffffffffb1017ee8 R12: ffff9a8072d3a000
[  311.236051] R13: ffff9a420d5af000 R14: 0000000100002800 R15: ffff9a420d5af054
[  311.236054] FS:  0000000000000000(0000) GS:ffff9a807f480000(0000)
knlGS:0000000000000000
[  311.236056] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  311.236057] CR2: 000000c000dc5000 CR3: 000000010cc38003 CR4: 0000000000370ef0
[  311.236058] Call Trace:
[  311.236059]  <IRQ>
[  311.236061]  ? show_regs+0x69/0x80
[  311.236065]  ? __warn+0x88/0x130
[  311.236068]  ? __list_del_entry_valid_or_report+0x8a/0xf0
[  311.236070]  ? report_bug+0x18f/0x1a0
[  311.236074]  ? handle_bug+0x40/0x70
[  311.236077]  ? exc_invalid_op+0x19/0x70
[  311.236079]  ? asm_exc_invalid_op+0x1b/0x20
[  311.236083]  ? __list_del_entry_valid_or_report+0x8a/0xf0
[  311.236086]  fuse_request_timeout+0x15c/0x1a0 [fuse]
[  311.236094]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
[  311.236099]  call_timer_fn+0x2c/0x130
[  311.236102]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
[  311.236106]  __run_timers+0x2c2/0x360
[  311.236108]  ? kvm_sched_clock_read+0x11/0x20
[  311.236110]  ? sched_clock_noinstr+0x9/0x10
[  311.236111]  ? sched_clock+0x10/0x30
[  311.236114]  ? sched_clock_cpu+0x10/0x190
[  311.236116]  run_timer_softirq+0x3a/0x60
[  311.236118]  handle_softirqs+0x118/0x350
[  311.236121]  irq_exit_rcu+0x60/0x80
[  311.236123]  sysvec_apic_timer_interrupt+0x7f/0x90
[  311.236124]  </IRQ>
[  311.236125]  <TASK>
[  311.236126]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  311.236128] RIP: 0010:default_idle+0xb/0x20
[  311.236130] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33
00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
00 90
[  311.236131] RSP: 0018:ffffb6364018fe18 EFLAGS: 00000246
[  311.236133] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0001c0582ca6ada0
[  311.236134] RDX: 0000000000000001 RSI: ffffffffb112e060 RDI: ffff9a807f4bc8e0
[  311.236135] RBP: ffffb6364018fe20 R08: 00000048770ca8ec R09: 0000000000000001
[  311.236135] R10: ffffffffb0e56080 R11: 0000000000000001 R12: 0000000000000001
[  311.236136] R13: ffffffffb112e060 R14: ffffffffb112e0e0 R15: 0000000000000001
[  311.236138]  ? ct_kernel_exit.constprop.0+0x79/0x90
[  311.236140]  ? arch_cpu_idle+0x9/0x10
[  311.236142]  default_enter_idle+0x22/0x2f
[  311.236144]  cpuidle_enter_state+0x88/0x430
[  311.236146]  cpuidle_enter+0x34/0x50
[  311.236150]  call_cpuidle+0x22/0x50
[  311.236151]  cpuidle_idle_call+0xd2/0x120
[  311.236154]  do_idle+0x77/0xd0
[  311.236156]  cpu_startup_entry+0x2c/0x30
[  311.236158]  start_secondary+0x117/0x140
[  311.236160]  common_startup_64+0x13e/0x141
[  311.236163]  </TASK>
[  311.236163] ---[ end trace 0000000000000000 ]---
[  311.236165] do_fuse_request_end: req=00000000b4182f02, from_timer=1, req->timer.func=1
[  311.236166] ------------[ cut here ]------------
[  311.236167] refcount_t: underflow; use-after-free.
[  311.236174] WARNING: CPU: 26 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0xc2/0x110
[  311.236207] CPU: 26 PID: 0 Comm: swapper/26 Kdump: loaded Tainted: G    B   W          6.10.0+ #8
[  311.236209] RIP: 0010:refcount_warn_saturate+0xc2/0x110
[  311.236211] Code: 01 e8 d2 72 a6 ff 0f 0b 5d c3 cc cc cc cc 80 3d
33 d4 b1 01 00 75 81 48 c7 c7 30 69 5e b0 c6 05 23 d4 b1 01 01 e8 ae
72 a6 ff <0f> 0b 5d c3 cc cc cc cc 80 3d 0d d4 b1 01 00 0f 85 59 ff ff
ff 48
[  311.236212] RSP: 0018:ffffb6364056cdf8 EFLAGS: 00010286
[  311.236213] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000027
[  311.236214] RDX: ffff9a807f4a0848 RSI: 0000000000000001 RDI: ffff9a807f4a0840
[  311.236215] RBP: ffffb6364056cdf8 R08: 0000000000000000 R09: ffffb6364056cc78
[  311.236216] R10: ffffb6364056cc70 R11: ffffffffb1017ee8 R12: ffff9a8072d3a000
[  311.236217] R13: ffff9a420d5af000 R14: ffff9a42426a6ec0 R15: ffff9a8072d3a010
[  311.236219] FS:  0000000000000000(0000) GS:ffff9a807f480000(0000)
knlGS:0000000000000000
[  311.236221] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  311.236222] CR2: 000000c000dc5000 CR3: 000000010cc38003 CR4: 0000000000370ef0
[  311.236223] Call Trace:
[  311.236223]  <IRQ>
[  311.236224]  ? show_regs+0x69/0x80
[  311.236233]  ? __warn+0x88/0x130
[  311.236235]  ? refcount_warn_saturate+0xc2/0x110
[  311.236236]  ? report_bug+0x18f/0x1a0
[  311.236238]  ? handle_bug+0x40/0x70
[  311.236240]  ? exc_invalid_op+0x19/0x70
[  311.236242]  ? asm_exc_invalid_op+0x1b/0x20
[  311.236244]  ? refcount_warn_saturate+0xc2/0x110
[  311.236246]  ? refcount_warn_saturate+0xc2/0x110
[  311.236247]  fuse_put_request+0xc6/0xf0 [fuse]
[  311.236253]  do_fuse_request_end+0xcc/0x1e0 [fuse]
[  311.236258]  fuse_request_timeout+0xac/0x1a0 [fuse]
[  311.236263]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
[  311.236267]  call_timer_fn+0x2c/0x130
[  311.236269]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
[  311.236274]  __run_timers+0x2c2/0x360
[  311.236275]  ? kvm_sched_clock_read+0x11/0x20
[  311.236277]  ? sched_clock_noinstr+0x9/0x10
[  311.236278]  ? sched_clock+0x10/0x30
[  311.236280]  ? sched_clock_cpu+0x10/0x190
[  311.236281]  run_timer_softirq+0x3a/0x60
[  311.236283]  handle_softirqs+0x118/0x350
[  311.236285]  irq_exit_rcu+0x60/0x80
[  311.236286]  sysvec_apic_timer_interrupt+0x7f/0x90
[  311.236288]  </IRQ>
[  311.236288]  <TASK>
[  311.236289]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  311.236291] RIP: 0010:default_idle+0xb/0x20
[  311.236293] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33
00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
00 90
[  311.236294] RSP: 0018:ffffb6364018fe18 EFLAGS: 00000246
[  311.236295] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0001c0582ca6ada0
[  311.236296] RDX: 0000000000000001 RSI: ffffffffb112e060 RDI: ffff9a807f4bc8e0
[  311.236297] RBP: ffffb6364018fe20 R08: 00000048770ca8ec R09: 0000000000000001
[  311.236298] R10: ffffffffb0e56080 R11: 0000000000000001 R12: 0000000000000001
[  311.236299] R13: ffffffffb112e060 R14: ffffffffb112e0e0 R15: 0000000000000001
[  311.236300]  ? ct_kernel_exit.constprop.0+0x79/0x90
[  311.236302]  ? arch_cpu_idle+0x9/0x10
[  311.236304]  default_enter_idle+0x22/0x2f
[  311.236306]  cpuidle_enter_state+0x88/0x430
[  311.236308]  cpuidle_enter+0x34/0x50
[  311.236310]  call_cpuidle+0x22/0x50
[  311.236311]  cpuidle_idle_call+0xd2/0x120
[  311.236313]  do_idle+0x77/0xd0
[  311.236315]  cpu_startup_entry+0x2c/0x30
[  311.236317]  start_secondary+0x117/0x140
[  311.236318]  common_startup_64+0x13e/0x141
[  311.236320]  </TASK>
[  311.236321] ---[ end trace 0000000000000000 ]---

I wish I could provide you with a clear explanation of what happened
in my test environment, but I haven't had the time to delve into the
details yet.
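
One thing the second log does show: req=00000000b4182f02 is freed at
300.996084 (per the KFENCE report) without any do_fuse_request_end line
for it, the next add_timer() then trips KFENCE inside that freed object,
and fuse_request_timeout still fires on it at 311.235964. That is the
pattern one would expect when a request is freed while its timer is
still queued. The following is only a minimal userspace sketch of that
pattern, not kernel code: every model_* name is made up for
illustration, the list mimics hlist-style head insertion, and it assumes
both timers land in the same bucket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer entry on one timer-wheel bucket. */
struct model_timer {
	struct model_timer *next;
	struct model_timer **pprev;
	bool owner_freed;	/* set once the embedding request is "freed" */
};

struct model_bucket {
	struct model_timer *first;
};

/*
 * Head insertion in the style of hlist_add_head(): the old head's pprev
 * is rewritten. Returns true if that write landed in a timer whose
 * owner had already been freed (the analogue of the KFENCE report).
 */
static bool model_enqueue(struct model_bucket *b, struct model_timer *t)
{
	struct model_timer *first = b->first;
	bool wrote_to_freed = false;

	t->next = first;
	if (first) {
		wrote_to_freed = first->owner_freed;
		first->pprev = &t->next;
	}
	b->first = t;
	t->pprev = &b->first;
	return wrote_to_freed;
}

/* Unlink a queued timer from its bucket. */
static void model_delete(struct model_timer *t)
{
	*t->pprev = t->next;
	if (t->next)
		t->next->pprev = t->pprev;
	t->next = NULL;
	t->pprev = NULL;
}

/* "Free" the request that embeds the timer, optionally unlinking first. */
static void model_free_owner(struct model_timer *t, bool delete_first)
{
	if (delete_first)
		model_delete(t);
	t->owner_freed = true;
}
```

In this model, unlinking before the free (delete_first) leaves the
bucket empty, so a later enqueue touches nothing stale; skipping the
unlink makes the very next enqueue write into the freed owner.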


--
Regards
Yafang


* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-07-30  0:23 ` [PATCH v2 1/2] fuse: add optional kernel-enforced timeout " Joanne Koong
@ 2024-08-04 22:46   ` Bernd Schubert
  2024-08-05  4:45     ` Joanne Koong
  2024-08-05  4:52   ` Joanne Koong
  2024-08-05  7:32   ` Jingbo Xu
  2 siblings, 1 reply; 34+ messages in thread
From: Bernd Schubert @ 2024-08-04 22:46 UTC (permalink / raw)
  To: Joanne Koong, miklos, linux-fsdevel; +Cc: josef, laoar.shao, kernel-team



On 7/30/24 02:23, Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or take
> too long to reply to a request. Currently there is no upper bound on
> how long a request may take, which may be frustrating to users who get
> stuck waiting for a request to complete.
> 
> This commit adds a timeout option (in seconds) for requests. If the
> timeout elapses before the server replies to the request, the request
> will fail with -ETIME.
> 
> There are 3 possibilities for a request that times out:
> a) The request times out before the request has been sent to userspace
> b) The request times out after the request has been sent to userspace
> and before it receives a reply from the server
> c) The request times out after the request has been sent to userspace
> and the server replies while the kernel is timing out the request
> 
> While a request timeout is being handled, there may be other handlers
> running at the same time if:
> a) the kernel is forwarding the request to the server
> b) the kernel is processing the server's reply to the request
> c) the request is being re-sent
> d) the connection is aborting
> e) the device is getting released
> 
> Proper synchronization must be added to ensure that the request is
> handled correctly in all of these cases. To this effect, there is a new
> FR_FINISHING bit added to the request flags, which is set atomically by
> either the timeout handler (see fuse_request_timeout()) which is invoked
> after the request timeout elapses or set by the request reply handler
> (see dev_do_write()), whichever gets there first. If the reply handler
> and the timeout handler are executing simultaneously and the reply handler
> sets FR_FINISHING before the timeout handler, then the request will be
> handled as if the timeout did not elapse. If the timeout handler sets
> FR_FINISHING before the reply handler, then the request will fail with
> -ETIME and the request will be cleaned up.
> 
> Currently, this is the refcount lifecycle of a request:
> 
> Synchronous request is created:
> fuse_simple_request -> allocates request, sets refcount to 1
>   __fuse_request_send -> acquires refcount
>     queues request and waits for reply...
> fuse_simple_request -> drops refcount
> 
> Background request is created:
> fuse_simple_background -> allocates request, sets refcount to 1
> 
> Request is replied to:
> fuse_dev_do_write
>   fuse_request_end -> drops refcount on request
> 
> Proper acquires on the request reference must be added to ensure that the
> timeout handler does not drop the last refcount on the request while
> other handlers may be operating on the request. Please note that the
> timeout handler may get invoked at any phase of the request's
> lifetime (eg before the request has been forwarded to userspace, etc).
> 
> It is always guaranteed that there is a refcount on the request when the
> timeout handler is executing. The timeout handler will be either
> deactivated by the reply/abort/release handlers, or if the timeout
> handler is concurrently executing on another CPU, the reply/abort/release
> handlers will wait for the timeout handler to finish executing first before
> it drops the final refcount on the request.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
>  fs/fuse/fuse_i.h |  14 ++++
>  fs/fuse/inode.c  |   7 ++
>  3 files changed, 200 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 9eb191b5c4de..9992bc5f4469 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
>  
>  static struct kmem_cache *fuse_req_cachep;
>  
> +static void fuse_request_timeout(struct timer_list *timer);
> +
>  static struct fuse_dev *fuse_get_dev(struct file *file)
>  {
>  	/*
> @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
>  	refcount_set(&req->count, 1);
>  	__set_bit(FR_PENDING, &req->flags);
>  	req->fm = fm;
> +	if (fm->fc->req_timeout)
> +		timer_setup(&req->timer, fuse_request_timeout, 0);
>  }
>  
>  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
>   * the 'end' callback is called if given, else the reference to the
>   * request is released
>   */
> -void fuse_request_end(struct fuse_req *req)
> +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
>  {
>  	struct fuse_mount *fm = req->fm;
>  	struct fuse_conn *fc = fm->fc;
>  	struct fuse_iqueue *fiq = &fc->iq;
>  
> +	if (from_timer_callback)
> +		req->out.h.error = -ETIME;
> +
>  	if (test_and_set_bit(FR_FINISHED, &req->flags))
>  		goto put_request;
>  
> @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
>  		list_del_init(&req->intr_entry);
>  		spin_unlock(&fiq->lock);
>  	}
> -	WARN_ON(test_bit(FR_PENDING, &req->flags));
> -	WARN_ON(test_bit(FR_SENT, &req->flags));
>  	if (test_bit(FR_BACKGROUND, &req->flags)) {
>  		spin_lock(&fc->bg_lock);
>  		clear_bit(FR_BACKGROUND, &req->flags);
> @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
>  		wake_up(&req->waitq);
>  	}
>  
> +	if (!from_timer_callback && req->timer.function)
> +		timer_delete_sync(&req->timer);
> +
>  	if (test_bit(FR_ASYNC, &req->flags))
>  		req->args->end(fm, req->args, req->out.h.error);
>  put_request:
>  	fuse_put_request(req);
>  }
> +
> +void fuse_request_end(struct fuse_req *req)
> +{
> +	WARN_ON(test_bit(FR_PENDING, &req->flags));
> +	WARN_ON(test_bit(FR_SENT, &req->flags));
> +
> +	do_fuse_request_end(req, false);
> +}
>  EXPORT_SYMBOL_GPL(fuse_request_end);
>  
> +static void timeout_inflight_req(struct fuse_req *req)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +	struct fuse_iqueue *fiq = &fc->iq;
> +	struct fuse_pqueue *fpq;
> +
> +	spin_lock(&fiq->lock);
> +	fpq = req->fpq;
> +	spin_unlock(&fiq->lock);
> +
> +	/*
> +	 * If fpq has not been set yet, then the request is aborting (which
> +	 * clears FR_PENDING flag) before dev_do_read (which sets req->fpq)
> +	 * has been called. Let the abort handler handle this request.
> +	 */
> +	if (!fpq)
> +		return;
> +
> +	spin_lock(&fpq->lock);
> +	if (!fpq->connected || req->out.h.error == -ECONNABORTED) {
> +		/*
> +		 * Connection is being aborted or the fuse_dev is being released.
> +		 * The abort / release will clean up the request
> +		 */
> +		spin_unlock(&fpq->lock);
> +		return;
> +	}
> +
> +	if (!test_bit(FR_PRIVATE, &req->flags))
> +		list_del_init(&req->list);
> +
> +	spin_unlock(&fpq->lock);
> +
> +	do_fuse_request_end(req, true);
> +}
> +
> +static void timeout_pending_req(struct fuse_req *req)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +	struct fuse_iqueue *fiq = &fc->iq;
> +	bool background = test_bit(FR_BACKGROUND, &req->flags);
> +
> +	if (background)
> +		spin_lock(&fc->bg_lock);
> +	spin_lock(&fiq->lock);
> +
> +	if (!test_bit(FR_PENDING, &req->flags)) {
> +		spin_unlock(&fiq->lock);
> +		if (background)
> +			spin_unlock(&fc->bg_lock);
> +		timeout_inflight_req(req);
> +		return;
> +	}
> +
> +	if (!test_bit(FR_PRIVATE, &req->flags))
> +		list_del_init(&req->list);
> +
> +	spin_unlock(&fiq->lock);
> +	if (background)
> +		spin_unlock(&fc->bg_lock);
> +
> +	do_fuse_request_end(req, true);
> +}
> +
> +static void fuse_request_timeout(struct timer_list *timer)
> +{
> +	struct fuse_req *req = container_of(timer, struct fuse_req, timer);

Let's say the timeout thread races with the thread that does
fuse_dev_do_write(), and that thread is much faster and has already called:

fuse_dev_do_write():
	fuse_request_end(req);
	fuse_put_request(req);
out:
	return err ? err : nbytes;


(What I mean is that the timeout triggered, but did not reach
FR_FINISHING yet and at the same time another thread on another core
calls fuse_dev_do_write()).

> +
> +	/*
> +	 * Request reply is being finished by the kernel right now.
> +	 * No need to time out the request.
> +	 */
> +	if (test_and_set_bit(FR_FINISHING, &req->flags))
> +		return;

Wouldn't that trigger a UAF when fuse_dev_do_write() was proceeding
much faster and had already released the request?
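
To make the interleaving concrete, here is a sequential userspace
replay of it. This is a rough sketch only: all model_* names are
hypothetical, the flags and refcount are plain fields rather than
atomics, and the wait_for_timer branch merely stands in for the
timer_delete_sync() call that do_fuse_request_end() makes on the
non-timer path in this patch. Whether that call covers every path in
the real code is not something the sketch can show.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_FR_FINISHING 0x1u

enum model_outcome {
	MODEL_TIMER_LOST_RACE,		/* saw FR_FINISHING set, backed off */
	MODEL_TIMER_EXPIRED_REQ,	/* claimed the request, ended it */
	MODEL_TIMER_TOUCHED_FREED,	/* would be a use-after-free */
};

struct model_req {
	unsigned int flags;
	int refcount;
	bool freed;
};

/* Non-atomic stand-in for test_and_set_bit(): returns the old bit. */
static bool model_test_and_set(struct model_req *req, unsigned int bit)
{
	bool was_set = (req->flags & bit) != 0;

	req->flags |= bit;
	return was_set;
}

static void model_put(struct model_req *req)
{
	if (--req->refcount == 0)
		req->freed = true;
}

/* Body of the timeout handler, run at whatever point the caller picks. */
static enum model_outcome model_timer_handler(struct model_req *req)
{
	if (req->freed)
		return MODEL_TIMER_TOUCHED_FREED;
	if (model_test_and_set(req, MODEL_FR_FINISHING))
		return MODEL_TIMER_LOST_RACE;
	model_put(req);
	return MODEL_TIMER_EXPIRED_REQ;
}

/*
 * The timer has been entered on another CPU but has not tested
 * FR_FINISHING yet. The reply path claims the request and drops the
 * last reference, either after waiting for the handler or without
 * waiting. Returns what the handler observes when it finally runs.
 */
static enum model_outcome model_reply_vs_entered_timer(struct model_req *req,
							bool wait_for_timer)
{
	enum model_outcome seen;

	model_test_and_set(req, MODEL_FR_FINISHING);
	if (wait_for_timer) {
		seen = model_timer_handler(req);
		model_put(req);
	} else {
		model_put(req);
		seen = model_timer_handler(req);
	}
	return seen;
}
```

Without the wait, the handler in this model runs against a freed
request; with it, the handler sees FR_FINISHING already set and backs
off before the final put.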

> +
> +	if (test_bit(FR_PENDING, &req->flags))
> +		timeout_pending_req(req);
> +	else
> +		timeout_inflight_req(req);
> +}
> +
>  static int queue_interrupt(struct fuse_req *req)
>  {
>  	struct fuse_iqueue *fiq = &req->fm->fc->iq;
> @@ -409,7 +506,8 @@ static void request_wait_answer(struct fuse_req *req)
>  
>  static void __fuse_request_send(struct fuse_req *req)
>  {
> -	struct fuse_iqueue *fiq = &req->fm->fc->iq;
> +	struct fuse_conn *fc = req->fm->fc;
> +	struct fuse_iqueue *fiq = &fc->iq;
>  
>  	BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
>  	spin_lock(&fiq->lock);
> @@ -421,6 +519,10 @@ static void __fuse_request_send(struct fuse_req *req)
>  		/* acquire extra reference, since request is still needed
>  		   after fuse_request_end() */
>  		__fuse_get_request(req);
> +		if (req->timer.function) {
> +			req->timer.expires = jiffies + fc->req_timeout;
> +			add_timer(&req->timer);
> +		}

Does this leave a chance to put in a timeout of 0, if someone first sets
fc->req_timeout and then sets it back to 0?
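
The scenario can be sketched in userspace as follows. This is an
illustration under assumptions, not the patch itself: the model_* names
and the snapshot variant are hypothetical, time is in abstract ticks,
and whether fc->req_timeout can actually change after a request has
been initialized is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

struct model_conn {
	unsigned long req_timeout;	/* 0 means "no timeout" */
};

struct model_req {
	bool has_timer;			/* stands in for req->timer.function */
	unsigned long timeout;		/* snapshot taken at init (hypothetical) */
	unsigned long expires;
};

/* Mirrors fuse_request_init(): the timer is set up only if a timeout is set. */
static void model_request_init(const struct model_conn *fc,
			       struct model_req *req)
{
	req->has_timer = fc->req_timeout != 0;
	req->timeout = fc->req_timeout;
	req->expires = 0;
}

/*
 * Mirrors the send path in the patch: expiry is computed from the
 * connection's current value. Returns the relative delay, or -1 if the
 * request has no timer.
 */
static long model_request_send(const struct model_conn *fc,
			       struct model_req *req, unsigned long now)
{
	if (!req->has_timer)
		return -1;
	req->expires = now + fc->req_timeout;
	return (long)(req->expires - now);
}

/* Hypothetical alternative: use the value captured at init time. */
static long model_request_send_snapshot(struct model_req *req,
					unsigned long now)
{
	if (!req->has_timer)
		return -1;
	req->expires = now + req->timeout;
	return (long)(req->expires - now);
}
```

If the connection value drops to 0 between init and send, the first
variant arms the timer with no delay at all, while the snapshot variant
keeps the delay the request was created with.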


(I'm going to continue reviewing tomorrow; it's getting very late here.)


Thanks,
Bernd


* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-08-04 22:46   ` Bernd Schubert
@ 2024-08-05  4:45     ` Joanne Koong
  2024-08-05 13:05       ` Bernd Schubert
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2024-08-05  4:45 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: miklos, linux-fsdevel, josef, laoar.shao, kernel-team

On Sun, Aug 4, 2024 at 3:46 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 7/30/24 02:23, Joanne Koong wrote:
> > There are situations where fuse servers can become unresponsive or take
> > too long to reply to a request. Currently there is no upper bound on
> > how long a request may take, which may be frustrating to users who get
> > stuck waiting for a request to complete.
> >
> > This commit adds a timeout option (in seconds) for requests. If the
> > timeout elapses before the server replies to the request, the request
> > will fail with -ETIME.
> >
> > There are 3 possibilities for a request that times out:
> > a) The request times out before the request has been sent to userspace
> > b) The request times out after the request has been sent to userspace
> > and before it receives a reply from the server
> > c) The request times out after the request has been sent to userspace
> > and the server replies while the kernel is timing out the request
> >
> > While a request timeout is being handled, there may be other handlers
> > running at the same time if:
> > a) the kernel is forwarding the request to the server
> > b) the kernel is processing the server's reply to the request
> > c) the request is being re-sent
> > d) the connection is aborting
> > e) the device is getting released
> >
> > Proper synchronization must be added to ensure that the request is
> > handled correctly in all of these cases. To this effect, there is a new
> > FR_FINISHING bit added to the request flags, which is set atomically by
> > either the timeout handler (see fuse_request_timeout()) which is invoked
> > after the request timeout elapses or set by the request reply handler
> > (see dev_do_write()), whichever gets there first. If the reply handler
> > and the timeout handler are executing simultaneously and the reply handler
> > sets FR_FINISHING before the timeout handler, then the request will be
> > handled as if the timeout did not elapse. If the timeout handler sets
> > FR_FINISHING before the reply handler, then the request will fail with
> > -ETIME and the request will be cleaned up.
> >
> > Currently, this is the refcount lifecycle of a request:
> >
> > Synchronous request is created:
> > fuse_simple_request -> allocates request, sets refcount to 1
> >   __fuse_request_send -> acquires refcount
> >     queues request and waits for reply...
> > fuse_simple_request -> drops refcount
> >
> > Background request is created:
> > fuse_simple_background -> allocates request, sets refcount to 1
> >
> > Request is replied to:
> > fuse_dev_do_write
> >   fuse_request_end -> drops refcount on request
> >
> > Proper acquires on the request reference must be added to ensure that the
> > timeout handler does not drop the last refcount on the request while
> > other handlers may be operating on the request. Please note that the
> > timeout handler may get invoked at any phase of the request's
> > lifetime (eg before the request has been forwarded to userspace, etc).
> >
> > It is always guaranteed that there is a refcount on the request when the
> > timeout handler is executing. The timeout handler will be either
> > deactivated by the reply/abort/release handlers, or if the timeout
> > handler is concurrently executing on another CPU, the reply/abort/release
> > handlers will wait for the timeout handler to finish executing first before
> > they drop the final refcount on the request.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
> >  fs/fuse/fuse_i.h |  14 ++++
> >  fs/fuse/inode.c  |   7 ++
> >  3 files changed, 200 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 9eb191b5c4de..9992bc5f4469 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
> >
> >  static struct kmem_cache *fuse_req_cachep;
> >
> > +static void fuse_request_timeout(struct timer_list *timer);
> > +
> >  static struct fuse_dev *fuse_get_dev(struct file *file)
> >  {
> >       /*
> > @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
> >       refcount_set(&req->count, 1);
> >       __set_bit(FR_PENDING, &req->flags);
> >       req->fm = fm;
> > +     if (fm->fc->req_timeout)
> > +             timer_setup(&req->timer, fuse_request_timeout, 0);
> >  }
> >
> >  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> > @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
> >   * the 'end' callback is called if given, else the reference to the
> >   * request is released
> >   */
> > -void fuse_request_end(struct fuse_req *req)
> > +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
> >  {
> >       struct fuse_mount *fm = req->fm;
> >       struct fuse_conn *fc = fm->fc;
> >       struct fuse_iqueue *fiq = &fc->iq;
> >
> > +     if (from_timer_callback)
> > +             req->out.h.error = -ETIME;
> > +
> >       if (test_and_set_bit(FR_FINISHED, &req->flags))
> >               goto put_request;
> >
> > @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
> >               list_del_init(&req->intr_entry);
> >               spin_unlock(&fiq->lock);
> >       }
> > -     WARN_ON(test_bit(FR_PENDING, &req->flags));
> > -     WARN_ON(test_bit(FR_SENT, &req->flags));
> >       if (test_bit(FR_BACKGROUND, &req->flags)) {
> >               spin_lock(&fc->bg_lock);
> >               clear_bit(FR_BACKGROUND, &req->flags);
> > @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
> >               wake_up(&req->waitq);
> >       }
> >
> > +     if (!from_timer_callback && req->timer.function)
> > +             timer_delete_sync(&req->timer);
> > +
> >       if (test_bit(FR_ASYNC, &req->flags))
> >               req->args->end(fm, req->args, req->out.h.error);
> >  put_request:
> >       fuse_put_request(req);
> >  }
> > +
> > +void fuse_request_end(struct fuse_req *req)
> > +{
> > +     WARN_ON(test_bit(FR_PENDING, &req->flags));
> > +     WARN_ON(test_bit(FR_SENT, &req->flags));
> > +
> > +     do_fuse_request_end(req, false);
> > +}
> >  EXPORT_SYMBOL_GPL(fuse_request_end);
> >
> > +static void timeout_inflight_req(struct fuse_req *req)
> > +{
> > +     struct fuse_conn *fc = req->fm->fc;
> > +     struct fuse_iqueue *fiq = &fc->iq;
> > +     struct fuse_pqueue *fpq;
> > +
> > +     spin_lock(&fiq->lock);
> > +     fpq = req->fpq;
> > +     spin_unlock(&fiq->lock);
> > +
> > +     /*
> > +      * If fpq has not been set yet, then the request is aborting (which
> > +      * clears FR_PENDING flag) before dev_do_read (which sets req->fpq)
> > +      * has been called. Let the abort handler handle this request.
> > +      */
> > +     if (!fpq)
> > +             return;
> > +
> > +     spin_lock(&fpq->lock);
> > +     if (!fpq->connected || req->out.h.error == -ECONNABORTED) {
> > +             /*
> > +              * Connection is being aborted or the fuse_dev is being released.
> > +              * The abort / release will clean up the request
> > +              */
> > +             spin_unlock(&fpq->lock);
> > +             return;
> > +     }
> > +
> > +     if (!test_bit(FR_PRIVATE, &req->flags))
> > +             list_del_init(&req->list);
> > +
> > +     spin_unlock(&fpq->lock);
> > +
> > +     do_fuse_request_end(req, true);
> > +}
> > +
> > +static void timeout_pending_req(struct fuse_req *req)
> > +{
> > +     struct fuse_conn *fc = req->fm->fc;
> > +     struct fuse_iqueue *fiq = &fc->iq;
> > +     bool background = test_bit(FR_BACKGROUND, &req->flags);
> > +
> > +     if (background)
> > +             spin_lock(&fc->bg_lock);
> > +     spin_lock(&fiq->lock);
> > +
> > +     if (!test_bit(FR_PENDING, &req->flags)) {
> > +             spin_unlock(&fiq->lock);
> > +             if (background)
> > +                     spin_unlock(&fc->bg_lock);
> > +             timeout_inflight_req(req);
> > +             return;
> > +     }
> > +
> > +     if (!test_bit(FR_PRIVATE, &req->flags))
> > +             list_del_init(&req->list);
> > +
> > +     spin_unlock(&fiq->lock);
> > +     if (background)
> > +             spin_unlock(&fc->bg_lock);
> > +
> > +     do_fuse_request_end(req, true);
> > +}
> > +
> > +static void fuse_request_timeout(struct timer_list *timer)
> > +{
> > +     struct fuse_req *req = container_of(timer, struct fuse_req, timer);
>
> Let's say the timeout thread races with the thread that does
> fuse_dev_do_write() and that thread is much faster and already calls:
>
> fuse_dev_do_write():
>         fuse_request_end(req);
>         fuse_put_request(req);
> out:
>         return err ? err : nbytes;
>
>
> (What I mean is that the timeout triggered, but did not reach
> FR_FINISHING yet and at the same time another thread on another core
> calls fuse_dev_do_write()).
>
> > +
> > +     /*
> > +      * Request reply is being finished by the kernel right now.
> > +      * No need to time out the request.
> > +      */
> > +     if (test_and_set_bit(FR_FINISHING, &req->flags))
> > +             return;
>
> Wouldn't that trigger a UAF when fuse_dev_do_write() was proceeding
> much faster and already released the request?

I don't believe so. In fuse_dev_do_write(), the call to
fuse_request_end() will call timer_delete_sync(), which will either
cancel the timer or wait for the timer to finish running if it's
concurrently running on another CPU.
>
> > +
> > +     if (test_bit(FR_PENDING, &req->flags))
> > +             timeout_pending_req(req);
> > +     else
> > +             timeout_inflight_req(req);
> > +}
> > +
> >  static int queue_interrupt(struct fuse_req *req)
> >  {
> >       struct fuse_iqueue *fiq = &req->fm->fc->iq;
> > @@ -409,7 +506,8 @@ static void request_wait_answer(struct fuse_req *req)
> >
> >  static void __fuse_request_send(struct fuse_req *req)
> >  {
> > -     struct fuse_iqueue *fiq = &req->fm->fc->iq;
> > +     struct fuse_conn *fc = req->fm->fc;
> > +     struct fuse_iqueue *fiq = &fc->iq;
> >
> >       BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
> >       spin_lock(&fiq->lock);
> > @@ -421,6 +519,10 @@ static void __fuse_request_send(struct fuse_req *req)
> >               /* acquire extra reference, since request is still needed
> >                  after fuse_request_end() */
> >               __fuse_get_request(req);
> > +             if (req->timer.function) {
> > +                     req->timer.expires = jiffies + fc->req_timeout;
> > +                     add_timer(&req->timer);
> > +             }
>
> Does this leave a chance to put in a timeout of 0, if someone first sets
>  fc->req_timeout and then sets it back to 0?

I don't think so. The req_timeout is per connection and specified at
mount time. Once fc->req_timeout is set for the connection, it can't
be changed, even if the default_request_timeout sysctl gets set to 0.
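
To make the "fixed at mount time" point concrete, here is a small
userspace sketch, not the kernel code itself. It mirrors the
fc->req_timeout = ctx->req_timeout * HZ assignment and the
req->timer.function checks from the patch. The tmo_* names and the
TMO_HZ value are made up for illustration; the real HZ depends on the
kernel config.

```c
#include <stdbool.h>

/* Hypothetical tick rate for the sketch; the kernel's HZ is config-dependent. */
#define TMO_HZ 250UL

/* Stand-in for struct fuse_conn: timeout in ticks, 0 = no timeout. */
struct tmo_conn {
	unsigned long req_timeout;
};

/* Stand-in for struct fuse_req, reduced to the timer-related state. */
struct tmo_req {
	const struct tmo_conn *conn;
	bool has_timer;     /* analogue of req->timer.function being set */
	bool timer_armed;   /* analogue of add_timer() having been called */
	unsigned long expires;
};

/* Mount time: convert seconds to ticks once, per connection. */
static void tmo_conn_init(struct tmo_conn *conn, unsigned int timeout_secs)
{
	conn->req_timeout = (unsigned long)timeout_secs * TMO_HZ;
}

/* Request init: only set up a timer if the connection has a timeout. */
static void tmo_req_init(struct tmo_req *req, const struct tmo_conn *conn)
{
	req->conn = conn;
	req->has_timer = conn->req_timeout != 0;
	req->timer_armed = false;
	req->expires = 0;
}

/* Send path: arm the timer only if one was set up at init. */
static void tmo_req_send(struct tmo_req *req, unsigned long now)
{
	if (req->has_timer) {
		req->expires = now + req->conn->req_timeout;
		req->timer_armed = true;
	}
}
```

In this model a request on a connection mounted with a timeout of 0
never gets a timer set up, so the send path never arms one. The cast
before the multiply is a choice made in the sketch to keep the
arithmetic in unsigned long.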

>
>
> (I'm going to continue reviewing tomorrow, gets very late here).

Thanks for reviewing.
>
>
> Thanks,
> Bernd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-07-30  0:23 ` [PATCH v2 1/2] fuse: add optional kernel-enforced timeout " Joanne Koong
  2024-08-04 22:46   ` Bernd Schubert
@ 2024-08-05  4:52   ` Joanne Koong
  2024-08-05 13:26     ` Bernd Schubert
  2024-08-05  7:32   ` Jingbo Xu
  2 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2024-08-05  4:52 UTC (permalink / raw)
  To: miklos, linux-fsdevel; +Cc: josef, bernd.schubert, laoar.shao, kernel-team

On Mon, Jul 29, 2024 at 5:28 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> There are situations where fuse servers can become unresponsive or take
> too long to reply to a request. Currently there is no upper bound on
> how long a request may take, which may be frustrating to users who get
> stuck waiting for a request to complete.
>
> This commit adds a timeout option (in seconds) for requests. If the
> timeout elapses before the server replies to the request, the request
> will fail with -ETIME.
>
> There are 3 possibilities for a request that times out:
> a) The request times out before the request has been sent to userspace
> b) The request times out after the request has been sent to userspace
> and before it receives a reply from the server
> c) The request times out after the request has been sent to userspace
> and the server replies while the kernel is timing out the request
>
> While a request timeout is being handled, there may be other handlers
> running at the same time if:
> a) the kernel is forwarding the request to the server
> b) the kernel is processing the server's reply to the request
> c) the request is being re-sent
> d) the connection is aborting
> e) the device is getting released
>
> Proper synchronization must be added to ensure that the request is
> handled correctly in all of these cases. To this effect, there is a new
> FR_FINISHING bit added to the request flags, which is set atomically by
> either the timeout handler (see fuse_request_timeout()) which is invoked
> after the request timeout elapses or set by the request reply handler
> (see dev_do_write()), whichever gets there first. If the reply handler
> and the timeout handler are executing simultaneously and the reply handler
> sets FR_FINISHING before the timeout handler, then the request will be
> handled as if the timeout did not elapse. If the timeout handler sets
> FR_FINISHING before the reply handler, then the request will fail with
> -ETIME and the request will be cleaned up.
>
> Currently, this is the refcount lifecycle of a request:
>
> Synchronous request is created:
> fuse_simple_request -> allocates request, sets refcount to 1
>   __fuse_request_send -> acquires refcount
>     queues request and waits for reply...
> fuse_simple_request -> drops refcount
>
> Background request is created:
> fuse_simple_background -> allocates request, sets refcount to 1
>
> Request is replied to:
> fuse_dev_do_write
>   fuse_request_end -> drops refcount on request
>
> Proper acquires on the request reference must be added to ensure that the
> timeout handler does not drop the last refcount on the request while
> other handlers may be operating on the request. Please note that the
> timeout handler may get invoked at any phase of the request's
> lifetime (eg before the request has been forwarded to userspace, etc).
>
> It is always guaranteed that there is a refcount on the request when the
> timeout handler is executing. The timeout handler will be either
> deactivated by the reply/abort/release handlers, or if the timeout
> handler is concurrently executing on another CPU, the reply/abort/release
> handlers will wait for the timeout handler to finish executing first before
> they drop the final refcount on the request.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
>  fs/fuse/fuse_i.h |  14 ++++
>  fs/fuse/inode.c  |   7 ++
>  3 files changed, 200 insertions(+), 8 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 9eb191b5c4de..9992bc5f4469 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
>
>  static struct kmem_cache *fuse_req_cachep;
>
> +static void fuse_request_timeout(struct timer_list *timer);
> +
>  static struct fuse_dev *fuse_get_dev(struct file *file)
>  {
>         /*
> @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
>         refcount_set(&req->count, 1);
>         __set_bit(FR_PENDING, &req->flags);
>         req->fm = fm;
> +       if (fm->fc->req_timeout)
> +               timer_setup(&req->timer, fuse_request_timeout, 0);
>  }
>
>  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
>   * the 'end' callback is called if given, else the reference to the
>   * request is released
>   */
> -void fuse_request_end(struct fuse_req *req)
> +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
>  {
>         struct fuse_mount *fm = req->fm;
>         struct fuse_conn *fc = fm->fc;
>         struct fuse_iqueue *fiq = &fc->iq;
>
> +       if (from_timer_callback)
> +               req->out.h.error = -ETIME;
> +
>         if (test_and_set_bit(FR_FINISHED, &req->flags))
>                 goto put_request;
>
> @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
>                 list_del_init(&req->intr_entry);
>                 spin_unlock(&fiq->lock);
>         }
> -       WARN_ON(test_bit(FR_PENDING, &req->flags));
> -       WARN_ON(test_bit(FR_SENT, &req->flags));
>         if (test_bit(FR_BACKGROUND, &req->flags)) {
>                 spin_lock(&fc->bg_lock);
>                 clear_bit(FR_BACKGROUND, &req->flags);
> @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
>                 wake_up(&req->waitq);
>         }
>
> +       if (!from_timer_callback && req->timer.function)
> +               timer_delete_sync(&req->timer);
> +
>         if (test_bit(FR_ASYNC, &req->flags))
>                 req->args->end(fm, req->args, req->out.h.error);
>  put_request:
>         fuse_put_request(req);
>  }
> +
> +void fuse_request_end(struct fuse_req *req)
> +{
> +       WARN_ON(test_bit(FR_PENDING, &req->flags));
> +       WARN_ON(test_bit(FR_SENT, &req->flags));
> +
> +       do_fuse_request_end(req, false);
> +}
>  EXPORT_SYMBOL_GPL(fuse_request_end);
>
> +static void timeout_inflight_req(struct fuse_req *req)
> +{
> +       struct fuse_conn *fc = req->fm->fc;
> +       struct fuse_iqueue *fiq = &fc->iq;
> +       struct fuse_pqueue *fpq;
> +
> +       spin_lock(&fiq->lock);
> +       fpq = req->fpq;
> +       spin_unlock(&fiq->lock);
> +
> +       /*
> +        * If fpq has not been set yet, then the request is aborting (which
> +        * clears FR_PENDING flag) before dev_do_read (which sets req->fpq)
> +        * has been called. Let the abort handler handle this request.
> +        */
> +       if (!fpq)
> +               return;
> +
> +       spin_lock(&fpq->lock);
> +       if (!fpq->connected || req->out.h.error == -ECONNABORTED) {
> +               /*
> +                * Connection is being aborted or the fuse_dev is being released.
> +                * The abort / release will clean up the request
> +                */
> +               spin_unlock(&fpq->lock);
> +               return;
> +       }
> +
> +       if (!test_bit(FR_PRIVATE, &req->flags))
> +               list_del_init(&req->list);
> +
> +       spin_unlock(&fpq->lock);
> +
> +       do_fuse_request_end(req, true);
> +}
> +
> +static void timeout_pending_req(struct fuse_req *req)
> +{
> +       struct fuse_conn *fc = req->fm->fc;
> +       struct fuse_iqueue *fiq = &fc->iq;
> +       bool background = test_bit(FR_BACKGROUND, &req->flags);
> +
> +       if (background)
> +               spin_lock(&fc->bg_lock);
> +       spin_lock(&fiq->lock);
> +
> +       if (!test_bit(FR_PENDING, &req->flags)) {
> +               spin_unlock(&fiq->lock);
> +               if (background)
> +                       spin_unlock(&fc->bg_lock);
> +               timeout_inflight_req(req);
> +               return;
> +       }
> +
> +       if (!test_bit(FR_PRIVATE, &req->flags))
> +               list_del_init(&req->list);
> +
> +       spin_unlock(&fiq->lock);
> +       if (background)
> +               spin_unlock(&fc->bg_lock);
> +
> +       do_fuse_request_end(req, true);
> +}
> +
> +static void fuse_request_timeout(struct timer_list *timer)
> +{
> +       struct fuse_req *req = container_of(timer, struct fuse_req, timer);
> +
> +       /*
> +        * Request reply is being finished by the kernel right now.
> +        * No need to time out the request.
> +        */
> +       if (test_and_set_bit(FR_FINISHING, &req->flags))
> +               return;
> +
> +       if (test_bit(FR_PENDING, &req->flags))
> +               timeout_pending_req(req);
> +       else
> +               timeout_inflight_req(req);
> +}
> +
>  static int queue_interrupt(struct fuse_req *req)
>  {
>         struct fuse_iqueue *fiq = &req->fm->fc->iq;
> @@ -409,7 +506,8 @@ static void request_wait_answer(struct fuse_req *req)
>
>  static void __fuse_request_send(struct fuse_req *req)
>  {
> -       struct fuse_iqueue *fiq = &req->fm->fc->iq;
> +       struct fuse_conn *fc = req->fm->fc;
> +       struct fuse_iqueue *fiq = &fc->iq;
>
>         BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
>         spin_lock(&fiq->lock);
> @@ -421,6 +519,10 @@ static void __fuse_request_send(struct fuse_req *req)
>                 /* acquire extra reference, since request is still needed
>                    after fuse_request_end() */
>                 __fuse_get_request(req);
> +               if (req->timer.function) {
> +                       req->timer.expires = jiffies + fc->req_timeout;
> +                       add_timer(&req->timer);
> +               }
>                 queue_request_and_unlock(fiq, req);
>
>                 request_wait_answer(req);
> @@ -539,6 +641,10 @@ static bool fuse_request_queue_background(struct fuse_req *req)
>                 if (fc->num_background == fc->max_background)
>                         fc->blocked = 1;
>                 list_add_tail(&req->list, &fc->bg_queue);
> +               if (req->timer.function) {
> +                       req->timer.expires = jiffies + fc->req_timeout;
> +                       add_timer(&req->timer);
> +               }
>                 flush_bg_queue(fc);
>                 queued = true;
>         }
> @@ -1268,6 +1374,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>         req = list_entry(fiq->pending.next, struct fuse_req, list);
>         clear_bit(FR_PENDING, &req->flags);
>         list_del_init(&req->list);
> +       /* Acquire a reference in case the timeout handler starts executing */
> +       __fuse_get_request(req);
> +       req->fpq = fpq;
>         spin_unlock(&fiq->lock);
>
>         args = req->args;
> @@ -1280,6 +1389,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>                 if (args->opcode == FUSE_SETXATTR)
>                         req->out.h.error = -E2BIG;
>                 fuse_request_end(req);
> +               fuse_put_request(req);
>                 goto restart;

While rereading fuse_dev_do_read(), I just realized we also need to
handle the race condition in the error edge cases (here and at the
"goto out_end;"), since the timeout handler could have finished
executing by the time we hit the error path. We need to
test_and_set_bit(FR_FINISHING) so that either the timeout handler or
fuse_dev_do_read() cleans up the request, but not both. I'll fix this for v3.
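
As a rough illustration of that "exactly one side cleans up" rule, here
is a minimal userspace sketch, not the v3 code. It uses C11
atomic_fetch_or() as a stand-in for the kernel's test_and_set_bit().
claim_req, CLAIM_FINISHING, and claim_finish() are hypothetical names
made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the FR_FINISHING bit; not the kernel's enum value. */
#define CLAIM_FINISHING (1u << 0)

/* Stand-in for struct fuse_req, reduced to what the sketch needs. */
struct claim_req {
	atomic_uint flags;  /* analogue of req->flags */
	int error;          /* analogue of req->out.h.error */
	int cleanups;       /* counts how many paths ran the cleanup */
};

/* Userspace analogue of test_and_set_bit(): sets the bit, returns its old value. */
static bool claim_test_and_set(struct claim_req *req, unsigned int bit)
{
	return (atomic_fetch_or(&req->flags, bit) & bit) != 0;
}

/*
 * Called by every path that wants to finish the request (timeout,
 * reply, read error). Only the first caller sees the bit clear and
 * performs the cleanup; later callers back off.
 */
static bool claim_finish(struct claim_req *req, int error)
{
	if (claim_test_and_set(req, CLAIM_FINISHING))
		return false;   /* someone else is already finishing it */

	req->error = error;
	req->cleanups++;
	return true;
}
```

If both the timeout path and the read error path call claim_finish()
on the same request, only the first caller records its error and runs
the cleanup; the second sees the bit already set and returns without
touching the request.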

>         }
>         spin_lock(&fpq->lock);
> @@ -1316,13 +1426,33 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>         }
>         hash = fuse_req_hash(req->in.h.unique);
>         list_move_tail(&req->list, &fpq->processing[hash]);
> -       __fuse_get_request(req);
>         set_bit(FR_SENT, &req->flags);
>         spin_unlock(&fpq->lock);
>         /* matches barrier in request_wait_answer() */
>         smp_mb__after_atomic();
>         if (test_bit(FR_INTERRUPTED, &req->flags))
>                 queue_interrupt(req);
> +
> +       /*
> +        * Check if the timeout handler is running / ran. If it did, we need to
> +        * remove the request from any lists in case the timeout handler finished
> +        * before dev_do_read moved the request to the processing list.
> +        *
> +        * Check FR_SENT to distinguish whether the timeout or the write handler
> +        * is finishing the request. However, there can be the case where the
> +        * timeout handler and resend handler are running concurrently, so we
> +        * need to also check the FR_PENDING bit.
> +        */
> +       if (test_bit(FR_FINISHING, &req->flags) &&
> +           (test_bit(FR_SENT, &req->flags) || test_bit(FR_PENDING, &req->flags))) {
> +               spin_lock(&fpq->lock);
> +               if (!test_bit(FR_PRIVATE, &req->flags))
> +                       list_del_init(&req->list);
> +               spin_unlock(&fpq->lock);
> +               fuse_put_request(req);
> +               return -ETIME;
> +       }
> +
>         fuse_put_request(req);
>
>         return reqsize;
> @@ -1332,6 +1462,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>                 list_del_init(&req->list);
>         spin_unlock(&fpq->lock);
>         fuse_request_end(req);
> +       fuse_put_request(req);
>         return err;
>
>   err_unlock:
> @@ -1806,8 +1937,25 @@ static void fuse_resend(struct fuse_conn *fc)
>                 struct fuse_pqueue *fpq = &fud->pq;
>
>                 spin_lock(&fpq->lock);
> -               for (i = 0; i < FUSE_PQ_HASH_SIZE; i++)
> +               for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
> +                       list_for_each_entry(req, &fpq->processing[i], list) {
> +                               /*
> +                                * We must acquire a reference here in case the timeout
> +                                * handler is running at the same time. Else the
> +                                * request might get freed out from under us
> +                                */
> +                               __fuse_get_request(req);
> +
> +                               /*
> +                                * While we have an acquired reference on the request,
> +                                * the request must remain on the list so that we
> +                                * can release the reference on it
> +                                */
> +                               set_bit(FR_PRIVATE, &req->flags);
> +                       }
> +
>                         list_splice_tail_init(&fpq->processing[i], &to_queue);
> +               }
>                 spin_unlock(&fpq->lock);
>         }
>         spin_unlock(&fc->lock);
> @@ -1820,6 +1968,12 @@ static void fuse_resend(struct fuse_conn *fc)
>         }
>
>         spin_lock(&fiq->lock);
> +       list_for_each_entry_safe(req, next, &to_queue, list) {
> +               if (test_bit(FR_FINISHING, &req->flags))
> +                       list_del_init(&req->list);
> +               clear_bit(FR_PRIVATE, &req->flags);
> +               fuse_put_request(req);
> +       }
>         /* iq and pq requests are both oldest to newest */
>         list_splice(&to_queue, &fiq->pending);
>         fiq->ops->wake_pending_and_unlock(fiq);
> @@ -1951,9 +2105,10 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
>                 goto copy_finish;
>         }
>
> +       __fuse_get_request(req);
> +
>         /* Is it an interrupt reply ID? */
>         if (oh.unique & FUSE_INT_REQ_BIT) {
> -               __fuse_get_request(req);
>                 spin_unlock(&fpq->lock);
>
>                 err = 0;
> @@ -1969,6 +2124,13 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
>                 goto copy_finish;
>         }
>
> +       if (test_and_set_bit(FR_FINISHING, &req->flags)) {
> +               /* timeout handler is already finishing the request */
> +               spin_unlock(&fpq->lock);
> +               fuse_put_request(req);
> +               goto copy_finish;
> +       }
> +
>         clear_bit(FR_SENT, &req->flags);
>         list_move(&req->list, &fpq->io);
>         req->out.h = oh;
> @@ -1995,6 +2157,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
>         spin_unlock(&fpq->lock);
>
>         fuse_request_end(req);
> +       fuse_put_request(req);
>  out:
>         return err ? err : nbytes;
>
> @@ -2260,13 +2423,21 @@ int fuse_dev_release(struct inode *inode, struct file *file)
>         if (fud) {
>                 struct fuse_conn *fc = fud->fc;
>                 struct fuse_pqueue *fpq = &fud->pq;
> +               struct fuse_req *req;
>                 LIST_HEAD(to_end);
>                 unsigned int i;
>
>                 spin_lock(&fpq->lock);
>                 WARN_ON(!list_empty(&fpq->io));
> -               for (i = 0; i < FUSE_PQ_HASH_SIZE; i++)
> +               for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
> +                       /*
> +                        * Set the req error here so that the timeout
> +                        * handler knows it's being released
> +                        */
> +                       list_for_each_entry(req, &fpq->processing[i], list)
> +                               req->out.h.error = -ECONNABORTED;
>                         list_splice_init(&fpq->processing[i], &to_end);
> +               }
>                 spin_unlock(&fpq->lock);
>
>                 end_requests(&to_end);
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index f23919610313..2b616c5977b4 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -375,6 +375,8 @@ struct fuse_io_priv {
>   * FR_FINISHED:                request is finished
>   * FR_PRIVATE:         request is on private list
>   * FR_ASYNC:           request is asynchronous
> + * FR_FINISHING:       request is being finished, by either the timeout handler
> + *                     or the reply handler
>   */
>  enum fuse_req_flag {
>         FR_ISREPLY,
> @@ -389,6 +391,7 @@ enum fuse_req_flag {
>         FR_FINISHED,
>         FR_PRIVATE,
>         FR_ASYNC,
> +       FR_FINISHING,
>  };
>
>  /**
> @@ -435,6 +438,12 @@ struct fuse_req {
>
>         /** fuse_mount this request belongs to */
>         struct fuse_mount *fm;
> +
> +       /** page queue this request has been added to */
> +       struct fuse_pqueue *fpq;
> +
> +       /** optional timer for request replies, if timeout is enabled */
> +       struct timer_list timer;
>  };
>
>  struct fuse_iqueue;
> @@ -574,6 +583,8 @@ struct fuse_fs_context {
>         enum fuse_dax_mode dax_mode;
>         unsigned int max_read;
>         unsigned int blksize;
> +       /*  Request timeout (in seconds). 0 = no timeout (infinite wait) */
> +       unsigned int req_timeout;
>         const char *subtype;
>
>         /* DAX device, may be NULL */
> @@ -633,6 +644,9 @@ struct fuse_conn {
>         /** Constrain ->max_pages to this value during feature negotiation */
>         unsigned int max_pages_limit;
>
> +       /* Request timeout (in jiffies). 0 = no timeout (infinite wait) */
> +       unsigned long req_timeout;
> +
>         /** Input queue */
>         struct fuse_iqueue iq;
>
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 99e44ea7d875..9e69006fc026 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -733,6 +733,7 @@ enum {
>         OPT_ALLOW_OTHER,
>         OPT_MAX_READ,
>         OPT_BLKSIZE,
> +       OPT_REQUEST_TIMEOUT,
>         OPT_ERR
>  };
>
> @@ -747,6 +748,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
>         fsparam_u32     ("max_read",            OPT_MAX_READ),
>         fsparam_u32     ("blksize",             OPT_BLKSIZE),
>         fsparam_string  ("subtype",             OPT_SUBTYPE),
> +       fsparam_u32     ("request_timeout",     OPT_REQUEST_TIMEOUT),
>         {}
>  };
>
> @@ -830,6 +832,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
>                 ctx->blksize = result.uint_32;
>                 break;
>
> +       case OPT_REQUEST_TIMEOUT:
> +               ctx->req_timeout = result.uint_32;
> +               break;
> +
>         default:
>                 return -EINVAL;
>         }
> @@ -1724,6 +1730,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>         fc->group_id = ctx->group_id;
>         fc->legacy_opts_show = ctx->legacy_opts_show;
>         fc->max_read = max_t(unsigned int, 4096, ctx->max_read);
> +       fc->req_timeout = ctx->req_timeout * HZ;
>         fc->destroy = ctx->destroy;
>         fc->no_control = ctx->no_control;
>         fc->no_force_umount = ctx->no_force_umount;
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-08-04  7:46               ` Yafang Shao
@ 2024-08-05  5:05                 ` Joanne Koong
  2024-08-06 16:23                   ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2024-08-05  5:05 UTC (permalink / raw)
  To: Yafang Shao; +Cc: miklos, linux-fsdevel, josef, bernd.schubert, kernel-team

On Sun, Aug 4, 2024 at 12:48 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Sat, Aug 3, 2024 at 3:05 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Wed, Jul 31, 2024 at 7:47 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Thu, Aug 1, 2024 at 2:46 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > On Wed, Jul 31, 2024 at 10:52 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jul 30, 2024 at 7:14 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Jul 31, 2024 at 2:16 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jul 29, 2024 at 11:00 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jul 30, 2024 at 8:28 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > There are situations where fuse servers can become unresponsive or take
> > > > > > > > > too long to reply to a request. Currently there is no upper bound on
> > > > > > > > > how long a request may take, which may be frustrating to users who get
> > > > > > > > > stuck waiting for a request to complete.
> > > > > > > > >
> > > > > > > > > This patchset adds a timeout option for requests and two dynamically
> > > > > > > > > configurable fuse sysctls "default_request_timeout" and "max_request_timeout"
> > > > > > > > > for controlling/enforcing timeout behavior system-wide.
> > > > > > > > >
> > > > > > > > > Existing fuse servers will not be affected unless they explicitly opt into the
> > > > > > > > > timeout.
> > > > > > > > >
> > > > > > > > > v1: https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
> > > > > > > > > Changes from v1:
> > > > > > > > > - Add timeout for background requests
> > > > > > > > > - Handle resend race condition
> > > > > > > > > - Add sysctls
> > > > > > > > >
> > > > > > > > > Joanne Koong (2):
> > > > > > > > >   fuse: add optional kernel-enforced timeout for requests
> > > > > > > > >   fuse: add default_request_timeout and max_request_timeout sysctls
> > > > > > > > >
> > > > > > > > >  Documentation/admin-guide/sysctl/fs.rst |  17 +++
> > > > > > > > >  fs/fuse/Makefile                        |   2 +-
> > > > > > > > >  fs/fuse/dev.c                           | 187 +++++++++++++++++++++++-
> > > > > > > > >  fs/fuse/fuse_i.h                        |  30 ++++
> > > > > > > > >  fs/fuse/inode.c                         |  24 +++
> > > > > > > > >  fs/fuse/sysctl.c                        |  42 ++++++
> > > > > > > > >  6 files changed, 293 insertions(+), 9 deletions(-)
> > > > > > > > >  create mode 100644 fs/fuse/sysctl.c
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.43.0
> > > > > > > > >
> > > > > > > >
> > > > > > > > Hello Joanne,
> > > > > > > >
> > > > > > > > Thanks for your update.
> > > > > > > >
> > > > > > > > I have tested your patches using my test case, which is similar to the
> > > > > > > > hello-fuse [0] example, with an additional change as follows:
> > > > > > > >
> > > > > > > > @@ -125,6 +125,8 @@ static int hello_read(const char *path, char *buf,
> > > > > > > > size_t size, off_t offset,
> > > > > > > >         } else
> > > > > > > >                 size = 0;
> > > > > > > >
> > > > > > > > > +       // To trigger the timeout
> > > > > > > > +       sleep(60);
> > > > > > > >         return size;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > [0] https://github.com/libfuse/libfuse/blob/master/example/hello.c
> > > > > > > >
> > > > > > > > However, it triggered a crash with the following setup:
> > > > > > > >
> > > > > > > > 1. Set FUSE timeout:
> > > > > > > >   sysctl -w fs.fuse.default_request_timeout=10
> > > > > > > > >   sysctl -w fs.fuse.max_request_timeout=20
> > > > > > > >
> > > > > > > > 2. Start FUSE daemon:
> > > > > > > >   ./hello /tmp/fuse
> > > > > > > >
> > > > > > > > 3. Read from FUSE:
> > > > > > > >   cat /tmp/fuse/hello
> > > > > > > >
> > > > > > > > 4. Kill the process within 10 seconds (to avoid the timeout being triggered).
> > > > > > > >    Then the crash will be triggered.
> > > > > > >
> > > > > > > Hi Yafang,
> > > > > > >
> > > > > > > Thanks for trying this out on your use case!
> > > > > > >
> > > > > > > How consistently are you able to repro this?
> > > > > >
> > > > > > It triggers the crash every time.
> > > > > >
> > > > > > > I tried reproing using
> > > > > > > your instructions above but I'm not able to get the crash.
> > > > > >
> > > > > > Please note that it is the `cat /tmp/fuse/hello` process that was
> > > > > > killed, not the fuse daemon.
> > > > > > The crash seems to occur when the fuse daemon wakes up after
> > > > > > sleep(60). Please ensure that the fuse daemon can be woken up.
> > > > > >
> > > > >
> > > > > I'm still not able to trigger the crash by killing the `cat
> > > > > /tmp/fuse/hello` process. This is how I'm repro-ing
> > > > >
> > > > > 1) Add sleep to test code in
> > > > > https://github.com/libfuse/libfuse/blob/master/example/hello.c
> > > > > @@ -125,6 +126,9 @@ static int hello_read(const char *path, char *buf,
> > > > > size_t size, off_t offset,
> > > > >         } else
> > > > >                 size = 0;
> > > > >
> > > > > +       sleep(60);
> > > > > +       printf("hello_read woke up from sleep\n");
> > > > > +
> > > > >         return size;
> > > > >  }
> > > > >
> > > > > 2)  Set fuse timeout to 10 seconds
> > > > > sysctl -w fs.fuse.default_request_timeout=10
> > > > >
> > > > > 3) Start fuse daemon
> > > > > > ./example/hello /tmp/fuse
> > > > >
> > > > > 4) Read from fuse
> > > > > cat /tmp/fuse/hello
> > > > >
> > > > > 5) Get pid of cat process
> > > > > top -b | grep cat
> > > > >
> > > > > 6) Kill cat process (within 10 seconds)
> > > > >  sudo kill -9 <cat-pid>
> > > > >
> > > > > 7) Wait 60 seconds for fuse's read request to complete
> > > > >
> > > > > From what it sounds like, this is exactly what you are doing as well?
> > > > >
> > > > > I added some kernel-side logs and I'm seeing that the read request is
> > > > > timing out after ~10 seconds and handled by the timeout handler
> > > > > successfully.
> > > > >
> > > > > On the fuse daemon side, these are the logs I'm seeing from the above repro:
> > > > > ./example/hello /tmp/fuse -f -d
> > > > >
> > > > > FUSE library version: 3.17.0
> > > > > nullpath_ok: 0
> > > > > unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
> > > > > INIT: 7.40
> > > > > flags=0x73fffffb
> > > > > max_readahead=0x00020000
> > > > >    INIT: 7.40
> > > > >    flags=0x4040f039
> > > > >    max_readahead=0x00020000
> > > > >    max_write=0x00100000
> > > > >    max_background=0
> > > > >    congestion_threshold=0
> > > > >    time_gran=1
> > > > >    unique: 2, success, outsize: 80
> > > > > unique: 4, opcode: LOOKUP (1), nodeid: 1, insize: 46, pid: 673
> > > > > LOOKUP /hello
> > > > > getattr[NULL] /hello
> > > > >    NODEID: 2
> > > > >    unique: 4, success, outsize: 144
> > > > > unique: 6, opcode: OPEN (14), nodeid: 2, insize: 48, pid: 673
> > > > > open flags: 0x8000 /hello
> > > > >    open[0] flags: 0x8000 /hello
> > > > >    unique: 6, success, outsize: 32
> > > > > unique: 8, opcode: READ (15), nodeid: 2, insize: 80, pid: 673
> > > > > read[0] 4096 bytes from 0 flags: 0x8000
> > > > > unique: 10, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 673
> > > > >    unique: 10, error: -38 (Function not implemented), outsize: 16
> > > > > unique: 11, opcode: INTERRUPT (36), nodeid: 0, insize: 48, pid: 0
> > > > > FUSE_INTERRUPT: reply to kernel to disable interrupt
> > > > >    unique: 11, error: -38 (Function not implemented), outsize: 16
> > > > >
> > > > > unique: 12, opcode: RELEASE (18), nodeid: 2, insize: 64, pid: 0
> > > > >    unique: 12, success, outsize: 16
> > > > >
> > > > > hello_read woke up from sleep
> > > > >    read[0] 13 bytes from 0
> > > > >    unique: 8, success, outsize: 29
> > > > >
> > > > >
> > > > > Are these the debug logs you are seeing from the daemon side as well?
> > > > >
> > > > > Thanks,
> > > > > Joanne
> > > > > > >
> > > > > > > From the crash logs you provided below, it looks like what's happening
> > > > > > > is that if the process gets killed, the timer isn't getting deleted.
> > > >
> > > > When I looked at this log previously, I thought you were repro-ing by
> > > > killing the fuse daemon process, not the cat process. When we kill the
> > > > cat process, the timer shouldn't be getting deleted. (if the daemon
> > > > itself is killed, the timers get deleted)
> > > >
> > > > > > > I'll look more into what happens in fuse when a process is killed and
> > > > > > > get back to you on this.
> > > >
> > > > This is the flow of what is happening on the kernel side (verified by
> > > > local printks) -
> > > >
> > > > `cat /tmp/fuse/hello`:
> > > > Issues a FUSE_READ background request (via fuse_send_readpages(),
> > > > fm->fc->async_read). This request will have a timeout of 10 seconds on
> > > > it
> > > >
> > > > The cat process is killed:
> > > > This does not clean up the request. The request is still on the fpq
> > > > processing list.
> > > >
> > > > Timeout on request expires:
> > > > The timeout handler runs and properly cleans up / frees the request.
> > > >
> > > > Fuse daemon wakes from sleep and replies to the request:
> > > > In dev_do_write(), the kernel won't be able to find this request
> > > > (since it timed out and was removed from the fpq processing list) and
> > > > will return -ENOENT
> > >
> > > Thank you for your explanation.
> > > I will verify if there are any issues with my test environment.
> > >
> > Hi Yafang,
> >
> > Would you mind adding these printks to your kernel when you run the
> > repro and pasting what they show?
> >
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -287,6 +287,9 @@ static void do_fuse_request_end(struct fuse_req
> > *req, bool from_timer_callback)
> >         struct fuse_conn *fc = fm->fc;
> >         struct fuse_iqueue *fiq = &fc->iq;
> >
> > +       printk("do_fuse_request_end: req=%p, from_timer=%d,
> > req->timer.func=%d\n",
> > +              req, from_timer_callback, req->timer.function != NULL);
> > +
> >         if (from_timer_callback)
> >                 req->out.h.error = -ETIME;
> >
> > @@ -415,6 +418,8 @@ static void fuse_request_timeout(struct timer_list *timer)
> >  {
> >         struct fuse_req *req = container_of(timer, struct fuse_req, timer);
> >
> > +       printk("fuse_request_timeout: req=%p\n", req);
> > +
> >         /*
> >          * Request reply is being finished by the kernel right now.
> >          * No need to time out the request.
> > @@ -612,6 +617,7 @@ ssize_t fuse_simple_request(struct fuse_mount *fm,
> > struct fuse_args *args)
> >
> >         if (!args->noreply)
> >                 __set_bit(FR_ISREPLY, &req->flags);
> > +       printk("fuse_simple_request: req=%p, op=%u\n", req, args->opcode);
> >         __fuse_request_send(req);
> >         ret = req->out.h.error;
> >         if (!ret && args->out_argvar) {
> > @@ -673,6 +679,7 @@ int fuse_simple_background(struct fuse_mount *fm,
> > struct fuse_args *args,
> >
> >         fuse_args_to_req(req, args);
> >
> > +       printk("fuse_background_request: req=%p, op=%u\n", req, args->opcode);
> >         if (!fuse_request_queue_background(req)) {
> >                 fuse_put_request(req);
> >
> >
> > When I run it on my side, I see
> >
> > [   68.117740] fuse_background_request: req=00000000874e2f14, op=26
> > [   68.131440] do_fuse_request_end: req=00000000874e2f14,
> > from_timer=0, req->timer.func=1
> > [   71.558538] fuse_simple_request: req=00000000cf643ace, op=1
> > [   71.559651] do_fuse_request_end: req=00000000cf643ace,
> > from_timer=0, req->timer.func=1
> > [   71.561044] fuse_simple_request: req=00000000f2c001f0, op=14
> > [   71.562524] do_fuse_request_end: req=00000000f2c001f0,
> > from_timer=0, req->timer.func=1
> > [   71.563820] fuse_background_request: req=00000000584f2cc3, op=15
> > [   78.580035] fuse_simple_request: req=00000000ecbee970, op=25
> > [   78.582614] do_fuse_request_end: req=00000000ecbee970,
> > from_timer=0, req->timer.func=1
> > [   81.624722] fuse_request_timeout: req=00000000584f2cc3
> > [   81.625443] do_fuse_request_end: req=00000000584f2cc3,
> > from_timer=1, req->timer.func=1
> > [   81.626377] fuse_background_request: req=00000000b2d792ed, op=18
> > [   81.627623] do_fuse_request_end: req=00000000b2d792ed,
> > from_timer=0, req->timer.func=1
> >
> > I'm seeing only one timer get called, on the read request (opcode=15),
> > and I'm not seeing do_fuse_request_end having been called on that
> > request before the timer is invoked.
> > I'm curious to compare this against the logs on your end.
>
> The log on my side is as follows,

Thank you, Yafang. These logs are very helpful.

>
> [  283.329421] fuse_background_request: req=000000002b4f82d4, op=26
> [  283.330043] do_fuse_request_end: req=000000002b4f82d4,
> from_timer=0, req->timer.func=0
> [  287.889844] fuse_simple_request: req=00000000865e85bf, op=3
> [  287.889914] do_fuse_request_end: req=00000000865e85bf,
> from_timer=0, req->timer.func=0
> [  287.889933] fuse_simple_request: req=00000000865e85bf, op=22
> [  287.889994] do_fuse_request_end: req=00000000865e85bf,
> from_timer=0, req->timer.func=0
> [  287.890096] fuse_simple_request: req=00000000865e85bf, op=27
> [  287.890130] do_fuse_request_end: req=00000000865e85bf,
> from_timer=0, req->timer.func=0
> [  287.890142] fuse_simple_request: req=00000000865e85bf, op=28
> [  287.890167] do_fuse_request_end: req=00000000865e85bf,
> from_timer=0, req->timer.func=0
> [  287.890178] fuse_simple_request: req=00000000865e85bf, op=1
> [  287.890191] do_fuse_request_end: req=00000000865e85bf,
> from_timer=0, req->timer.func=0
> [  287.890209] fuse_simple_request: req=00000000865e85bf, op=28
> [  287.890216] do_fuse_request_end: req=00000000865e85bf,
> from_timer=0, req->timer.func=0
> [  287.890222] fuse_background_request: req=00000000865e85bf, op=29
> [  287.890230] do_fuse_request_end: req=00000000865e85bf,
> from_timer=0, req->timer.func=0
> [  312.311752] fuse_background_request: req=00000000a8da8b44, op=26
> [  312.312249] do_fuse_request_end: req=00000000a8da8b44,
> from_timer=0, req->timer.func=1
> [  317.368786] fuse_simple_request: req=00000000bc4817dd, op=1
> [  317.368871] do_fuse_request_end: req=00000000bc4817dd,
> from_timer=0, req->timer.func=1
> [  317.368910] fuse_simple_request: req=00000000bc4817dd, op=14
> [  317.368942] do_fuse_request_end: req=00000000bc4817dd,
> from_timer=0, req->timer.func=1
> [  317.368967] fuse_simple_request: req=00000000bc4817dd, op=15
> [  327.855189] fuse_request_timeout: req=00000000bc4817dd
> [  327.855195] do_fuse_request_end: req=00000000bc4817dd,
> from_timer=1, req->timer.func=1
> [  327.855218] fuse_simple_request: req=00000000c34cc363, op=15
> [  327.855328] fuse_simple_request: req=00000000c34cc363, op=25
> [  327.855401] do_fuse_request_end: req=00000000c34cc363,
> from_timer=0, req->timer.func=1
> [  327.855496] fuse_background_request: req=00000000c34cc363, op=18
> [  327.855508] do_fuse_request_end: req=00000000c34cc363,
> from_timer=0, req->timer.func=1
> [  338.095136] Oops: general protection fault, probably for
> non-canonical address 0xdead00000000012a: 0000 [#1] PREEMPT SMP NOPTI
> [  338.096415] CPU: 58 PID: 0 Comm: swapper/58 Kdump: loaded Not
> tainted 6.10.0+ #8
> [  338.098219] RIP: 0010:__run_timers+0x27e/0x360
> [  338.098686] Code: 07 48 c7 43 08 00 00 00 00 48 85 c0 74 78 4d 8b
> 2f 4c 89 6b 08 0f 1f 44 00 00 49 8b 45 00 49 8b 55 08 48 89 02 48 85
> c0 74 04 <48> 89 50 08 4d 8b 65 18 49 c7 45 08 00 00 00 00 48 b8 22 01
> 00 00
> [  338.100381] RSP: 0018:ffffb4ef808bced8 EFLAGS: 00010086
> [  338.100907] RAX: dead000000000122 RBX: ffff9827ffca13c0 RCX: 0000000000000001
> [  338.101623] RDX: ffffb4ef808bcef8 RSI: 0000000000000000 RDI: ffff9827ffca13e8
> [  338.102333] RBP: ffffb4ef808bcf70 R08: 000000000000008b R09: ffff9827ffca1430
> [  338.103020] R10: ffffffff93e060c0 R11: 0000000000000089 R12: 0000000000000001
> [  338.103726] R13: ffff97e9dc06a0a0 R14: 0000000100009200 R15: ffffb4ef808bcef8
> [  338.104439] FS:  0000000000000000(0000) GS:ffff9827ffc80000(0000)
> knlGS:0000000000000000
> [  338.105229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  338.105795] CR2: 000000c002f99340 CR3: 0000000148254001 CR4: 0000000000370ef0
> [  338.106502] Call Trace:
> [  338.106836]  <IRQ>
> [  338.107175]  ? show_regs+0x69/0x80
> [  338.107603]  ? die_addr+0x38/0x90
> [  338.108005]  ? exc_general_protection+0x236/0x490
> [  338.108557]  ? asm_exc_general_protection+0x27/0x30
> [  338.109095]  ? __run_timers+0x27e/0x360
> [  338.109563]  ? __run_timers+0x1b4/0x360
> [  338.110009]  ? kvm_sched_clock_read+0x11/0x20
> [  338.110528]  ? sched_clock_noinstr+0x9/0x10
> [  338.111002]  ? sched_clock+0x10/0x30
> [  338.111447]  ? sched_clock_cpu+0x10/0x190
> [  338.111914]  run_timer_softirq+0x3a/0x60
> [  338.112406]  handle_softirqs+0x118/0x350
> [  338.112859]  irq_exit_rcu+0x60/0x80
> [  338.113295]  sysvec_apic_timer_interrupt+0x7f/0x90
> [  338.113823]  </IRQ>
> [  338.114147]  <TASK>
> [  338.114447]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
> [  338.115002] RIP: 0010:default_idle+0xb/0x20
> [  338.115498] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90
> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33
> 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> 00 90
> [  338.117337] RSP: 0018:ffffb4ef8028fe18 EFLAGS: 00000246
> [  338.117894] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0001b48ebb3a1032
> [  338.118673] RDX: 0000000000000001 RSI: ffffffff9412e060 RDI: ffff9827ffcbc8e0
> [  338.119415] RBP: ffffb4ef8028fe20 R08: 0000004eb7fb01b4 R09: 0000000000000001
> [  338.120151] R10: ffffffff93e56080 R11: 0000000000000001 R12: 0000000000000001
> [  338.120872] R13: ffffffff9412e060 R14: ffffffff9412e0e0 R15: 0000000000000001
> [  338.121615]  ? ct_kernel_exit.constprop.0+0x79/0x90
> [  338.122171]  ? arch_cpu_idle+0x9/0x10
> [  338.122602]  default_enter_idle+0x22/0x2f
> [  338.123064]  cpuidle_enter_state+0x88/0x430
> [  338.123556]  cpuidle_enter+0x34/0x50
> [  338.123978]  call_cpuidle+0x22/0x50
> [  338.124449]  cpuidle_idle_call+0xd2/0x120
> [  338.124909]  do_idle+0x77/0xd0
> [  338.125313]  cpu_startup_entry+0x2c/0x30
> [  338.125763]  start_secondary+0x117/0x140
> [  338.126240]  common_startup_64+0x13e/0x141
> [  338.126711]  </TASK>
>
> In addition to the hello-fuse, there is another FUSE daemon, lxcfs,
> running on my test server. After disabling lxcfs, the system no longer
> panics, but there are still error logs:
>
> [  285.804534] fuse_background_request: req=0000000063502a93, op=26
> [  285.805041] do_fuse_request_end: req=0000000063502a93,
> from_timer=0, req->timer.func=1
> [  290.967412] fuse_simple_request: req=000000003f362e4b, op=1
> [  290.967480] do_fuse_request_end: req=000000003f362e4b,
> from_timer=0, req->timer.func=1
> [  290.967517] fuse_simple_request: req=000000003f362e4b, op=14
> [  290.967585] do_fuse_request_end: req=000000003f362e4b,
> from_timer=0, req->timer.func=1
> [  290.967655] fuse_simple_request: req=000000003f362e4b, op=15
> [  300.996023] fuse_request_timeout: req=000000003f362e4b
> [  300.996030] do_fuse_request_end: req=000000003f362e4b,
> from_timer=1, req->timer.func=1
> [  300.996066] fuse_simple_request: req=00000000b4182f02, op=15
> [  300.996180] fuse_simple_request: req=000000003f362e4b, op=25
> [  300.996185] ==================================================================
> [  300.996980] BUG: KFENCE: use-after-free write in enqueue_timer+0x24/0xb0
>
> [  300.997788] Use-after-free write at 0x0000000022312cb7 (in kfence-#156):
> [  300.998476]  enqueue_timer+0x24/0xb0
> [  300.998479]  __mod_timer+0x23b/0x360
> [  300.998481]  add_timer+0x20/0x30
> [  300.998483]  fuse_simple_request+0x1bc/0x2f0 [fuse]
> [  300.998506]  fuse_flush+0x1ac/0x1f0 [fuse]
> [  300.998511]  filp_flush+0x39/0x90
> [  300.998517]  filp_close+0x15/0x30
> [  300.998519]  put_files_struct+0x77/0xe0
> [  300.998522]  exit_files+0x47/0x60
> [  300.998524]  do_exit+0x262/0x480
> [  300.998528]  do_group_exit+0x34/0x90
> [  300.998531]  get_signal+0x92f/0x980
> [  300.998534]  arch_do_signal_or_restart+0x2a/0x100
> [  300.998537]  syscall_exit_to_user_mode+0xe3/0x1a0
> [  300.998541]  do_syscall_64+0x71/0x170
> [  300.998545]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> [  300.998759] kfence-#156: 0x00000000b4182f02-0x0000000084fc5c46,
> size=200, cache=ip4-frags
>
> [  300.998761] allocated by task 15064 on cpu 26 at 300.996061s:
> [  300.998766]  fuse_request_alloc+0x21/0xb0 [fuse]
> [  300.998771]  fuse_get_req+0xde/0x270 [fuse]
> [  300.998775]  fuse_simple_request+0x33/0x2f0 [fuse]
> [  300.998779]  fuse_do_readpage+0x15e/0x200 [fuse]
> [  300.998783]  fuse_read_folio+0x29/0x60 [fuse]
> [  300.998787]  filemap_read_folio+0x3b/0xe0
> [  300.998791]  filemap_update_page+0x236/0x2d0
> [  300.998792]  filemap_get_pages+0x225/0x390
> [  300.998794]  filemap_read+0xed/0x3a0
> [  300.998796]  generic_file_read_iter+0xb8/0x100
> [  300.998798]  fuse_file_read_iter+0xd8/0x150 [fuse]
> [  300.998804]  vfs_read+0x25e/0x340
> [  300.998806]  ksys_read+0x67/0xf0
> [  300.998808]  __x64_sys_read+0x19/0x20
> [  300.998810]  x64_sys_call+0x1709/0x20b0
> [  300.998813]  do_syscall_64+0x65/0x170
> [  300.998815]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> [  300.998817] freed by task 15064 on cpu 26 at 300.996084s:
> [  300.998822]  fuse_put_request+0x89/0xf0 [fuse]
> [  300.998826]  fuse_simple_request+0xe1/0x2f0 [fuse]
> [  300.998830]  fuse_do_readpage+0x15e/0x200 [fuse]
> [  300.998835]  fuse_read_folio+0x29/0x60 [fuse]
> [  300.998839]  filemap_read_folio+0x3b/0xe0
> [  300.998840]  filemap_update_page+0x236/0x2d0
> [  300.998842]  filemap_get_pages+0x225/0x390
> [  300.998844]  filemap_read+0xed/0x3a0
> [  300.998846]  generic_file_read_iter+0xb8/0x100
> [  300.998848]  fuse_file_read_iter+0xd8/0x150 [fuse]
> [  300.998852]  vfs_read+0x25e/0x340
> [  300.998854]  ksys_read+0x67/0xf0
> [  300.998856]  __x64_sys_read+0x19/0x20
> [  300.998857]  x64_sys_call+0x1709/0x20b0
> [  300.998859]  do_syscall_64+0x65/0x170
> [  300.998860]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

This is very interesting. These logs (and the ones above with the
lxcfs server running concurrently) show that the read request was
freed, but not through the do_fuse_request_end path. It's odd that
fuse_simple_request reached fuse_put_request without
do_fuse_request_end having been called, since that is the only place
where FR_FINISHED gets set and the waiters in request_wait_answer get
woken up.

I'll take a deeper look tomorrow and try to make more sense of it.
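
To make the general hazard concrete, here is a toy userspace model. It
is not kernel code and not a diagnosis of the crash above. It only shows
why any path that drops the last reference on a request has to disarm
the request's timer first. Everything named toy_* is hypothetical and
made up for this sketch; refcount, timer_armed and freed loosely stand
in for req->count, a pending req->timer and the request memory being
released by fuse_put_request.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct fuse_req: only the fields the model needs. */
struct toy_req {
	int refcount;      /* loosely models req->count */
	bool timer_armed;  /* loosely models a pending req->timer */
	bool freed;        /* recorded instead of calling free(), so it can be inspected */
};

/* Models deleting a pending timer; returns true if it was still armed. */
static bool toy_timer_disarm(struct toy_req *req)
{
	bool was_armed = req->timer_armed;

	req->timer_armed = false;
	return was_armed;
}

/* Models dropping a reference; the last put "frees" the request. */
static void toy_put(struct toy_req *req)
{
	assert(req->refcount > 0);	/* a put on zero would be an underflow */
	if (--req->refcount == 0)
		req->freed = true;
}

/* An armed timer on a freed request would later fire on freed memory. */
static bool toy_timer_dangling(const struct toy_req *req)
{
	return req->freed && req->timer_armed;
}

/* Early-exit path that drops the last reference and forgets the timer. */
static void toy_abort_unsafe(struct toy_req *req)
{
	toy_put(req);
}

/* Same path, but the timer is disarmed before the last reference is dropped. */
static void toy_abort_safe(struct toy_req *req)
{
	(void)toy_timer_disarm(req);
	toy_put(req);
}
```

Starting from a request with refcount 1 and an armed timer,
toy_abort_unsafe() leaves toy_timer_dangling() true, while
toy_abort_safe() leaves it false. If some path in the patch frees a
request the way toy_abort_unsafe() does, a later timer callback on that
memory would be consistent with the list_del corruption and refcount
underflow warnings, but that still needs to be confirmed against the
actual code.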

>
> [  300.999115] CPU: 26 PID: 15064 Comm: cat Kdump: loaded Not tainted 6.10.0+ #8
> [  301.000803] ==================================================================
> [  301.001695] do_fuse_request_end: req=000000003f362e4b,
> from_timer=0, req->timer.func=1
> [  301.001723] fuse_background_request: req=000000003f362e4b, op=18
> [  301.001767] do_fuse_request_end: req=000000003f362e4b,
> from_timer=0, req->timer.func=1
> [  311.235964] fuse_request_timeout: req=00000000b4182f02
> [  311.235969] ------------[ cut here ]------------
> [  311.235970] list_del corruption, ffff9a8072d3a000->next is
> LIST_POISON1 (dead000000000100)
> [  311.235982] WARNING: CPU: 26 PID: 0 at lib/list_debug.c:56
> __list_del_entry_valid_or_report+0x8a/0xf0
> [  311.236036] CPU: 26 PID: 0 Comm: swapper/26 Kdump: loaded Tainted:
> G    B              6.10.0+ #8
> [  311.236040] RIP: 0010:__list_del_entry_valid_or_report+0x8a/0xf0
> [  311.236043] Code: 31 c0 5d c3 cc cc cc cc 48 c7 c7 60 7a 5e b0 e8
> cc ea a4 ff 0f 0b 31 c0 5d c3 cc cc cc cc 48 c7 c7 88 7a 5e b0 e8 b6
> ea a4 ff <0f> 0b 31 c0 5d c3 cc cc cc cc 48 89 ca 48 c7 c7 c0 7a 5e b0
> e8 9d
> [  311.236045] RSP: 0018:ffffb6364056ce60 EFLAGS: 00010282
> [  311.236047] RAX: 0000000000000000 RBX: ffff9a8072d3a0a0 RCX: 0000000000000027
> [  311.236048] RDX: ffff9a807f4a0848 RSI: 0000000000000001 RDI: ffff9a807f4a0840
> [  311.236049] RBP: ffffb6364056ce60 R08: 0000000000000000 R09: ffffb6364056cce0
> [  311.236050] R10: ffffb6364056ccd8 R11: ffffffffb1017ee8 R12: ffff9a8072d3a000
> [  311.236051] R13: ffff9a420d5af000 R14: 0000000100002800 R15: ffff9a420d5af054
> [  311.236054] FS:  0000000000000000(0000) GS:ffff9a807f480000(0000)
> knlGS:0000000000000000
> [  311.236056] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  311.236057] CR2: 000000c000dc5000 CR3: 000000010cc38003 CR4: 0000000000370ef0
> [  311.236058] Call Trace:
> [  311.236059]  <IRQ>
> [  311.236061]  ? show_regs+0x69/0x80
> [  311.236065]  ? __warn+0x88/0x130
> [  311.236068]  ? __list_del_entry_valid_or_report+0x8a/0xf0
> [  311.236070]  ? report_bug+0x18f/0x1a0
> [  311.236074]  ? handle_bug+0x40/0x70
> [  311.236077]  ? exc_invalid_op+0x19/0x70
> [  311.236079]  ? asm_exc_invalid_op+0x1b/0x20
> [  311.236083]  ? __list_del_entry_valid_or_report+0x8a/0xf0
> [  311.236086]  fuse_request_timeout+0x15c/0x1a0 [fuse]
> [  311.236094]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
> [  311.236099]  call_timer_fn+0x2c/0x130
> [  311.236102]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
> [  311.236106]  __run_timers+0x2c2/0x360
> [  311.236108]  ? kvm_sched_clock_read+0x11/0x20
> [  311.236110]  ? sched_clock_noinstr+0x9/0x10
> [  311.236111]  ? sched_clock+0x10/0x30
> [  311.236114]  ? sched_clock_cpu+0x10/0x190
> [  311.236116]  run_timer_softirq+0x3a/0x60
> [  311.236118]  handle_softirqs+0x118/0x350
> [  311.236121]  irq_exit_rcu+0x60/0x80
> [  311.236123]  sysvec_apic_timer_interrupt+0x7f/0x90
> [  311.236124]  </IRQ>
> [  311.236125]  <TASK>
> [  311.236126]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
> [  311.236128] RIP: 0010:default_idle+0xb/0x20
> [  311.236130] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90
> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33
> 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> 00 90
> [  311.236131] RSP: 0018:ffffb6364018fe18 EFLAGS: 00000246
> [  311.236133] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0001c0582ca6ada0
> [  311.236134] RDX: 0000000000000001 RSI: ffffffffb112e060 RDI: ffff9a807f4bc8e0
> [  311.236135] RBP: ffffb6364018fe20 R08: 00000048770ca8ec R09: 0000000000000001
> [  311.236135] R10: ffffffffb0e56080 R11: 0000000000000001 R12: 0000000000000001
> [  311.236136] R13: ffffffffb112e060 R14: ffffffffb112e0e0 R15: 0000000000000001
> [  311.236138]  ? ct_kernel_exit.constprop.0+0x79/0x90
> [  311.236140]  ? arch_cpu_idle+0x9/0x10
> [  311.236142]  default_enter_idle+0x22/0x2f
> [  311.236144]  cpuidle_enter_state+0x88/0x430
> [  311.236146]  cpuidle_enter+0x34/0x50
> [  311.236150]  call_cpuidle+0x22/0x50
> [  311.236151]  cpuidle_idle_call+0xd2/0x120
> [  311.236154]  do_idle+0x77/0xd0
> [  311.236156]  cpu_startup_entry+0x2c/0x30
> [  311.236158]  start_secondary+0x117/0x140
> [  311.236160]  common_startup_64+0x13e/0x141
> [  311.236163]  </TASK>
> [  311.236163] ---[ end trace 0000000000000000 ]---
> [  311.236165] do_fuse_request_end: req=00000000b4182f02,
> from_timer=1, req->timer.func=1
> [  311.236166] ------------[ cut here ]------------
> [  311.236167] refcount_t: underflow; use-after-free.
> [  311.236174] WARNING: CPU: 26 PID: 0 at lib/refcount.c:28
> refcount_warn_saturate+0xc2/0x110
> [  311.236207] CPU: 26 PID: 0 Comm: swapper/26 Kdump: loaded Tainted:
> G    B   W          6.10.0+ #8
> [  311.236209] RIP: 0010:refcount_warn_saturate+0xc2/0x110
> [  311.236211] Code: 01 e8 d2 72 a6 ff 0f 0b 5d c3 cc cc cc cc 80 3d
> 33 d4 b1 01 00 75 81 48 c7 c7 30 69 5e b0 c6 05 23 d4 b1 01 01 e8 ae
> 72 a6 ff <0f> 0b 5d c3 cc cc cc cc 80 3d 0d d4 b1 01 00 0f 85 59 ff ff
> ff 48
> [  311.236212] RSP: 0018:ffffb6364056cdf8 EFLAGS: 00010286
> [  311.236213] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000027
> [  311.236214] RDX: ffff9a807f4a0848 RSI: 0000000000000001 RDI: ffff9a807f4a0840
> [  311.236215] RBP: ffffb6364056cdf8 R08: 0000000000000000 R09: ffffb6364056cc78
> [  311.236216] R10: ffffb6364056cc70 R11: ffffffffb1017ee8 R12: ffff9a8072d3a000
> [  311.236217] R13: ffff9a420d5af000 R14: ffff9a42426a6ec0 R15: ffff9a8072d3a010
> [  311.236219] FS:  0000000000000000(0000) GS:ffff9a807f480000(0000)
> knlGS:0000000000000000
> [  311.236221] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  311.236222] CR2: 000000c000dc5000 CR3: 000000010cc38003 CR4: 0000000000370ef0
> [  311.236223] Call Trace:
> [  311.236223]  <IRQ>
> [  311.236224]  ? show_regs+0x69/0x80
> [  311.236233]  ? __warn+0x88/0x130
> [  311.236235]  ? refcount_warn_saturate+0xc2/0x110
> [  311.236236]  ? report_bug+0x18f/0x1a0
> [  311.236238]  ? handle_bug+0x40/0x70
> [  311.236240]  ? exc_invalid_op+0x19/0x70
> [  311.236242]  ? asm_exc_invalid_op+0x1b/0x20
> [  311.236244]  ? refcount_warn_saturate+0xc2/0x110
> [  311.236246]  ? refcount_warn_saturate+0xc2/0x110
> [  311.236247]  fuse_put_request+0xc6/0xf0 [fuse]
> [  311.236253]  do_fuse_request_end+0xcc/0x1e0 [fuse]
> [  311.236258]  fuse_request_timeout+0xac/0x1a0 [fuse]
> [  311.236263]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
> [  311.236267]  call_timer_fn+0x2c/0x130
> [  311.236269]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
> [  311.236274]  __run_timers+0x2c2/0x360
> [  311.236275]  ? kvm_sched_clock_read+0x11/0x20
> [  311.236277]  ? sched_clock_noinstr+0x9/0x10
> [  311.236278]  ? sched_clock+0x10/0x30
> [  311.236280]  ? sched_clock_cpu+0x10/0x190
> [  311.236281]  run_timer_softirq+0x3a/0x60
> [  311.236283]  handle_softirqs+0x118/0x350
> [  311.236285]  irq_exit_rcu+0x60/0x80
> [  311.236286]  sysvec_apic_timer_interrupt+0x7f/0x90
> [  311.236288]  </IRQ>
> [  311.236288]  <TASK>
> [  311.236289]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
> [  311.236291] RIP: 0010:default_idle+0xb/0x20
> [  311.236293] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90
> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33
> 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> 00 90
> [  311.236294] RSP: 0018:ffffb6364018fe18 EFLAGS: 00000246
> [  311.236295] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0001c0582ca6ada0
> [  311.236296] RDX: 0000000000000001 RSI: ffffffffb112e060 RDI: ffff9a807f4bc8e0
> [  311.236297] RBP: ffffb6364018fe20 R08: 00000048770ca8ec R09: 0000000000000001
> [  311.236298] R10: ffffffffb0e56080 R11: 0000000000000001 R12: 0000000000000001
> [  311.236299] R13: ffffffffb112e060 R14: ffffffffb112e0e0 R15: 0000000000000001
> [  311.236300]  ? ct_kernel_exit.constprop.0+0x79/0x90
> [  311.236302]  ? arch_cpu_idle+0x9/0x10
> [  311.236304]  default_enter_idle+0x22/0x2f
> [  311.236306]  cpuidle_enter_state+0x88/0x430
> [  311.236308]  cpuidle_enter+0x34/0x50
> [  311.236310]  call_cpuidle+0x22/0x50
> [  311.236311]  cpuidle_idle_call+0xd2/0x120
> [  311.236313]  do_idle+0x77/0xd0
> [  311.236315]  cpu_startup_entry+0x2c/0x30
> [  311.236317]  start_secondary+0x117/0x140
> [  311.236318]  common_startup_64+0x13e/0x141
> [  311.236320]  </TASK>
> [  311.236321] ---[ end trace 0000000000000000 ]---
>
> I wish I could provide you with a clear explanation of what happened
> in my test environment, but I haven't had the time to delve into the
> details yet.
>
>
> --
> Regards
> Yafang

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-07-30  0:23 ` [PATCH v2 1/2] fuse: add optional kernel-enforced timeout " Joanne Koong
  2024-08-04 22:46   ` Bernd Schubert
  2024-08-05  4:52   ` Joanne Koong
@ 2024-08-05  7:32   ` Jingbo Xu
  2024-08-05 22:53     ` Joanne Koong
  2 siblings, 1 reply; 34+ messages in thread
From: Jingbo Xu @ 2024-08-05  7:32 UTC (permalink / raw)
  To: Joanne Koong, miklos, linux-fsdevel
  Cc: josef, bernd.schubert, laoar.shao, kernel-team



On 7/30/24 8:23 AM, Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or take
> too long to reply to a request. Currently there is no upper bound on
> how long a request may take, which may be frustrating to users who get
> stuck waiting for a request to complete.
> 
> This commit adds a timeout option (in seconds) for requests. If the
> timeout elapses before the server replies to the request, the request
> will fail with -ETIME.
> 
> There are 3 possibilities for a request that times out:
> a) The request times out before the request has been sent to userspace
> b) The request times out after the request has been sent to userspace
> and before it receives a reply from the server
> c) The request times out after the request has been sent to userspace
> and the server replies while the kernel is timing out the request
> 
> While a request timeout is being handled, there may be other handlers
> running at the same time if:
> a) the kernel is forwarding the request to the server
> b) the kernel is processing the server's reply to the request
> c) the request is being re-sent
> d) the connection is aborting
> e) the device is getting released
> 
> Proper synchronization must be added to ensure that the request is
> handled correctly in all of these cases. To this effect, there is a new
> FR_FINISHING bit added to the request flags, which is set atomically by
> either the timeout handler (see fuse_request_timeout()) which is invoked
> after the request timeout elapses or set by the request reply handler
> (see dev_do_write()), whichever gets there first. If the reply handler
> and the timeout handler are executing simultaneously and the reply handler
> sets FR_FINISHING before the timeout handler, then the request will be
> handled as if the timeout did not elapse. If the timeout handler sets
> FR_FINISHING before the reply handler, then the request will fail with
> -ETIME and the request will be cleaned up.
> 
> Currently, this is the refcount lifecycle of a request:
> 
> Synchronous request is created:
> fuse_simple_request -> allocates request, sets refcount to 1
>   __fuse_request_send -> acquires refcount
>     queues request and waits for reply...
> fuse_simple_request -> drops refcount
> 
> Background request is created:
> fuse_simple_background -> allocates request, sets refcount to 1
> 
> Request is replied to:
> fuse_dev_do_write
>   fuse_request_end -> drops refcount on request
> 
> Proper acquires on the request reference must be added to ensure that the
> timeout handler does not drop the last refcount on the request while
> other handlers may be operating on the request. Please note that the
> timeout handler may get invoked at any phase of the request's
> lifetime (eg before the request has been forwarded to userspace, etc).
> 
> It is always guaranteed that there is a refcount on the request when the
> timeout handler is executing. The timeout handler will be either
> deactivated by the reply/abort/release handlers, or if the timeout
> handler is concurrently executing on another CPU, the reply/abort/release
> handlers will wait for the timeout handler to finish executing first before
> it drops the final refcount on the request.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
>  fs/fuse/fuse_i.h |  14 ++++
>  fs/fuse/inode.c  |   7 ++
>  3 files changed, 200 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 9eb191b5c4de..9992bc5f4469 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
>  
>  static struct kmem_cache *fuse_req_cachep;
>  
> +static void fuse_request_timeout(struct timer_list *timer);
> +
>  static struct fuse_dev *fuse_get_dev(struct file *file)
>  {
>  	/*
> @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
>  	refcount_set(&req->count, 1);
>  	__set_bit(FR_PENDING, &req->flags);
>  	req->fm = fm;
> +	if (fm->fc->req_timeout)
> +		timer_setup(&req->timer, fuse_request_timeout, 0);
>  }
>  
>  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
>   * the 'end' callback is called if given, else the reference to the
>   * request is released
>   */
> -void fuse_request_end(struct fuse_req *req)
> +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
>  {
>  	struct fuse_mount *fm = req->fm;
>  	struct fuse_conn *fc = fm->fc;
>  	struct fuse_iqueue *fiq = &fc->iq;
>  
> +	if (from_timer_callback)
> +		req->out.h.error = -ETIME;
> +

IMHO, could we move the above error assignment up to the caller, so that
do_fuse_request_end() looks cleaner?


>  	if (test_and_set_bit(FR_FINISHED, &req->flags))
>  		goto put_request;
>  
> @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
>  		list_del_init(&req->intr_entry);
>  		spin_unlock(&fiq->lock);
>  	}
> -	WARN_ON(test_bit(FR_PENDING, &req->flags));
> -	WARN_ON(test_bit(FR_SENT, &req->flags));
>  	if (test_bit(FR_BACKGROUND, &req->flags)) {
>  		spin_lock(&fc->bg_lock);
>  		clear_bit(FR_BACKGROUND, &req->flags);
> @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
>  		wake_up(&req->waitq);
>  	}
>  
> +	if (!from_timer_callback && req->timer.function)
> +		timer_delete_sync(&req->timer);
> +

Similarly, could the caller, i.e. fuse_request_end(), call
timer_delete_sync() instead?
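
To make the two suggestions above concrete, here is a minimal userspace
sketch of the shape I have in mind. It is not kernel code: struct mock_req
and the mock_* functions are hypothetical stand-ins, and setting
timer_deleted only marks the place where timer_delete_sync() would be
called.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct fuse_req. */
struct mock_req {
	int error;           /* stands in for req->out.h.error */
	bool timer_armed;    /* stands in for req->timer.function being set */
	bool timer_deleted;  /* records where timer_delete_sync() would run */
	bool finished;       /* stands in for the FR_FINISHED bit */
};

/* Common tail: it no longer needs to know who the caller is. */
static void mock_do_request_end(struct mock_req *req)
{
	assert(req);
	if (req->finished)
		return;
	req->finished = true;
}

/* Reply path: the caller cancels the timer, then finishes the request. */
static void mock_request_end(struct mock_req *req)
{
	if (req->timer_armed)
		req->timer_deleted = true;
	mock_do_request_end(req);
}

/* Timeout path: the caller sets the error, then finishes the request. */
static void mock_request_timeout_end(struct mock_req *req)
{
	req->error = -ETIME;
	mock_do_request_end(req);
}
```

One caveat with this shape: in the patch, timer_delete_sync() runs after
the waiter is woken and before the ->end() callback, so moving it into the
caller changes that ordering. I have not checked whether that is safe.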


>  	if (test_bit(FR_ASYNC, &req->flags))
>  		req->args->end(fm, req->args, req->out.h.error);
>  put_request:
>  	fuse_put_request(req);
>  }
> +
> +void fuse_request_end(struct fuse_req *req)
> +{
> +	WARN_ON(test_bit(FR_PENDING, &req->flags));
> +	WARN_ON(test_bit(FR_SENT, &req->flags));
> +
> +	do_fuse_request_end(req, false);
> +}
>  EXPORT_SYMBOL_GPL(fuse_request_end);
>  
> +static void timeout_inflight_req(struct fuse_req *req)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +	struct fuse_iqueue *fiq = &fc->iq;
> +	struct fuse_pqueue *fpq;
> +
> +	spin_lock(&fiq->lock);
> +	fpq = req->fpq;
> +	spin_unlock(&fiq->lock);
> +
> +	/*
> +	 * If fpq has not been set yet, then the request is aborting (which
> +	 * clears FR_PENDING flag) before dev_do_read (which sets req->fpq)
> +	 * has been called. Let the abort handler handle this request.
> +	 */
> +	if (!fpq)
> +		return;
> +
> +	spin_lock(&fpq->lock);
> +	if (!fpq->connected || req->out.h.error == -ECONNABORTED) {
> +		/*
> +		 * Connection is being aborted or the fuse_dev is being released.
> +		 * The abort / release will clean up the request
> +		 */
> +		spin_unlock(&fpq->lock);
> +		return;
> +	}
> +
> +	if (!test_bit(FR_PRIVATE, &req->flags))
> +		list_del_init(&req->list);
> +
> +	spin_unlock(&fpq->lock);
> +
> +	do_fuse_request_end(req, true);
> +}
> +
> +static void timeout_pending_req(struct fuse_req *req)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +	struct fuse_iqueue *fiq = &fc->iq;
> +	bool background = test_bit(FR_BACKGROUND, &req->flags);
> +
> +	if (background)
> +		spin_lock(&fc->bg_lock);

Just out of curiosity, why is fc->bg_lock needed here? It makes the
code look less clean.


> +	spin_lock(&fiq->lock);
> +
> +	if (!test_bit(FR_PENDING, &req->flags)) {
> +		spin_unlock(&fiq->lock);
> +		if (background)
> +			spin_unlock(&fc->bg_lock);
> +		timeout_inflight_req(req);
> +		return;
> +	}
> +
> +	if (!test_bit(FR_PRIVATE, &req->flags))
> +		list_del_init(&req->list);
> +
> +	spin_unlock(&fiq->lock);
> +	if (background)
> +		spin_unlock(&fc->bg_lock);
> +
> +	do_fuse_request_end(req, true);

I'm not sure whether special handling for requests in the fpq->io list is
needed here.  At least when the connection is aborted, those LOCKED requests
in the fpq->io list won't be finished until they are unlocked.



-- 
Thanks,
Jingbo


* Re: [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls
  2024-07-30  0:23 ` [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
  2024-07-30  7:49   ` kernel test robot
  2024-07-30  9:14   ` kernel test robot
@ 2024-08-05  7:38   ` Jingbo Xu
  2024-08-06  1:26     ` Joanne Koong
  2 siblings, 1 reply; 34+ messages in thread
From: Jingbo Xu @ 2024-08-05  7:38 UTC (permalink / raw)
  To: Joanne Koong, miklos, linux-fsdevel
  Cc: josef, bernd.schubert, laoar.shao, kernel-team



On 7/30/24 8:23 AM, Joanne Koong wrote:
> Introduce two new sysctls, "default_request_timeout" and
> "max_request_timeout". These control timeouts on replies by the
> server to kernel-issued fuse requests.
> 
> "default_request_timeout" sets a timeout if no timeout is specified by
> the fuse server on mount. 0 (default) indicates no timeout should be enforced.
> 
> "max_request_timeout" sets a maximum timeout for fuse requests. If the
> fuse server attempts to set a timeout greater than max_request_timeout,
> the system will default to max_request_timeout. Similarly, if the max
> default timeout is greater than the max request timeout, the system will
> default to the max request timeout. 0 (default) indicates no timeout should
> be enforced.
> 
> $ sysctl -a | grep fuse
> fs.fuse.default_request_timeout = 0
> fs.fuse.max_request_timeout = 0
> 
> $ echo 0x100000000 | sudo tee /proc/sys/fs/fuse/default_request_timeout
> tee: /proc/sys/fs/fuse/default_request_timeout: Invalid argument
> 
> $ echo 0xFFFFFFFF | sudo tee /proc/sys/fs/fuse/default_request_timeout
> 0xFFFFFFFF
> 
> $ sysctl -a | grep fuse
> fs.fuse.default_request_timeout = 4294967295
> fs.fuse.max_request_timeout = 0
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
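
For reference, my reading of the rule described above, as a minimal
userspace sketch. The function name effective_timeout() is hypothetical and
is not taken from the patch. All values are in seconds and 0 means "no
timeout". The description does not say what happens when neither the server
nor the default supplies a timeout but a maximum is set; the sketch assumes
the result stays 0 in that case.

```c
#include <assert.h>
#include <stdint.h>

/*
 * requested: timeout asked for by the fuse server at mount (0 = none)
 * def:       default_request_timeout sysctl (0 = no default)
 * max:       max_request_timeout sysctl (0 = no upper bound)
 */
static uint32_t effective_timeout(uint32_t requested, uint32_t def,
				  uint32_t max)
{
	/* Fall back to the default only if the server asked for nothing. */
	uint32_t t = requested ? requested : def;

	/* Clamp to the maximum when one is configured. */
	if (max && t > max)
		t = max;

	assert(max == 0 || t <= max);
	return t;
}
```

For example, a server asking for 100 seconds with max_request_timeout = 60
ends up with 60, and a default of 90 under the same maximum is also cut
down to 60.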

Why do we introduce a new sysctl knob for fuse instead of putting it
under fusectl?  That said, we have also run into issues with fusectl: it
doesn't work well in container scenarios because it doesn't support
FS_USERNS_MOUNT.


-- 
Thanks,
Jingbo


* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-08-05  4:45     ` Joanne Koong
@ 2024-08-05 13:05       ` Bernd Schubert
  0 siblings, 0 replies; 34+ messages in thread
From: Bernd Schubert @ 2024-08-05 13:05 UTC (permalink / raw)
  To: Joanne Koong; +Cc: miklos, linux-fsdevel, josef, laoar.shao, kernel-team



On 8/5/24 06:45, Joanne Koong wrote:
> On Sun, Aug 4, 2024 at 3:46 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>>
>>
>> On 7/30/24 02:23, Joanne Koong wrote:
>>> There are situations where fuse servers can become unresponsive or take
>>> too long to reply to a request. Currently there is no upper bound on
>>> how long a request may take, which may be frustrating to users who get
>>> stuck waiting for a request to complete.
>>>
>>> This commit adds a timeout option (in seconds) for requests. If the
>>> timeout elapses before the server replies to the request, the request
>>> will fail with -ETIME.
>>>
>>> There are 3 possibilities for a request that times out:
>>> a) The request times out before the request has been sent to userspace
>>> b) The request times out after the request has been sent to userspace
>>> and before it receives a reply from the server
>>> c) The request times out after the request has been sent to userspace
>>> and the server replies while the kernel is timing out the request
>>>
>>> While a request timeout is being handled, there may be other handlers
>>> running at the same time if:
>>> a) the kernel is forwarding the request to the server
>>> b) the kernel is processing the server's reply to the request
>>> c) the request is being re-sent
>>> d) the connection is aborting
>>> e) the device is getting released
>>>
>>> Proper synchronization must be added to ensure that the request is
>>> handled correctly in all of these cases. To this effect, there is a new
>>> FR_FINISHING bit added to the request flags, which is set atomically by
>>> either the timeout handler (see fuse_request_timeout()) which is invoked
>>> after the request timeout elapses or set by the request reply handler
>>> (see dev_do_write()), whichever gets there first. If the reply handler
>>> and the timeout handler are executing simultaneously and the reply handler
>>> sets FR_FINISHING before the timeout handler, then the request will be
>>> handled as if the timeout did not elapse. If the timeout handler sets
>>> FR_FINISHING before the reply handler, then the request will fail with
>>> -ETIME and the request will be cleaned up.
>>>
>>> Currently, this is the refcount lifecycle of a request:
>>>
>>> Synchronous request is created:
>>> fuse_simple_request -> allocates request, sets refcount to 1
>>>    __fuse_request_send -> acquires refcount
>>>      queues request and waits for reply...
>>> fuse_simple_request -> drops refcount
>>>
>>> Background request is created:
>>> fuse_simple_background -> allocates request, sets refcount to 1
>>>
>>> Request is replied to:
>>> fuse_dev_do_write
>>>    fuse_request_end -> drops refcount on request
>>>
>>> Proper acquires on the request reference must be added to ensure that the
>>> timeout handler does not drop the last refcount on the request while
>>> other handlers may be operating on the request. Please note that the
>>> timeout handler may get invoked at any phase of the request's
>>> lifetime (eg before the request has been forwarded to userspace, etc).
>>>
>>> It is always guaranteed that there is a refcount on the request when the
>>> timeout handler is executing. The timeout handler will be either
>>> deactivated by the reply/abort/release handlers, or if the timeout
>>> handler is concurrently executing on another CPU, the reply/abort/release
>>> handlers will wait for the timeout handler to finish executing first before
>>> it drops the final refcount on the request.
>>>
>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>> ---
>>>   fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
>>>   fs/fuse/fuse_i.h |  14 ++++
>>>   fs/fuse/inode.c  |   7 ++
>>>   3 files changed, 200 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>>> index 9eb191b5c4de..9992bc5f4469 100644
>>> --- a/fs/fuse/dev.c
>>> +++ b/fs/fuse/dev.c
>>> @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
>>>
>>>   static struct kmem_cache *fuse_req_cachep;
>>>
>>> +static void fuse_request_timeout(struct timer_list *timer);
>>> +
>>>   static struct fuse_dev *fuse_get_dev(struct file *file)
>>>   {
>>>        /*
>>> @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
>>>        refcount_set(&req->count, 1);
>>>        __set_bit(FR_PENDING, &req->flags);
>>>        req->fm = fm;
>>> +     if (fm->fc->req_timeout)
>>> +             timer_setup(&req->timer, fuse_request_timeout, 0);
>>>   }
>>>
>>>   static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
>>> @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
>>>    * the 'end' callback is called if given, else the reference to the
>>>    * request is released
>>>    */
>>> -void fuse_request_end(struct fuse_req *req)
>>> +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
>>>   {
>>>        struct fuse_mount *fm = req->fm;
>>>        struct fuse_conn *fc = fm->fc;
>>>        struct fuse_iqueue *fiq = &fc->iq;
>>>
>>> +     if (from_timer_callback)
>>> +             req->out.h.error = -ETIME;
>>> +
>>>        if (test_and_set_bit(FR_FINISHED, &req->flags))
>>>                goto put_request;
>>>
>>> @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
>>>                list_del_init(&req->intr_entry);
>>>                spin_unlock(&fiq->lock);
>>>        }
>>> -     WARN_ON(test_bit(FR_PENDING, &req->flags));
>>> -     WARN_ON(test_bit(FR_SENT, &req->flags));
>>>        if (test_bit(FR_BACKGROUND, &req->flags)) {
>>>                spin_lock(&fc->bg_lock);
>>>                clear_bit(FR_BACKGROUND, &req->flags);
>>> @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
>>>                wake_up(&req->waitq);
>>>        }
>>>
>>> +     if (!from_timer_callback && req->timer.function)
>>> +             timer_delete_sync(&req->timer);
>>> +
>>>        if (test_bit(FR_ASYNC, &req->flags))
>>>                req->args->end(fm, req->args, req->out.h.error);
>>>   put_request:
>>>        fuse_put_request(req);
>>>   }
>>> +
>>> +void fuse_request_end(struct fuse_req *req)
>>> +{
>>> +     WARN_ON(test_bit(FR_PENDING, &req->flags));
>>> +     WARN_ON(test_bit(FR_SENT, &req->flags));
>>> +
>>> +     do_fuse_request_end(req, false);
>>> +}
>>>   EXPORT_SYMBOL_GPL(fuse_request_end);
>>>
>>> +static void timeout_inflight_req(struct fuse_req *req)
>>> +{
>>> +     struct fuse_conn *fc = req->fm->fc;
>>> +     struct fuse_iqueue *fiq = &fc->iq;
>>> +     struct fuse_pqueue *fpq;
>>> +
>>> +     spin_lock(&fiq->lock);
>>> +     fpq = req->fpq;
>>> +     spin_unlock(&fiq->lock);
>>> +
>>> +     /*
>>> +      * If fpq has not been set yet, then the request is aborting (which
>>> +      * clears FR_PENDING flag) before dev_do_read (which sets req->fpq)
>>> +      * has been called. Let the abort handler handle this request.
>>> +      */
>>> +     if (!fpq)
>>> +             return;
>>> +
>>> +     spin_lock(&fpq->lock);
>>> +     if (!fpq->connected || req->out.h.error == -ECONNABORTED) {
>>> +             /*
>>> +              * Connection is being aborted or the fuse_dev is being released.
>>> +              * The abort / release will clean up the request
>>> +              */
>>> +             spin_unlock(&fpq->lock);
>>> +             return;
>>> +     }
>>> +
>>> +     if (!test_bit(FR_PRIVATE, &req->flags))
>>> +             list_del_init(&req->list);
>>> +
>>> +     spin_unlock(&fpq->lock);
>>> +
>>> +     do_fuse_request_end(req, true);
>>> +}
>>> +
>>> +static void timeout_pending_req(struct fuse_req *req)
>>> +{
>>> +     struct fuse_conn *fc = req->fm->fc;
>>> +     struct fuse_iqueue *fiq = &fc->iq;
>>> +     bool background = test_bit(FR_BACKGROUND, &req->flags);
>>> +
>>> +     if (background)
>>> +             spin_lock(&fc->bg_lock);
>>> +     spin_lock(&fiq->lock);
>>> +
>>> +     if (!test_bit(FR_PENDING, &req->flags)) {
>>> +             spin_unlock(&fiq->lock);
>>> +             if (background)
>>> +                     spin_unlock(&fc->bg_lock);
>>> +             timeout_inflight_req(req);
>>> +             return;
>>> +     }
>>> +
>>> +     if (!test_bit(FR_PRIVATE, &req->flags))
>>> +             list_del_init(&req->list);
>>> +
>>> +     spin_unlock(&fiq->lock);
>>> +     if (background)
>>> +             spin_unlock(&fc->bg_lock);
>>> +
>>> +     do_fuse_request_end(req, true);
>>> +}
>>> +
>>> +static void fuse_request_timeout(struct timer_list *timer)
>>> +{
>>> +     struct fuse_req *req = container_of(timer, struct fuse_req, timer);
>>
>> Let's say the timeout thread races with the thread that does
>> fuse_dev_do_write() and that thread is much faster and already calls :
>>
>> fuse_dev_do_write():
>>          fuse_request_end(req);
>>          fuse_put_request(req);
>> out:
>>          return err ? err : nbytes;
>>
>>
>> (What I mean is that the timeout triggered, but did not reach
>> FR_FINISHING yet and at the same time another thread on another core
>> calls fuse_dev_do_write()).
>>
>>> +
>>> +     /*
>>> +      * Request reply is being finished by the kernel right now.
>>> +      * No need to time out the request.
>>> +      */
>>> +     if (test_and_set_bit(FR_FINISHING, &req->flags))
>>> +             return;
>>
>> Wouldn't that trigger a UAF if fuse_dev_do_write() was proceeding
>> much faster and had already released the request?
> 
> I don't believe so. In fuse_dev_do_write(), the call to
> fuse_request_end() will call timer_delete_sync(), which will either
> cancel the timer or wait for the timer to finish running if it's
> concurrently running on another CPU.

Yeah, you're right, I had missed that timer_delete_sync() waits.
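
For anyone else following along, the once-only claim of FR_FINISHING that
both paths rely on can be sketched in userspace with C11 atomics. This is
only an illustration: struct mock_req, the mock_* functions and the bit
position are hypothetical, and the sketch does not model
timer_delete_sync() waiting for a running handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MOCK_FR_FINISHING (1u << 0)	/* hypothetical bit position */

struct mock_req {
	atomic_uint flags;
	int error;		/* stands in for req->out.h.error */
};

/* Userspace analogue of test_and_set_bit(): returns the old bit value. */
static bool mock_test_and_set(atomic_uint *flags, unsigned int bit)
{
	return atomic_fetch_or(flags, bit) & bit;
}

/* Timeout path: returns true only if it claimed the request. */
static bool mock_timeout_handler(struct mock_req *req)
{
	assert(req);
	if (mock_test_and_set(&req->flags, MOCK_FR_FINISHING))
		return false;	/* the reply path got there first */
	req->error = -ETIME;
	return true;
}

/* Reply path: returns true only if it claimed the request. */
static bool mock_reply_handler(struct mock_req *req, int reply_error)
{
	assert(req);
	if (mock_test_and_set(&req->flags, MOCK_FR_FINISHING))
		return false;	/* the timeout path got there first */
	req->error = reply_error;
	return true;
}
```

Whichever handler sets the bit first owns the request; the other one sees
the bit already set and backs off, so the error is written exactly once.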

>>
>>> +
>>> +     if (test_bit(FR_PENDING, &req->flags))
>>> +             timeout_pending_req(req);
>>> +     else
>>> +             timeout_inflight_req(req);
>>> +}
>>> +
>>>   static int queue_interrupt(struct fuse_req *req)
>>>   {
>>>        struct fuse_iqueue *fiq = &req->fm->fc->iq;
>>> @@ -409,7 +506,8 @@ static void request_wait_answer(struct fuse_req *req)
>>>
>>>   static void __fuse_request_send(struct fuse_req *req)
>>>   {
>>> -     struct fuse_iqueue *fiq = &req->fm->fc->iq;
>>> +     struct fuse_conn *fc = req->fm->fc;
>>> +     struct fuse_iqueue *fiq = &fc->iq;
>>>
>>>        BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
>>>        spin_lock(&fiq->lock);
>>> @@ -421,6 +519,10 @@ static void __fuse_request_send(struct fuse_req *req)
>>>                /* acquire extra reference, since request is still needed
>>>                   after fuse_request_end() */
>>>                __fuse_get_request(req);
>>> +             if (req->timer.function) {
>>> +                     req->timer.expires = jiffies + fc->req_timeout;
>>> +                     add_timer(&req->timer);
>>> +             }
>>
>> Does this leave a chance to put in a timeout of 0, if someone first sets
>>   fc->req_timeout and then sets it back to 0?
> 
> I don't think so. The req_timeout is per connection and specified at
> mount time. Once the fc->req_timeout is set for the connection it
> can't be changed even if the default_req_timeout sysctl gets set to 0.

Ah right, I had somehow thought changing the sysctl param would update
existing connections.
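
To spell out what I now understand, as a minimal userspace sketch: the
timeout is copied into the connection once, at setup, so later writes to
the sysctl only affect new mounts. The names mock_default_req_timeout,
struct mock_conn and the mock_* functions are hypothetical stand-ins, not
the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the system-wide sysctl value, in seconds. */
static uint32_t mock_default_req_timeout;

struct mock_conn {
	uint32_t req_timeout;	/* copied once when the connection is set up */
};

/* Mount time: take the server's value, else snapshot the current default. */
static void mock_conn_init(struct mock_conn *fc, uint32_t mount_timeout)
{
	assert(fc);
	fc->req_timeout = mount_timeout ? mount_timeout
					: mock_default_req_timeout;
}

/* A timer is only ever set up for connections with a non-zero timeout. */
static int mock_timer_should_arm(const struct mock_conn *fc)
{
	return fc->req_timeout != 0;
}
```

So a connection created while the default was 30 keeps 30 after the sysctl
is reset to 0, and a connection created after the reset never arms a timer
at all.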

Sorry for the noise!


Thanks,
Bernd


* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-08-05  4:52   ` Joanne Koong
@ 2024-08-05 13:26     ` Bernd Schubert
  2024-08-05 22:10       ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Bernd Schubert @ 2024-08-05 13:26 UTC (permalink / raw)
  To: Joanne Koong, miklos, linux-fsdevel; +Cc: josef, laoar.shao, kernel-team



On 8/5/24 06:52, Joanne Koong wrote:
> On Mon, Jul 29, 2024 at 5:28 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>>
>> There are situations where fuse servers can become unresponsive or take
>> too long to reply to a request. Currently there is no upper bound on
>> how long a request may take, which may be frustrating to users who get
>> stuck waiting for a request to complete.
>>
>> This commit adds a timeout option (in seconds) for requests. If the
>> timeout elapses before the server replies to the request, the request
>> will fail with -ETIME.
>>
>> There are 3 possibilities for a request that times out:
>> a) The request times out before the request has been sent to userspace
>> b) The request times out after the request has been sent to userspace
>> and before it receives a reply from the server
>> c) The request times out after the request has been sent to userspace
>> and the server replies while the kernel is timing out the request
>>
>> While a request timeout is being handled, there may be other handlers
>> running at the same time if:
>> a) the kernel is forwarding the request to the server
>> b) the kernel is processing the server's reply to the request
>> c) the request is being re-sent
>> d) the connection is aborting
>> e) the device is getting released
>>
>> Proper synchronization must be added to ensure that the request is
>> handled correctly in all of these cases. To this effect, there is a new
>> FR_FINISHING bit added to the request flags, which is set atomically by
>> either the timeout handler (see fuse_request_timeout()) which is invoked
>> after the request timeout elapses or set by the request reply handler
>> (see dev_do_write()), whichever gets there first. If the reply handler
>> and the timeout handler are executing simultaneously and the reply handler
>> sets FR_FINISHING before the timeout handler, then the request will be
>> handled as if the timeout did not elapse. If the timeout handler sets
>> FR_FINISHING before the reply handler, then the request will fail with
>> -ETIME and the request will be cleaned up.
>>
>> Currently, this is the refcount lifecycle of a request:
>>
>> Synchronous request is created:
>> fuse_simple_request -> allocates request, sets refcount to 1
>>    __fuse_request_send -> acquires refcount
>>      queues request and waits for reply...
>> fuse_simple_request -> drops refcount
>>
>> Background request is created:
>> fuse_simple_background -> allocates request, sets refcount to 1
>>
>> Request is replied to:
>> fuse_dev_do_write
>>    fuse_request_end -> drops refcount on request
>>
>> Proper acquires on the request reference must be added to ensure that the
>> timeout handler does not drop the last refcount on the request while
>> other handlers may be operating on the request. Please note that the
>> timeout handler may get invoked at any phase of the request's
>> lifetime (eg before the request has been forwarded to userspace, etc).
>>
>> It is always guaranteed that there is a refcount on the request when the
>> timeout handler is executing. The timeout handler will be either
>> deactivated by the reply/abort/release handlers, or if the timeout
>> handler is concurrently executing on another CPU, the reply/abort/release
>> handlers will wait for the timeout handler to finish executing first before
>> it drops the final refcount on the request.
>>
>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>> ---
>>   fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
>>   fs/fuse/fuse_i.h |  14 ++++
>>   fs/fuse/inode.c  |   7 ++
>>   3 files changed, 200 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index 9eb191b5c4de..9992bc5f4469 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
>>
>>   static struct kmem_cache *fuse_req_cachep;
>>
>> +static void fuse_request_timeout(struct timer_list *timer);
>> +
>>   static struct fuse_dev *fuse_get_dev(struct file *file)
>>   {
>>          /*
>> @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
>>          refcount_set(&req->count, 1);
>>          __set_bit(FR_PENDING, &req->flags);
>>          req->fm = fm;
>> +       if (fm->fc->req_timeout)
>> +               timer_setup(&req->timer, fuse_request_timeout, 0);
>>   }
>>
>>   static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
>> @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
>>    * the 'end' callback is called if given, else the reference to the
>>    * request is released
>>    */
>> -void fuse_request_end(struct fuse_req *req)
>> +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
>>   {
>>          struct fuse_mount *fm = req->fm;
>>          struct fuse_conn *fc = fm->fc;
>>          struct fuse_iqueue *fiq = &fc->iq;
>>
>> +       if (from_timer_callback)
>> +               req->out.h.error = -ETIME;
>> +
>>          if (test_and_set_bit(FR_FINISHED, &req->flags))
>>                  goto put_request;
>>
>> @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
>>                  list_del_init(&req->intr_entry);
>>                  spin_unlock(&fiq->lock);
>>          }
>> -       WARN_ON(test_bit(FR_PENDING, &req->flags));
>> -       WARN_ON(test_bit(FR_SENT, &req->flags));
>>          if (test_bit(FR_BACKGROUND, &req->flags)) {
>>                  spin_lock(&fc->bg_lock);
>>                  clear_bit(FR_BACKGROUND, &req->flags);
>> @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
>>                  wake_up(&req->waitq);
>>          }
>>
>> +       if (!from_timer_callback && req->timer.function)
>> +               timer_delete_sync(&req->timer);
>> +
>>          if (test_bit(FR_ASYNC, &req->flags))
>>                  req->args->end(fm, req->args, req->out.h.error);
>>   put_request:
>>          fuse_put_request(req);
>>   }
>> +
>> +void fuse_request_end(struct fuse_req *req)
>> +{
>> +       WARN_ON(test_bit(FR_PENDING, &req->flags));
>> +       WARN_ON(test_bit(FR_SENT, &req->flags));
>> +
>> +       do_fuse_request_end(req, false);
>> +}
>>   EXPORT_SYMBOL_GPL(fuse_request_end);
>>
>> +static void timeout_inflight_req(struct fuse_req *req)
>> +{
>> +       struct fuse_conn *fc = req->fm->fc;
>> +       struct fuse_iqueue *fiq = &fc->iq;
>> +       struct fuse_pqueue *fpq;
>> +
>> +       spin_lock(&fiq->lock);
>> +       fpq = req->fpq;
>> +       spin_unlock(&fiq->lock);
>> +
>> +       /*
>> +        * If fpq has not been set yet, then the request is aborting (which
>> +        * clears FR_PENDING flag) before dev_do_read (which sets req->fpq)
>> +        * has been called. Let the abort handler handle this request.
>> +        */
>> +       if (!fpq)
>> +               return;
>> +
>> +       spin_lock(&fpq->lock);
>> +       if (!fpq->connected || req->out.h.error == -ECONNABORTED) {
>> +               /*
>> +                * Connection is being aborted or the fuse_dev is being released.
>> +                * The abort / release will clean up the request
>> +                */
>> +               spin_unlock(&fpq->lock);
>> +               return;
>> +       }
>> +
>> +       if (!test_bit(FR_PRIVATE, &req->flags))
>> +               list_del_init(&req->list);
>> +
>> +       spin_unlock(&fpq->lock);
>> +
>> +       do_fuse_request_end(req, true);
>> +}
>> +
>> +static void timeout_pending_req(struct fuse_req *req)
>> +{
>> +       struct fuse_conn *fc = req->fm->fc;
>> +       struct fuse_iqueue *fiq = &fc->iq;
>> +       bool background = test_bit(FR_BACKGROUND, &req->flags);
>> +
>> +       if (background)
>> +               spin_lock(&fc->bg_lock);
>> +       spin_lock(&fiq->lock);
>> +
>> +       if (!test_bit(FR_PENDING, &req->flags)) {
>> +               spin_unlock(&fiq->lock);
>> +               if (background)
>> +                       spin_unlock(&fc->bg_lock);
>> +               timeout_inflight_req(req);
>> +               return;
>> +       }
>> +
>> +       if (!test_bit(FR_PRIVATE, &req->flags))
>> +               list_del_init(&req->list);
>> +
>> +       spin_unlock(&fiq->lock);
>> +       if (background)
>> +               spin_unlock(&fc->bg_lock);
>> +
>> +       do_fuse_request_end(req, true);
>> +}
>> +
>> +static void fuse_request_timeout(struct timer_list *timer)
>> +{
>> +       struct fuse_req *req = container_of(timer, struct fuse_req, timer);
>> +
>> +       /*
>> +        * Request reply is being finished by the kernel right now.
>> +        * No need to time out the request.
>> +        */
>> +       if (test_and_set_bit(FR_FINISHING, &req->flags))
>> +               return;
>> +
>> +       if (test_bit(FR_PENDING, &req->flags))
>> +               timeout_pending_req(req);
>> +       else
>> +               timeout_inflight_req(req);
>> +}
>> +
>>   static int queue_interrupt(struct fuse_req *req)
>>   {
>>          struct fuse_iqueue *fiq = &req->fm->fc->iq;
>> @@ -409,7 +506,8 @@ static void request_wait_answer(struct fuse_req *req)
>>
>>   static void __fuse_request_send(struct fuse_req *req)
>>   {
>> -       struct fuse_iqueue *fiq = &req->fm->fc->iq;
>> +       struct fuse_conn *fc = req->fm->fc;
>> +       struct fuse_iqueue *fiq = &fc->iq;
>>
>>          BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
>>          spin_lock(&fiq->lock);
>> @@ -421,6 +519,10 @@ static void __fuse_request_send(struct fuse_req *req)
>>                  /* acquire extra reference, since request is still needed
>>                     after fuse_request_end() */
>>                  __fuse_get_request(req);
>> +               if (req->timer.function) {
>> +                       req->timer.expires = jiffies + fc->req_timeout;
>> +                       add_timer(&req->timer);
>> +               }
>>                  queue_request_and_unlock(fiq, req);
>>
>>                  request_wait_answer(req);
>> @@ -539,6 +641,10 @@ static bool fuse_request_queue_background(struct fuse_req *req)
>>                  if (fc->num_background == fc->max_background)
>>                          fc->blocked = 1;
>>                  list_add_tail(&req->list, &fc->bg_queue);
>> +               if (req->timer.function) {
>> +                       req->timer.expires = jiffies + fc->req_timeout;
>> +                       add_timer(&req->timer);
>> +               }
>>                  flush_bg_queue(fc);
>>                  queued = true;
>>          }
>> @@ -1268,6 +1374,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>>          req = list_entry(fiq->pending.next, struct fuse_req, list);
>>          clear_bit(FR_PENDING, &req->flags);
>>          list_del_init(&req->list);
>> +       /* Acquire a reference in case the timeout handler starts executing */
>> +       __fuse_get_request(req);
>> +       req->fpq = fpq;
>>          spin_unlock(&fiq->lock);
>>
>>          args = req->args;
>> @@ -1280,6 +1389,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>>                  if (args->opcode == FUSE_SETXATTR)
>>                          req->out.h.error = -E2BIG;
>>                  fuse_request_end(req);
>> +               fuse_put_request(req);
>>                  goto restart;
> 
> While rereading through fuse_dev_do_read, I just realized we also need
> to handle the race condition for the error edge cases (here and in the
> "goto out_end;"), since the timeout handler could have finished
> executing by the time we hit the error edge case. We need to
> test_and_set_bit(FR_FINISHING) so that either the timeout_handler or
> dev_do_read cleans up the request, but not both. I'll fix this for v3.

I know it would change semantics a bit, but wouldn't it be much easier /
less racy if fuse_dev_do_read() would delete the timer when it takes a
request from fiq->pending and add it back in (with new timeouts) before
it returns the request?

Untested:

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9992bc5f4469..444f667e2f43 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1379,6 +1379,15 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
         req->fpq = fpq;
         spin_unlock(&fiq->lock);

+       if (req->timer.function) {
+               /* request gets handled, remove the previous timeout */
+               timer_delete_sync(&req->timer);
+               if (test_bit(FR_FINISHED, &req->flags)) {
+                       fuse_put_request(req);
+                       goto restart;
+               }
+       }
+
         args = req->args;
         reqsize = req->in.h.len;

@@ -1433,24 +1442,10 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
         if (test_bit(FR_INTERRUPTED, &req->flags))
                 queue_interrupt(req);

-       /*
-        * Check if the timeout handler is running / ran. If it did, we need to
-        * remove the request from any lists in case the timeout handler finished
-        * before dev_do_read moved the request to the processing list.
-        *
-        * Check FR_SENT to distinguish whether the timeout or the write handler
-        * is finishing the request. However, there can be the case where the
-        * timeout handler and resend handler are running concurrently, so we
-        * need to also check the FR_PENDING bit.
-        */
-       if (test_bit(FR_FINISHING, &req->flags) &&
-           (test_bit(FR_SENT, &req->flags) || test_bit(FR_PENDING, &req->flags))) {
-               spin_lock(&fpq->lock);
-               if (!test_bit(FR_PRIVATE, &req->flags))
-                       list_del_init(&req->list);
-               spin_unlock(&fpq->lock);
-               fuse_put_request(req);
-               return -ETIME;
+       if (req->timer.function) {
+               /* re-arm the request */
+               req->timer.expires = jiffies + fc->req_timeout;
+               add_timer(&req->timer);
         }

         fuse_put_request(req);

Thanks,
Bernd


* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-08-05 13:26     ` Bernd Schubert
@ 2024-08-05 22:10       ` Joanne Koong
  2024-08-06 15:43         ` Bernd Schubert
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2024-08-05 22:10 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: miklos, linux-fsdevel, josef, laoar.shao, kernel-team

On Mon, Aug 5, 2024 at 6:26 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 8/5/24 06:52, Joanne Koong wrote:
> > On Mon, Jul 29, 2024 at 5:28 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >>
> >> There are situations where fuse servers can become unresponsive or take
> >> too long to reply to a request. Currently there is no upper bound on
> >> how long a request may take, which may be frustrating to users who get
> >> stuck waiting for a request to complete.
> >>
> >> This commit adds a timeout option (in seconds) for requests. If the
> >> timeout elapses before the server replies to the request, the request
> >> will fail with -ETIME.
> >>
> >> There are 3 possibilities for a request that times out:
> >> a) The request times out before the request has been sent to userspace
> >> b) The request times out after the request has been sent to userspace
> >> and before it receives a reply from the server
> >> c) The request times out after the request has been sent to userspace
> >> and the server replies while the kernel is timing out the request
> >>
> >> While a request timeout is being handled, there may be other handlers
> >> running at the same time if:
> >> a) the kernel is forwarding the request to the server
> >> b) the kernel is processing the server's reply to the request
> >> c) the request is being re-sent
> >> d) the connection is aborting
> >> e) the device is getting released
> >>
> >> Proper synchronization must be added to ensure that the request is
> >> handled correctly in all of these cases. To this effect, there is a new
> >> FR_FINISHING bit added to the request flags, which is set atomically by
> >> either the timeout handler (see fuse_request_timeout()) which is invoked
> >> after the request timeout elapses or set by the request reply handler
> >> (see dev_do_write()), whichever gets there first. If the reply handler
> >> and the timeout handler are executing simultaneously and the reply handler
> >> sets FR_FINISHING before the timeout handler, then the request will be
> >> handled as if the timeout did not elapse. If the timeout handler sets
> >> FR_FINISHING before the reply handler, then the request will fail with
> >> -ETIME and the request will be cleaned up.
> >>
> >> Currently, this is the refcount lifecycle of a request:
> >>
> >> Synchronous request is created:
> >> fuse_simple_request -> allocates request, sets refcount to 1
> >>    __fuse_request_send -> acquires refcount
> >>      queues request and waits for reply...
> >> fuse_simple_request -> drops refcount
> >>
> >> Background request is created:
> >> fuse_simple_background -> allocates request, sets refcount to 1
> >>
> >> Request is replied to:
> >> fuse_dev_do_write
> >>    fuse_request_end -> drops refcount on request
> >>
> >> Proper acquires on the request reference must be added to ensure that the
> >> timeout handler does not drop the last refcount on the request while
> >> other handlers may be operating on the request. Please note that the
> >> timeout handler may get invoked at any phase of the request's
> >> lifetime (eg before the request has been forwarded to userspace, etc).
> >>
> >> It is always guaranteed that there is a refcount on the request when the
> >> timeout handler is executing. The timeout handler will be either
> >> deactivated by the reply/abort/release handlers, or if the timeout
> >> handler is concurrently executing on another CPU, the reply/abort/release
> >> handlers will wait for the timeout handler to finish executing first before
> >> it drops the final refcount on the request.
> >>
> >> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >> ---
> >>   fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
> >>   fs/fuse/fuse_i.h |  14 ++++
> >>   fs/fuse/inode.c  |   7 ++
> >>   3 files changed, 200 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> >> index 9eb191b5c4de..9992bc5f4469 100644
> >> --- a/fs/fuse/dev.c
> >> +++ b/fs/fuse/dev.c
> >> @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
> >>
> >>   static struct kmem_cache *fuse_req_cachep;
> >>
> >> +static void fuse_request_timeout(struct timer_list *timer);
> >> +
> >>   static struct fuse_dev *fuse_get_dev(struct file *file)
> >>   {
> >>          /*
> >> @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
> >>          refcount_set(&req->count, 1);
> >>          __set_bit(FR_PENDING, &req->flags);
> >>          req->fm = fm;
> >> +       if (fm->fc->req_timeout)
> >> +               timer_setup(&req->timer, fuse_request_timeout, 0);
> >>   }
> >>
> >>   static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> >> @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
> >>    * the 'end' callback is called if given, else the reference to the
> >>    * request is released
> >>    */
> >> -void fuse_request_end(struct fuse_req *req)
> >> +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
> >>   {
> >>          struct fuse_mount *fm = req->fm;
> >>          struct fuse_conn *fc = fm->fc;
> >>          struct fuse_iqueue *fiq = &fc->iq;
> >>
> >> +       if (from_timer_callback)
> >> +               req->out.h.error = -ETIME;
> >> +
> >>          if (test_and_set_bit(FR_FINISHED, &req->flags))
> >>                  goto put_request;
> >>
> >> @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
> >>                  list_del_init(&req->intr_entry);
> >>                  spin_unlock(&fiq->lock);
> >>          }
> >> -       WARN_ON(test_bit(FR_PENDING, &req->flags));
> >> -       WARN_ON(test_bit(FR_SENT, &req->flags));
> >>          if (test_bit(FR_BACKGROUND, &req->flags)) {
> >>                  spin_lock(&fc->bg_lock);
> >>                  clear_bit(FR_BACKGROUND, &req->flags);
> >> @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
> >>                  wake_up(&req->waitq);
> >>          }
> >>
> >> +       if (!from_timer_callback && req->timer.function)
> >> +               timer_delete_sync(&req->timer);
> >> +
> >>          if (test_bit(FR_ASYNC, &req->flags))
> >>                  req->args->end(fm, req->args, req->out.h.error);
> >>   put_request:
> >>          fuse_put_request(req);
> >>   }
> >> +
> >> +void fuse_request_end(struct fuse_req *req)
> >> +{
> >> +       WARN_ON(test_bit(FR_PENDING, &req->flags));
> >> +       WARN_ON(test_bit(FR_SENT, &req->flags));
> >> +
> >> +       do_fuse_request_end(req, false);
> >> +}
> >>   EXPORT_SYMBOL_GPL(fuse_request_end);
> >>
> >> +static void timeout_inflight_req(struct fuse_req *req)
> >> +{
> >> +       struct fuse_conn *fc = req->fm->fc;
> >> +       struct fuse_iqueue *fiq = &fc->iq;
> >> +       struct fuse_pqueue *fpq;
> >> +
> >> +       spin_lock(&fiq->lock);
> >> +       fpq = req->fpq;
> >> +       spin_unlock(&fiq->lock);
> >> +
> >> +       /*
> >> +        * If fpq has not been set yet, then the request is aborting (which
> >> +        * clears FR_PENDING flag) before dev_do_read (which sets req->fpq)
> >> +        * has been called. Let the abort handler handle this request.
> >> +        */
> >> +       if (!fpq)
> >> +               return;
> >> +
> >> +       spin_lock(&fpq->lock);
> >> +       if (!fpq->connected || req->out.h.error == -ECONNABORTED) {
> >> +               /*
> >> +                * Connection is being aborted or the fuse_dev is being released.
> >> +                * The abort / release will clean up the request
> >> +                */
> >> +               spin_unlock(&fpq->lock);
> >> +               return;
> >> +       }
> >> +
> >> +       if (!test_bit(FR_PRIVATE, &req->flags))
> >> +               list_del_init(&req->list);
> >> +
> >> +       spin_unlock(&fpq->lock);
> >> +
> >> +       do_fuse_request_end(req, true);
> >> +}
> >> +
> >> +static void timeout_pending_req(struct fuse_req *req)
> >> +{
> >> +       struct fuse_conn *fc = req->fm->fc;
> >> +       struct fuse_iqueue *fiq = &fc->iq;
> >> +       bool background = test_bit(FR_BACKGROUND, &req->flags);
> >> +
> >> +       if (background)
> >> +               spin_lock(&fc->bg_lock);
> >> +       spin_lock(&fiq->lock);
> >> +
> >> +       if (!test_bit(FR_PENDING, &req->flags)) {
> >> +               spin_unlock(&fiq->lock);
> >> +               if (background)
> >> +                       spin_unlock(&fc->bg_lock);
> >> +               timeout_inflight_req(req);
> >> +               return;
> >> +       }
> >> +
> >> +       if (!test_bit(FR_PRIVATE, &req->flags))
> >> +               list_del_init(&req->list);
> >> +
> >> +       spin_unlock(&fiq->lock);
> >> +       if (background)
> >> +               spin_unlock(&fc->bg_lock);
> >> +
> >> +       do_fuse_request_end(req, true);
> >> +}
> >> +
> >> +static void fuse_request_timeout(struct timer_list *timer)
> >> +{
> >> +       struct fuse_req *req = container_of(timer, struct fuse_req, timer);
> >> +
> >> +       /*
> >> +        * Request reply is being finished by the kernel right now.
> >> +        * No need to time out the request.
> >> +        */
> >> +       if (test_and_set_bit(FR_FINISHING, &req->flags))
> >> +               return;
> >> +
> >> +       if (test_bit(FR_PENDING, &req->flags))
> >> +               timeout_pending_req(req);
> >> +       else
> >> +               timeout_inflight_req(req);
> >> +}
> >> +
> >>   static int queue_interrupt(struct fuse_req *req)
> >>   {
> >>          struct fuse_iqueue *fiq = &req->fm->fc->iq;
> >> @@ -409,7 +506,8 @@ static void request_wait_answer(struct fuse_req *req)
> >>
> >>   static void __fuse_request_send(struct fuse_req *req)
> >>   {
> >> -       struct fuse_iqueue *fiq = &req->fm->fc->iq;
> >> +       struct fuse_conn *fc = req->fm->fc;
> >> +       struct fuse_iqueue *fiq = &fc->iq;
> >>
> >>          BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
> >>          spin_lock(&fiq->lock);
> >> @@ -421,6 +519,10 @@ static void __fuse_request_send(struct fuse_req *req)
> >>                  /* acquire extra reference, since request is still needed
> >>                     after fuse_request_end() */
> >>                  __fuse_get_request(req);
> >> +               if (req->timer.function) {
> >> +                       req->timer.expires = jiffies + fc->req_timeout;
> >> +                       add_timer(&req->timer);
> >> +               }
> >>                  queue_request_and_unlock(fiq, req);
> >>
> >>                  request_wait_answer(req);
> >> @@ -539,6 +641,10 @@ static bool fuse_request_queue_background(struct fuse_req *req)
> >>                  if (fc->num_background == fc->max_background)
> >>                          fc->blocked = 1;
> >>                  list_add_tail(&req->list, &fc->bg_queue);
> >> +               if (req->timer.function) {
> >> +                       req->timer.expires = jiffies + fc->req_timeout;
> >> +                       add_timer(&req->timer);
> >> +               }
> >>                  flush_bg_queue(fc);
> >>                  queued = true;
> >>          }
> >> @@ -1268,6 +1374,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >>          req = list_entry(fiq->pending.next, struct fuse_req, list);
> >>          clear_bit(FR_PENDING, &req->flags);
> >>          list_del_init(&req->list);
> >> +       /* Acquire a reference in case the timeout handler starts executing */
> >> +       __fuse_get_request(req);
> >> +       req->fpq = fpq;
> >>          spin_unlock(&fiq->lock);
> >>
> >>          args = req->args;
> >> @@ -1280,6 +1389,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >>                  if (args->opcode == FUSE_SETXATTR)
> >>                          req->out.h.error = -E2BIG;
> >>                  fuse_request_end(req);
> >> +               fuse_put_request(req);
> >>                  goto restart;
> >
> > While rereading through fuse_dev_do_read, I just realized we also need
> > to handle the race condition for the error edge cases (here and in the
> > "goto out_end;"), since the timeout handler could have finished
> > executing by the time we hit the error edge case. We need to
> > test_and_set_bit(FR_FINISHING) so that either the timeout_handler or
> > dev_do_read cleans up the request, but not both. I'll fix this for v3.
>
> I know it would change semantics a bit, but wouldn't it be much easier /
> less racy if fuse_dev_do_read() would delete the timer when it takes a
> request from fiq->pending and add it back in (with new timeouts) before
> it returns the request?
>

Ooo I really like this idea! I'm worried though that this might allow
potential scenarios where the fuse_dev_do_read gets descheduled after
disarming the timer and a non-trivial amount of time elapses before it
gets scheduled back (eg on a system where the CPU is starved), in
which case the fuse req_timeout value will be (somewhat of) a lie. If
you and others think this is likely fine though, then I'll incorporate
this into v3 which will make this logic a lot simpler :)
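
The concern above can be made concrete with a toy userspace model. This is a sketch only: `model_timer` and the helper names are hypothetical and are not part of the patch or of the kernel timer API. It computes the tick at which a request would finally time out if the reader disarms the timer when it picks the request up and re-arms it with a fresh timeout just before returning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-shot timer model; not the kernel's struct timer_list. */
struct model_timer {
	bool armed;
	uint64_t expires;	/* absolute tick at which the timer fires */
};

static void model_arm(struct model_timer *t, uint64_t now, uint64_t timeout)
{
	t->armed = true;
	t->expires = now + timeout;
}

static void model_disarm(struct model_timer *t)
{
	t->armed = false;
}

static bool model_expired(const struct model_timer *t, uint64_t now)
{
	return t->armed && now >= t->expires;
}

/*
 * Tick at which the request times out when the reader disarms the timer
 * at read_start and re-arms it with a fresh timeout at read_end.
 */
static uint64_t deadline_with_rearm(uint64_t queued_at, uint64_t read_start,
				    uint64_t read_end, uint64_t timeout)
{
	struct model_timer t;

	assert(read_start <= read_end);
	model_arm(&t, queued_at, timeout);

	/* The timer already fired while the request was still pending. */
	if (model_expired(&t, read_start))
		return t.expires;

	model_disarm(&t);
	/* Time spent between read_start and read_end is not counted. */
	model_arm(&t, read_end, timeout);
	return t.expires;
}
```

With a timeout of 10 ticks, a request queued at tick 0, picked up at tick 5 and handed back at tick 20 would not expire until tick 30 in this model, which is the "somewhat of a lie" case described above.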


Thanks,
Joanne

> Untested:
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 9992bc5f4469..444f667e2f43 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -1379,6 +1379,15 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>          req->fpq = fpq;
>          spin_unlock(&fiq->lock);
>
> +       if (req->timer.function) {
> +               /* request gets handled, remove the previous timeout */
> +               timer_delete_sync(&req->timer);
> +               if (test_bit(FR_FINISHED, &req->flags)) {
> +                       fuse_put_request(req);
> +                       goto restart;
> +               }
> +       }
> +
>          args = req->args;
>          reqsize = req->in.h.len;
>
> @@ -1433,24 +1442,10 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>          if (test_bit(FR_INTERRUPTED, &req->flags))
>                  queue_interrupt(req);
>
> -       /*
> -        * Check if the timeout handler is running / ran. If it did, we need to
> -        * remove the request from any lists in case the timeout handler finished
> -        * before dev_do_read moved the request to the processing list.
> -        *
> -        * Check FR_SENT to distinguish whether the timeout or the write handler
> -        * is finishing the request. However, there can be the case where the
> -        * timeout handler and resend handler are running concurrently, so we
> -        * need to also check the FR_PENDING bit.
> -        */
> -       if (test_bit(FR_FINISHING, &req->flags) &&
> -           (test_bit(FR_SENT, &req->flags) || test_bit(FR_PENDING, &req->flags))) {
> -               spin_lock(&fpq->lock);
> -               if (!test_bit(FR_PRIVATE, &req->flags))
> -                       list_del_init(&req->list);
> -               spin_unlock(&fpq->lock);
> -               fuse_put_request(req);
> -               return -ETIME;
> +       if (req->timer.function) {
> +               /* re-arm the request */
> +               req->timer.expires = jiffies + fc->req_timeout;
> +               add_timer(&req->timer);
>          }
>
>          fuse_put_request(req);
>
> Thanks,
> Bernd


* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-08-05  7:32   ` Jingbo Xu
@ 2024-08-05 22:53     ` Joanne Koong
  2024-08-06  2:45       ` Jingbo Xu
  2024-08-06 15:50       ` Bernd Schubert
  0 siblings, 2 replies; 34+ messages in thread
From: Joanne Koong @ 2024-08-05 22:53 UTC (permalink / raw)
  To: Jingbo Xu
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, laoar.shao,
	kernel-team

On Mon, Aug 5, 2024 at 12:32 AM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>
>
>
> On 7/30/24 8:23 AM, Joanne Koong wrote:
> > There are situations where fuse servers can become unresponsive or take
> > too long to reply to a request. Currently there is no upper bound on
> > how long a request may take, which may be frustrating to users who get
> > stuck waiting for a request to complete.
> >
> > This commit adds a timeout option (in seconds) for requests. If the
> > timeout elapses before the server replies to the request, the request
> > will fail with -ETIME.
> >
> > There are 3 possibilities for a request that times out:
> > a) The request times out before the request has been sent to userspace
> > b) The request times out after the request has been sent to userspace
> > and before it receives a reply from the server
> > c) The request times out after the request has been sent to userspace
> > and the server replies while the kernel is timing out the request
> >
> > While a request timeout is being handled, there may be other handlers
> > running at the same time if:
> > a) the kernel is forwarding the request to the server
> > b) the kernel is processing the server's reply to the request
> > c) the request is being re-sent
> > d) the connection is aborting
> > e) the device is getting released
> >
> > Proper synchronization must be added to ensure that the request is
> > handled correctly in all of these cases. To this effect, there is a new
> > FR_FINISHING bit added to the request flags, which is set atomically by
> > either the timeout handler (see fuse_request_timeout()) which is invoked
> > after the request timeout elapses or set by the request reply handler
> > (see dev_do_write()), whichever gets there first. If the reply handler
> > and the timeout handler are executing simultaneously and the reply handler
> > sets FR_FINISHING before the timeout handler, then the request will be
> > handled as if the timeout did not elapse. If the timeout handler sets
> > FR_FINISHING before the reply handler, then the request will fail with
> > -ETIME and the request will be cleaned up.
> >
> > Currently, this is the refcount lifecycle of a request:
> >
> > Synchronous request is created:
> > fuse_simple_request -> allocates request, sets refcount to 1
> >   __fuse_request_send -> acquires refcount
> >     queues request and waits for reply...
> > fuse_simple_request -> drops refcount
> >
> > Background request is created:
> > fuse_simple_background -> allocates request, sets refcount to 1
> >
> > Request is replied to:
> > fuse_dev_do_write
> >   fuse_request_end -> drops refcount on request
> >
> > Proper acquires on the request reference must be added to ensure that the
> > timeout handler does not drop the last refcount on the request while
> > other handlers may be operating on the request. Please note that the
> > timeout handler may get invoked at any phase of the request's
> > lifetime (eg before the request has been forwarded to userspace, etc).
> >
> > It is always guaranteed that there is a refcount on the request when the
> > timeout handler is executing. The timeout handler will be either
> > deactivated by the reply/abort/release handlers, or if the timeout
> > handler is concurrently executing on another CPU, the reply/abort/release
> > handlers will wait for the timeout handler to finish executing first before
> > it drops the final refcount on the request.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
> >  fs/fuse/fuse_i.h |  14 ++++
> >  fs/fuse/inode.c  |   7 ++
> >  3 files changed, 200 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 9eb191b5c4de..9992bc5f4469 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
> >
> >  static struct kmem_cache *fuse_req_cachep;
> >
> > +static void fuse_request_timeout(struct timer_list *timer);
> > +
> >  static struct fuse_dev *fuse_get_dev(struct file *file)
> >  {
> >       /*
> > @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
> >       refcount_set(&req->count, 1);
> >       __set_bit(FR_PENDING, &req->flags);
> >       req->fm = fm;
> > +     if (fm->fc->req_timeout)
> > +             timer_setup(&req->timer, fuse_request_timeout, 0);
> >  }
> >
> >  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> > @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
> >   * the 'end' callback is called if given, else the reference to the
> >   * request is released
> >   */
> > -void fuse_request_end(struct fuse_req *req)
> > +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
> >  {
> >       struct fuse_mount *fm = req->fm;
> >       struct fuse_conn *fc = fm->fc;
> >       struct fuse_iqueue *fiq = &fc->iq;
> >
> > +     if (from_timer_callback)
> > +             req->out.h.error = -ETIME;
> > +
>
> IMHO, could we move the above error assignment up to the caller to make
> do_fuse_request_end() look cleaner?

Sure, I was thinking that it looks cleaner setting this in
do_fuse_request_end() instead of having to set it in both
timeout_pending_req() and timeout_inflight_req(), but I see your point
as well.
I'll make this change in v3.

>
>
> >       if (test_and_set_bit(FR_FINISHED, &req->flags))
> >               goto put_request;
> >
> > @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
> >               list_del_init(&req->intr_entry);
> >               spin_unlock(&fiq->lock);
> >       }
> > -     WARN_ON(test_bit(FR_PENDING, &req->flags));
> > -     WARN_ON(test_bit(FR_SENT, &req->flags));
> >       if (test_bit(FR_BACKGROUND, &req->flags)) {
> >               spin_lock(&fc->bg_lock);
> >               clear_bit(FR_BACKGROUND, &req->flags);
> > @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
> >               wake_up(&req->waitq);
> >       }
> >
> > +     if (!from_timer_callback && req->timer.function)
> > +             timer_delete_sync(&req->timer);
> > +
>
> Similarly, could the caller, i.e. fuse_request_end(), call
> timer_delete_sync() instead?

I don't think we can do that because the fuse_put_request() at the end
of this function often drops the last refcount on the request, which
frees the request.

>
>
> >       if (test_bit(FR_ASYNC, &req->flags))
> >               req->args->end(fm, req->args, req->out.h.error);
> >  put_request:
> >       fuse_put_request(req);
> >  }
> > +
> > +void fuse_request_end(struct fuse_req *req)
> > +{
> > +     WARN_ON(test_bit(FR_PENDING, &req->flags));
> > +     WARN_ON(test_bit(FR_SENT, &req->flags));
> > +
> > +     do_fuse_request_end(req, false);
> > +}
> >  EXPORT_SYMBOL_GPL(fuse_request_end);
> >
> > +static void timeout_inflight_req(struct fuse_req *req)
> > +{
> > +     struct fuse_conn *fc = req->fm->fc;
> > +     struct fuse_iqueue *fiq = &fc->iq;
> > +     struct fuse_pqueue *fpq;
> > +
> > +     spin_lock(&fiq->lock);
> > +     fpq = req->fpq;
> > +     spin_unlock(&fiq->lock);
> > +
> > +     /*
> > +      * If fpq has not been set yet, then the request is aborting (which
> > +      * clears FR_PENDING flag) before dev_do_read (which sets req->fpq)
> > +      * has been called. Let the abort handler handle this request.
> > +      */
> > +     if (!fpq)
> > +             return;
> > +
> > +     spin_lock(&fpq->lock);
> > +     if (!fpq->connected || req->out.h.error == -ECONNABORTED) {
> > +             /*
> > +              * Connection is being aborted or the fuse_dev is being released.
> > +              * The abort / release will clean up the request
> > +              */
> > +             spin_unlock(&fpq->lock);
> > +             return;
> > +     }
> > +
> > +     if (!test_bit(FR_PRIVATE, &req->flags))
> > +             list_del_init(&req->list);
> > +
> > +     spin_unlock(&fpq->lock);
> > +
> > +     do_fuse_request_end(req, true);
> > +}
> > +
> > +static void timeout_pending_req(struct fuse_req *req)
> > +{
> > +     struct fuse_conn *fc = req->fm->fc;
> > +     struct fuse_iqueue *fiq = &fc->iq;
> > +     bool background = test_bit(FR_BACKGROUND, &req->flags);
> > +
> > +     if (background)
> > +             spin_lock(&fc->bg_lock);
>
> Just out of curiosity, why is fc->bg_lock needed here? It makes the
> code look less clean.

The fc->bg_lock is needed because the background request may still be
on the fc->bg_queue when the request times out (eg the request hasn't
been flushed yet). We need to acquire the fc->bg_lock so that we can
delete it from the queue, in case something else is modifying the
queue at the same time.
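
The lock ordering described above can be sketched with a single-threaded toy model. Everything here is hypothetical: the `toy_*` names are illustrative, and plain booleans stand in for the spinlocks so that the assertions can check which locks are held when the request is unlinked. It is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; the bools stand in for spinlocks. */
struct toy_conn {
	bool bg_locked;		/* models fc->bg_lock */
	bool iq_locked;		/* models fiq->lock */
};

struct toy_req {
	bool background;	/* models FR_BACKGROUND */
	bool pending;		/* models FR_PENDING */
	bool on_list;		/* still linked on bg_queue or the pending list */
};

static void toy_lock(bool *lock)
{
	assert(!*lock);
	*lock = true;
}

static void toy_unlock(bool *lock)
{
	assert(*lock);
	*lock = false;
}

/* Unlink the request; every list it may be on must be covered by a held lock. */
static void toy_unlink(struct toy_conn *fc, struct toy_req *req)
{
	assert(fc->iq_locked);
	if (req->background)
		assert(fc->bg_locked);
	req->on_list = false;
}

/* Returns true if the request was still pending and was unlinked here. */
static bool toy_timeout_pending(struct toy_conn *fc, struct toy_req *req)
{
	bool unlinked = false;

	/* Outer lock first, only for background requests. */
	if (req->background)
		toy_lock(&fc->bg_locked);
	toy_lock(&fc->iq_locked);

	if (req->pending) {
		toy_unlink(fc, req);
		unlinked = true;
	}

	/* Release in the reverse order of acquisition. */
	toy_unlock(&fc->iq_locked);
	if (req->background)
		toy_unlock(&fc->bg_locked);
	return unlinked;
}
```

The point of the model is only the ordering: the background lock is taken before the input-queue lock and released after it, so a background request can be unlinked regardless of which of the two queues it is still on.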

>
>
> > +     spin_lock(&fiq->lock);
> > +
> > +     if (!test_bit(FR_PENDING, &req->flags)) {
> > +             spin_unlock(&fiq->lock);
> > +             if (background)
> > +                     spin_unlock(&fc->bg_lock);
> > +             timeout_inflight_req(req);
> > +             return;
> > +     }
> > +
> > +     if (!test_bit(FR_PRIVATE, &req->flags))
> > +             list_del_init(&req->list);
> > +
> > +     spin_unlock(&fiq->lock);
> > +     if (background)
> > +             spin_unlock(&fc->bg_lock);
> > +
> > +     do_fuse_request_end(req, true);
>
> I'm not sure if special handling for requests in the fpq->io list is needed
> here.  At least when the connection is aborted, those LOCKED requests in the
> fpq->io list won't be finished until they are unlocked.
>

The places where FR_LOCKED gets set on the request are in
fuse_dev_do_write and fuse_dev_do_read, while the request's pages are
being copied. In both of those functions, this timeout_pending_req()
path isn't hit while the request is locked: in fuse_dev_do_write, we
test and set FR_FINISHING before doing the page copying (so the
timeout handler returns before calling timeout_pending_req()), and in
fuse_dev_do_read, the request is locked only after the FR_PENDING flag
has been cleared.

I think there is a possibility that the timeout handler executes
timeout_inflight_req() while the request is locked in fuse_dev_do_read
during the page copying, but this patch adds an extra
__fuse_get_request() on the request before the page copying, which
means the timeout handler will not free the request while it is
locked and the page copying is in progress.
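
The two mechanisms mentioned above (whoever sets FR_FINISHING first owns the cleanup, and an extra reference keeps the request alive while it is being copied) can be sketched in userspace with C11 atomics. This is a minimal illustration under assumptions: `toy_req`, `TOY_FINISHING` and the helpers are hypothetical stand-ins, not the kernel's `fuse_req`, `test_and_set_bit()` or `refcount_t`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { TOY_FINISHING = 1u << 0 };	/* hypothetical bit, stands in for FR_FINISHING */

struct toy_req {
	atomic_uint flags;
	atomic_int refcount;
};

static void toy_req_init(struct toy_req *req)
{
	atomic_init(&req->flags, 0u);
	atomic_init(&req->refcount, 1);	/* creator's reference */
}

/*
 * Atomically set the finishing bit. Returns true only for the first
 * caller, which then owns finishing the request; later callers back off.
 */
static bool toy_claim_finishing(struct toy_req *req)
{
	unsigned int old = atomic_fetch_or(&req->flags, TOY_FINISHING);

	return !(old & TOY_FINISHING);
}

static void toy_get(struct toy_req *req)
{
	atomic_fetch_add(&req->refcount, 1);
}

/* Drop a reference; returns true if it was the last one (request may be freed). */
static bool toy_put(struct toy_req *req)
{
	int old = atomic_fetch_sub(&req->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

In this model, if the reader takes its extra reference before the timeout path runs, the timeout path's put cannot be the final one, so the request stays valid until the reader drops its own reference.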


Thanks,
Joanne
>
>
> --
> Thanks,
> Jingbo


* Re: [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls
  2024-08-05  7:38   ` Jingbo Xu
@ 2024-08-06  1:26     ` Joanne Koong
  0 siblings, 0 replies; 34+ messages in thread
From: Joanne Koong @ 2024-08-06  1:26 UTC (permalink / raw)
  To: Jingbo Xu
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, laoar.shao,
	kernel-team

On Mon, Aug 5, 2024 at 12:38 AM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>
>
>
> On 7/30/24 8:23 AM, Joanne Koong wrote:
> > Introduce two new sysctls, "default_request_timeout" and
> > "max_request_timeout". These control timeouts on replies by the
> > server to kernel-issued fuse requests.
> >
> > "default_request_timeout" sets a timeout if no timeout is specified by
> > the fuse server on mount. 0 (default) indicates no timeout should be enforced.
> >
> > "max_request_timeout" sets a maximum timeout for fuse requests. If the
> > fuse server attempts to set a timeout greater than max_request_timeout,
> > the system will default to max_request_timeout. Similarly, if the max
> > default timeout is greater than the max request timeout, the system will
> > default to the max request timeout. 0 (default) indicates no timeout should
> > be enforced.
> >
> > $ sysctl -a | grep fuse
> > fs.fuse.default_request_timeout = 0
> > fs.fuse.max_request_timeout = 0
> >
> > $ echo 0x100000000 | sudo tee /proc/sys/fs/fuse/default_request_timeout
> > tee: /proc/sys/fs/fuse/default_request_timeout: Invalid argument
> >
> > $ echo 0xFFFFFFFF | sudo tee /proc/sys/fs/fuse/default_request_timeout
> > 0xFFFFFFFF
> >
> > $ sysctl -a | grep fuse
> > fs.fuse.default_request_timeout = 4294967295
> > fs.fuse.max_request_timeout = 0
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>
> Why do we introduce a new sysfs knob for fuse instead of putting it
> under fusectl?  Though we also encounter some issues with fusectl since
> fusectl doesn't work well in container scenarios as it doesn't support
> FS_USERNS_MOUNT.
>

I think having these constraints configured via sysctl offers more
flexibility (e.g. if we want to enforce the max or default request
timeout across the board for all incoming connections on the system),
whereas I believe fusectl operates on a per-connection basis.
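
To make the clamping rules from the commit message concrete, here is a
minimal userspace sketch. It is not code from the patch:
effective_req_timeout() and its parameter names are hypothetical, and it
assumes that a connection with no timeout from either the server or the
default stays without one (the thread does not say whether the max
should apply in that case).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch. All values are in
 * seconds and 0 means "no timeout".
 */
uint32_t effective_req_timeout(uint32_t server_timeout,
                               uint32_t default_timeout,
                               uint32_t max_timeout)
{
        /* The default only applies if the server did not set a timeout. */
        uint32_t t = server_timeout ? server_timeout : default_timeout;

        /* A non-zero max caps whatever timeout was chosen above. */
        if (max_timeout && t > max_timeout)
                t = max_timeout;

        return t;
}

/* Worked examples that follow the description in the commit message. */
void effective_req_timeout_examples(void)
{
        assert(effective_req_timeout(0, 0, 0) == 0);    /* nothing set */
        assert(effective_req_timeout(0, 10, 20) == 10); /* default used */
        assert(effective_req_timeout(30, 10, 20) == 20); /* server capped */
        assert(effective_req_timeout(0, 50, 20) == 20); /* default capped */
}
```

With both sysctls at 0, as in the "sysctl -a" output quoted above, the
sketch returns whatever the server asked for, which matches the
statement that existing servers are unaffected unless they opt in.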

>
> --
> Thanks,
> Jingbo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-08-05 22:53     ` Joanne Koong
@ 2024-08-06  2:45       ` Jingbo Xu
  2024-08-06 16:43         ` Joanne Koong
  2024-08-06 15:50       ` Bernd Schubert
  1 sibling, 1 reply; 34+ messages in thread
From: Jingbo Xu @ 2024-08-06  2:45 UTC (permalink / raw)
  To: Joanne Koong
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, laoar.shao,
	kernel-team



On 8/6/24 6:53 AM, Joanne Koong wrote:
> On Mon, Aug 5, 2024 at 12:32 AM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>>
>>
>>
>> On 7/30/24 8:23 AM, Joanne Koong wrote:
>>> There are situations where fuse servers can become unresponsive or take
>>> too long to reply to a request. Currently there is no upper bound on
>>> how long a request may take, which may be frustrating to users who get
>>> stuck waiting for a request to complete.
>>>
>>> This commit adds a timeout option (in seconds) for requests. If the
>>> timeout elapses before the server replies to the request, the request
>>> will fail with -ETIME.
>>>
>>> There are 3 possibilities for a request that times out:
>>> a) The request times out before the request has been sent to userspace
>>> b) The request times out after the request has been sent to userspace
>>> and before it receives a reply from the server
>>> c) The request times out after the request has been sent to userspace
>>> and the server replies while the kernel is timing out the request
>>>
>>> While a request timeout is being handled, there may be other handlers
>>> running at the same time if:
>>> a) the kernel is forwarding the request to the server
>>> b) the kernel is processing the server's reply to the request
>>> c) the request is being re-sent
>>> d) the connection is aborting
>>> e) the device is getting released
>>>
>>> Proper synchronization must be added to ensure that the request is
>>> handled correctly in all of these cases. To this effect, there is a new
>>> FR_FINISHING bit added to the request flags, which is set atomically by
>>> either the timeout handler (see fuse_request_timeout()) which is invoked
>>> after the request timeout elapses or set by the request reply handler
>>> (see dev_do_write()), whichever gets there first. If the reply handler
>>> and the timeout handler are executing simultaneously and the reply handler
>>> sets FR_FINISHING before the timeout handler, then the request will be
>>> handled as if the timeout did not elapse. If the timeout handler sets
>>> FR_FINISHING before the reply handler, then the request will fail with
>>> -ETIME and the request will be cleaned up.
>>>
>>> Currently, this is the refcount lifecycle of a request:
>>>
>>> Synchronous request is created:
>>> fuse_simple_request -> allocates request, sets refcount to 1
>>>   __fuse_request_send -> acquires refcount
>>>     queues request and waits for reply...
>>> fuse_simple_request -> drops refcount
>>>
>>> Background request is created:
>>> fuse_simple_background -> allocates request, sets refcount to 1
>>>
>>> Request is replied to:
>>> fuse_dev_do_write
>>>   fuse_request_end -> drops refcount on request
>>>
>>> Proper acquires on the request reference must be added to ensure that the
>>> timeout handler does not drop the last refcount on the request while
>>> other handlers may be operating on the request. Please note that the
>>> timeout handler may get invoked at any phase of the request's
>>> lifetime (eg before the request has been forwarded to userspace, etc).
>>>
>>> It is always guaranteed that there is a refcount on the request when the
>>> timeout handler is executing. The timeout handler will be either
>>> deactivated by the reply/abort/release handlers, or if the timeout
>>> handler is concurrently executing on another CPU, the reply/abort/release
>>> handlers will wait for the timeout handler to finish executing first before
>>> it drops the final refcount on the request.
>>>
>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>> ---
>>>  fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
>>>  fs/fuse/fuse_i.h |  14 ++++
>>>  fs/fuse/inode.c  |   7 ++
>>>  3 files changed, 200 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>>> index 9eb191b5c4de..9992bc5f4469 100644
>>> --- a/fs/fuse/dev.c
>>> +++ b/fs/fuse/dev.c
>>> @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
>>>
>>>  static struct kmem_cache *fuse_req_cachep;
>>>
>>> +static void fuse_request_timeout(struct timer_list *timer);
>>> +
>>>  static struct fuse_dev *fuse_get_dev(struct file *file)
>>>  {
>>>       /*
>>> @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
>>>       refcount_set(&req->count, 1);
>>>       __set_bit(FR_PENDING, &req->flags);
>>>       req->fm = fm;
>>> +     if (fm->fc->req_timeout)
>>> +             timer_setup(&req->timer, fuse_request_timeout, 0);
>>>  }
>>>
>>>  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
>>> @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
>>>   * the 'end' callback is called if given, else the reference to the
>>>   * request is released
>>>   */
>>> -void fuse_request_end(struct fuse_req *req)
>>> +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
>>>  {
>>>       struct fuse_mount *fm = req->fm;
>>>       struct fuse_conn *fc = fm->fc;
>>>       struct fuse_iqueue *fiq = &fc->iq;
>>>
>>> +     if (from_timer_callback)
>>> +             req->out.h.error = -ETIME;
>>> +
>>
>> FMHO, could we move the above error assignment up to the caller to make
>> do_fuse_request_end() look cleaner?
> 
> Sure, I was thinking that it looks cleaner setting this in
> do_fuse_request_end() instead of having to set it in both
> timeout_pending_req() and timeout_inflight_req(), but I see your point
> as well.
> I'll make this change in v3.
> 
>>
>>
>>>       if (test_and_set_bit(FR_FINISHED, &req->flags))
>>>               goto put_request;
>>>
>>> @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
>>>               list_del_init(&req->intr_entry);
>>>               spin_unlock(&fiq->lock);
>>>       }
>>> -     WARN_ON(test_bit(FR_PENDING, &req->flags));
>>> -     WARN_ON(test_bit(FR_SENT, &req->flags));
>>>       if (test_bit(FR_BACKGROUND, &req->flags)) {
>>>               spin_lock(&fc->bg_lock);
>>>               clear_bit(FR_BACKGROUND, &req->flags);
>>> @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
>>>               wake_up(&req->waitq);
>>>       }
>>>
>>> +     if (!from_timer_callback && req->timer.function)
>>> +             timer_delete_sync(&req->timer);
>>> +
>>
>> Similarly, move the caller i.e. fuse_request_end() call
>> timer_delete_sync() instead?
> 
> I don't think we can do that because the fuse_put_request() at the end
> of this function often holds the last refcount on the request which
> frees the request when it releases the ref.

Initially I meant that timer_delete_sync() could be called before
do_fuse_request_end() inside fuse_request_end().  But anyway, it's a
rough idea just for making the code look cleaner; I haven't thought
through whether this logic change is right or not.


>>> +static void timeout_pending_req(struct fuse_req *req)
>>> +{
>>> +     struct fuse_conn *fc = req->fm->fc;
>>> +     struct fuse_iqueue *fiq = &fc->iq;
>>> +     bool background = test_bit(FR_BACKGROUND, &req->flags);
>>> +
>>> +     if (background)
>>> +             spin_lock(&fc->bg_lock);
>>
>> Just out of curiosity, why is fc->bg_lock needed here? It makes the
>> code look less clean.
> 
> The fc->bg_lock is needed because the background request may still be
> on the fc->bg_queue when the request times out (eg the request hasn't
> been flushed yet). We need to acquire the fc->bg_lock so that we can
> delete it from the queue, in case something else is modifying the
> queue at the same time.

I can understand now.  Thanks!

> 
>>
>>
>>> +     spin_lock(&fiq->lock);
>>> +
>>> +     if (!test_bit(FR_PENDING, &req->flags)) {
>>> +             spin_unlock(&fiq->lock);
>>> +             if (background)
>>> +                     spin_unlock(&fc->bg_lock);
>>> +             timeout_inflight_req(req);
>>> +             return;
>>> +     }
>>> +
>>> +     if (!test_bit(FR_PRIVATE, &req->flags))
>>> +             list_del_init(&req->list);
>>> +
>>> +     spin_unlock(&fiq->lock);
>>> +     if (background)
>>> +             spin_unlock(&fc->bg_lock);
>>> +
>>> +     do_fuse_request_end(req, true);
>>
>> I'm not sure if special handling for requests in fpq->io list is needed
>> here.  At least when connection is aborted, those LOCKED requests in
>> fpq->io list won't be finished instantly until they are unlocked.
>>
> 
> The places where FR_LOCKED gets set on the request are in
> fuse_dev_do_write and fuse_dev_do_read when we do some of the page
> copying stuff. In both those functions, this timeout_pending_req()
> path isn't hit while we have the lock obtained - in fuse_dev_do_write,
> we test and set FR_FINISHING first before doing the page copying (the
> timeout handler will return before calling timeout_pending_req()), and
> in fuse_dev_do_read, the locking is called after the FR_PENDING flag
> has been cleared.
> 
> I think there is a possibility that the timeout handler executes
> timeout_inflight_req() while the lock is obtained in fuse_dev_do_read
> during the page copying, but this patch added an extra
> __fuse_get_request() on the request before doing the page copying,
> which means the timeout handler will not free out the request while
> the lock is held and the page copying is being done.
> 

Yes, this is the only possible place where the timeout handler could
run concurrently while the request is in the copying state (i.e. LOCKED).
As I described above, when the connection is aborted, the LOCKED requests
will be left there and won't be finished until they are unlocked.  I'm
not sure why this special handling is needed for LOCKED requests, but I
guess it's not because of a UAF issue.  The comment on
lock_request() says "Up to the next unlock_request() there mustn't be
anything that could cause a page-fault.", though I can't understand in
which case there would be a page fault and why it would be an issue.

Maybe I'm wrong.  It would be helpful if someone could shed light on this.


-- 
Thanks,
Jingbo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-08-05 22:10       ` Joanne Koong
@ 2024-08-06 15:43         ` Bernd Schubert
  2024-08-06 17:08           ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Bernd Schubert @ 2024-08-06 15:43 UTC (permalink / raw)
  To: Joanne Koong; +Cc: miklos, linux-fsdevel, josef, laoar.shao, kernel-team



On 8/6/24 00:10, Joanne Koong wrote:
> On Mon, Aug 5, 2024 at 6:26 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>>> @@ -1280,6 +1389,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>>>>                  if (args->opcode == FUSE_SETXATTR)
>>>>                          req->out.h.error = -E2BIG;
>>>>                  fuse_request_end(req);
>>>> +               fuse_put_request(req);
>>>>                  goto restart;
>>>
>>> While rereading through fuse_dev_do_read, I just realized we also need
>>> to handle the race condition for the error edge cases (here and in the
>>> "goto out_end;"), since the timeout handler could have finished
>>> executing by the time we hit the error edge case. We need to
>>> test_and_set_bit(FR_FINISHING) so that either the timeout_handler or
>>> dev_do_read cleans up the request, but not both. I'll fix this for v3.
>>
>> I know it would change semantics a bit, but wouldn't it be much easier /
>> less racy if fuse_dev_do_read() would delete the timer when it takes a
>> request from fiq->pending and add it back in (with new timeouts) before
>> it returns the request?
>>
> 
> Ooo I really like this idea! I'm worried though that this might allow
> potential scenarios where the fuse_dev_do_read gets descheduled after
> disarming the timer and a non-trivial amount of time elapses before it
> gets scheduled back (eg on a system where the CPU is starved), in
> which case the fuse req_timeout value will be (somewhat of) a lie. If
> you and others think this is likely fine though, then I'll incorporate
> this into v3 which will make this logic a lot simpler :)
> 

In my opinion we only need to worry about the fuse server getting stuck. I
think we would have a grave issue if fuse_dev_do_read() got descheduled
for a long time; the timer might not run either in that case. The main
issue I see with removing/re-adding the timer is that it might double the
timeout in the worst case. In my personal opinion that is acceptable, as it
reduces code complexity.
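
As a back-of-the-envelope sketch of where the doubling comes from (the
function name is hypothetical, and the model assumes the timer is armed
with the full timeout when the request is queued and again when
fuse_dev_do_read() hands it to the server, ignoring the copy time):

```c
#include <assert.h>

/*
 * Hypothetical model, not kernel code. 'timeout' is the configured
 * request timeout in seconds; 'pending_for' is how long the request sat
 * on the pending list before fuse_dev_do_read() picked it up.
 */
unsigned int worst_case_elapsed(unsigned int timeout,
                                unsigned int pending_for)
{
        /* Had the request waited this long, the first timer would have fired. */
        assert(pending_for < timeout);

        /* Re-arming with a fresh timeout gives the server 'timeout' more. */
        return pending_for + timeout;
}
```

For a 10 second timeout and a request that was picked up after 9
seconds, the request can take up to 19 seconds in total, so the bound
approaches but never reaches twice the configured value.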


Thanks
Bernd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-08-05 22:53     ` Joanne Koong
  2024-08-06  2:45       ` Jingbo Xu
@ 2024-08-06 15:50       ` Bernd Schubert
  1 sibling, 0 replies; 34+ messages in thread
From: Bernd Schubert @ 2024-08-06 15:50 UTC (permalink / raw)
  To: Joanne Koong, Jingbo Xu
  Cc: miklos, linux-fsdevel, josef, laoar.shao, kernel-team



On 8/6/24 00:53, Joanne Koong wrote:
> On Mon, Aug 5, 2024 at 12:32 AM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>>
>>
>>
>> On 7/30/24 8:23 AM, Joanne Koong wrote:
>>> There are situations where fuse servers can become unresponsive or take
>>> too long to reply to a request. Currently there is no upper bound on
>>> how long a request may take, which may be frustrating to users who get
>>> stuck waiting for a request to complete.
>>>
>>> This commit adds a timeout option (in seconds) for requests. If the
>>> timeout elapses before the server replies to the request, the request
>>> will fail with -ETIME.
>>>
>>> There are 3 possibilities for a request that times out:
>>> a) The request times out before the request has been sent to userspace
>>> b) The request times out after the request has been sent to userspace
>>> and before it receives a reply from the server
>>> c) The request times out after the request has been sent to userspace
>>> and the server replies while the kernel is timing out the request
>>>
>>> While a request timeout is being handled, there may be other handlers
>>> running at the same time if:
>>> a) the kernel is forwarding the request to the server
>>> b) the kernel is processing the server's reply to the request
>>> c) the request is being re-sent
>>> d) the connection is aborting
>>> e) the device is getting released
>>>
>>> Proper synchronization must be added to ensure that the request is
>>> handled correctly in all of these cases. To this effect, there is a new
>>> FR_FINISHING bit added to the request flags, which is set atomically by
>>> either the timeout handler (see fuse_request_timeout()) which is invoked
>>> after the request timeout elapses or set by the request reply handler
>>> (see dev_do_write()), whichever gets there first. If the reply handler
>>> and the timeout handler are executing simultaneously and the reply handler
>>> sets FR_FINISHING before the timeout handler, then the request will be
>>> handled as if the timeout did not elapse. If the timeout handler sets
>>> FR_FINISHING before the reply handler, then the request will fail with
>>> -ETIME and the request will be cleaned up.
>>>
>>> Currently, this is the refcount lifecycle of a request:
>>>
>>> Synchronous request is created:
>>> fuse_simple_request -> allocates request, sets refcount to 1
>>>   __fuse_request_send -> acquires refcount
>>>     queues request and waits for reply...
>>> fuse_simple_request -> drops refcount
>>>
>>> Background request is created:
>>> fuse_simple_background -> allocates request, sets refcount to 1
>>>
>>> Request is replied to:
>>> fuse_dev_do_write
>>>   fuse_request_end -> drops refcount on request
>>>
>>> Proper acquires on the request reference must be added to ensure that the
>>> timeout handler does not drop the last refcount on the request while
>>> other handlers may be operating on the request. Please note that the
>>> timeout handler may get invoked at any phase of the request's
>>> lifetime (eg before the request has been forwarded to userspace, etc).
>>>
>>> It is always guaranteed that there is a refcount on the request when the
>>> timeout handler is executing. The timeout handler will be either
>>> deactivated by the reply/abort/release handlers, or if the timeout
>>> handler is concurrently executing on another CPU, the reply/abort/release
>>> handlers will wait for the timeout handler to finish executing first before
>>> it drops the final refcount on the request.
>>>
>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>> ---
>>>  fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
>>>  fs/fuse/fuse_i.h |  14 ++++
>>>  fs/fuse/inode.c  |   7 ++
>>>  3 files changed, 200 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>>> index 9eb191b5c4de..9992bc5f4469 100644
>>> --- a/fs/fuse/dev.c
>>> +++ b/fs/fuse/dev.c
>>> @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
>>>
>>>  static struct kmem_cache *fuse_req_cachep;
>>>
>>> +static void fuse_request_timeout(struct timer_list *timer);
>>> +
>>>  static struct fuse_dev *fuse_get_dev(struct file *file)
>>>  {
>>>       /*
>>> @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
>>>       refcount_set(&req->count, 1);
>>>       __set_bit(FR_PENDING, &req->flags);
>>>       req->fm = fm;
>>> +     if (fm->fc->req_timeout)
>>> +             timer_setup(&req->timer, fuse_request_timeout, 0);
>>>  }
>>>
>>>  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
>>> @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
>>>   * the 'end' callback is called if given, else the reference to the
>>>   * request is released
>>>   */
>>> -void fuse_request_end(struct fuse_req *req)
>>> +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
>>>  {
>>>       struct fuse_mount *fm = req->fm;
>>>       struct fuse_conn *fc = fm->fc;
>>>       struct fuse_iqueue *fiq = &fc->iq;
>>>
>>> +     if (from_timer_callback)
>>> +             req->out.h.error = -ETIME;
>>> +
>>
>> FMHO, could we move the above error assignment up to the caller to make
>> do_fuse_request_end() look cleaner?
> 
> Sure, I was thinking that it looks cleaner setting this in
> do_fuse_request_end() instead of having to set it in both
> timeout_pending_req() and timeout_inflight_req(), but I see your point
> as well.
> I'll make this change in v3.
> 
>>
>>
>>>       if (test_and_set_bit(FR_FINISHED, &req->flags))
>>>               goto put_request;
>>>
>>> @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
>>>               list_del_init(&req->intr_entry);
>>>               spin_unlock(&fiq->lock);
>>>       }
>>> -     WARN_ON(test_bit(FR_PENDING, &req->flags));
>>> -     WARN_ON(test_bit(FR_SENT, &req->flags));
>>>       if (test_bit(FR_BACKGROUND, &req->flags)) {
>>>               spin_lock(&fc->bg_lock);
>>>               clear_bit(FR_BACKGROUND, &req->flags);
>>> @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
>>>               wake_up(&req->waitq);
>>>       }
>>>
>>> +     if (!from_timer_callback && req->timer.function)
>>> +             timer_delete_sync(&req->timer);
>>> +
>>
>> Similarly, move the caller i.e. fuse_request_end() call
>> timer_delete_sync() instead?
> 
> I don't think we can do that because the fuse_put_request() at the end
> of this function often holds the last refcount on the request which
> frees the request when it releases the ref.

Just a suggestion, maybe add an extra reference for the timer? The
condition above and fuse_request_timeout() would then need to release
that ref.
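
A rough userspace model of what I mean, as a sketch only. The sketch_*
names are hypothetical, a C11 atomic flag stands in for the kernel
timer's pending state, and the model is single-threaded; it only shows
who drops which reference, assuming that whichever side deactivates a
pending timer also drops the timer's reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_req {
        atomic_int  count;         /* stands in for req->count */
        atomic_bool timer_pending; /* stands in for a pending req->timer */
        bool        freed;         /* set where real code would free */
};

void sketch_req_init(struct sketch_req *req, bool has_timeout)
{
        atomic_init(&req->count, 1);            /* creator's reference */
        atomic_init(&req->timer_pending, false);
        req->freed = false;
        if (has_timeout) {
                atomic_fetch_add(&req->count, 1); /* timer's reference */
                atomic_store(&req->timer_pending, true);
        }
}

void sketch_req_put(struct sketch_req *req)
{
        int prev = atomic_fetch_sub(&req->count, 1);

        assert(prev > 0);
        if (prev == 1)
                req->freed = true;
}

/* Returns true only if it deactivated a timer that was still pending. */
bool sketch_timer_delete(struct sketch_req *req)
{
        return atomic_exchange(&req->timer_pending, false);
}

/* Reply/abort/release path. */
void sketch_request_end(struct sketch_req *req)
{
        if (sketch_timer_delete(req))
                sketch_req_put(req);    /* timer never ran: drop its ref */
        sketch_req_put(req);            /* ref held by the request path */
}

/* Timeout path: the timer is no longer pending once its callback runs. */
void sketch_timer_fired(struct sketch_req *req)
{
        atomic_store(&req->timer_pending, false);
        /* ... time out the request here ... */
        sketch_req_put(req);            /* callback drops the timer's ref */
}
```

In this model the request cannot be freed while the timer still holds
its reference, no matter which path runs first.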


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-08-05  5:05                 ` Joanne Koong
@ 2024-08-06 16:23                   ` Joanne Koong
  2024-08-06 17:11                     ` Bernd Schubert
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2024-08-06 16:23 UTC (permalink / raw)
  To: Yafang Shao; +Cc: miklos, linux-fsdevel, josef, bernd.schubert, kernel-team

On Sun, Aug 4, 2024 at 10:05 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Sun, Aug 4, 2024 at 12:48 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Sat, Aug 3, 2024 at 3:05 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > On Wed, Jul 31, 2024 at 7:47 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 1, 2024 at 2:46 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > >
> > > > > On Wed, Jul 31, 2024 at 10:52 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Jul 30, 2024 at 7:14 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, Jul 31, 2024 at 2:16 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jul 29, 2024 at 11:00 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Jul 30, 2024 at 8:28 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > There are situations where fuse servers can become unresponsive or take
> > > > > > > > > > too long to reply to a request. Currently there is no upper bound on
> > > > > > > > > > how long a request may take, which may be frustrating to users who get
> > > > > > > > > > stuck waiting for a request to complete.
> > > > > > > > > >
> > > > > > > > > > This patchset adds a timeout option for requests and two dynamically
> > > > > > > > > > configurable fuse sysctls "default_request_timeout" and "max_request_timeout"
> > > > > > > > > > for controlling/enforcing timeout behavior system-wide.
> > > > > > > > > >
> > > > > > > > > > Existing fuse servers will not be affected unless they explicitly opt into the
> > > > > > > > > > timeout.
> > > > > > > > > >
> > > > > > > > > > v1: https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
> > > > > > > > > > Changes from v1:
> > > > > > > > > > - Add timeout for background requests
> > > > > > > > > > - Handle resend race condition
> > > > > > > > > > - Add sysctls
> > > > > > > > > >
> > > > > > > > > > Joanne Koong (2):
> > > > > > > > > >   fuse: add optional kernel-enforced timeout for requests
> > > > > > > > > >   fuse: add default_request_timeout and max_request_timeout sysctls
> > > > > > > > > >
> > > > > > > > > >  Documentation/admin-guide/sysctl/fs.rst |  17 +++
> > > > > > > > > >  fs/fuse/Makefile                        |   2 +-
> > > > > > > > > >  fs/fuse/dev.c                           | 187 +++++++++++++++++++++++-
> > > > > > > > > >  fs/fuse/fuse_i.h                        |  30 ++++
> > > > > > > > > >  fs/fuse/inode.c                         |  24 +++
> > > > > > > > > >  fs/fuse/sysctl.c                        |  42 ++++++
> > > > > > > > > >  6 files changed, 293 insertions(+), 9 deletions(-)
> > > > > > > > > >  create mode 100644 fs/fuse/sysctl.c
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.43.0
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hello Joanne,
> > > > > > > > >
> > > > > > > > > Thanks for your update.
> > > > > > > > >
> > > > > > > > > I have tested your patches using my test case, which is similar to the
> > > > > > > > > hello-fuse [0] example, with an additional change as follows:
> > > > > > > > >
> > > > > > > > > @@ -125,6 +125,8 @@ static int hello_read(const char *path, char *buf,
> > > > > > > > > size_t size, off_t offset,
> > > > > > > > >         } else
> > > > > > > > >                 size = 0;
> > > > > > > > >
> > > > > > > > > +       // TO trigger timeout
> > > > > > > > > +       sleep(60);
> > > > > > > > >         return size;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > [0] https://github.com/libfuse/libfuse/blob/master/example/hello.c
> > > > > > > > >
> > > > > > > > > However, it triggered a crash with the following setup:
> > > > > > > > >
> > > > > > > > > 1. Set FUSE timeout:
> > > > > > > > >   sysctl -w fs.fuse.default_request_timeout=10
> > > > > > > > >   sysctl -w fs.fuse.max_request_timeout = 20
> > > > > > > > >
> > > > > > > > > 2. Start FUSE daemon:
> > > > > > > > >   ./hello /tmp/fuse
> > > > > > > > >
> > > > > > > > > 3. Read from FUSE:
> > > > > > > > >   cat /tmp/fuse/hello
> > > > > > > > >
> > > > > > > > > 4. Kill the process within 10 seconds (to avoid the timeout being triggered).
> > > > > > > > >    Then the crash will be triggered.
> > > > > > > >
> > > > > > > > Hi Yafang,
> > > > > > > >
> > > > > > > > Thanks for trying this out on your use case!
> > > > > > > >
> > > > > > > > How consistently are you able to repro this?
> > > > > > >
> > > > > > > It triggers the crash every time.
> > > > > > >
> > > > > > > > I tried reproing using
> > > > > > > > your instructions above but I'm not able to get the crash.
> > > > > > >
> > > > > > > Please note that it is the `cat /tmp/fuse/hello` process that was
> > > > > > > killed, not the fuse daemon.
> > > > > > > The crash seems to occur when the fuse daemon wakes up after
> > > > > > > sleep(60). Please ensure that the fuse daemon can be woken up.
> > > > > > >
> > > > > >
> > > > > > I'm still not able to trigger the crash by killing the `cat
> > > > > > /tmp/fuse/hello` process. This is how I'm repro-ing
> > > > > >
> > > > > > 1) Add sleep to test code in
> > > > > > https://github.com/libfuse/libfuse/blob/master/example/hello.c
> > > > > > @@ -125,6 +126,9 @@ static int hello_read(const char *path, char *buf,
> > > > > > size_t size, off_t offset,
> > > > > >         } else
> > > > > >                 size = 0;
> > > > > >
> > > > > > +       sleep(60);
> > > > > > +       printf("hello_read woke up from sleep\n");
> > > > > > +
> > > > > >         return size;
> > > > > >  }
> > > > > >
> > > > > > 2)  Set fuse timeout to 10 seconds
> > > > > > sysctl -w fs.fuse.default_request_timeout=10
> > > > > >
> > > > > > 3) Start fuse daemon
> > > > > > ./example/hello /tmp/fuse
> > > > > >
> > > > > > 4) Read from fuse
> > > > > > cat /tmp/fuse/hello
> > > > > >
> > > > > > 5) Get pid of cat process
> > > > > > top -b | grep cat
> > > > > >
> > > > > > 6) Kill cat process (within 10 seconds)
> > > > > >  sudo kill -9 <cat-pid>
> > > > > >
> > > > > > 7) Wait 60 seconds for fuse's read request to complete
> > > > > >
> > > > > > From what it sounds like, this is exactly what you are doing as well?
> > > > > >
> > > > > > I added some kernel-side logs and I'm seeing that the read request is
> > > > > > timing out after ~10 seconds and handled by the timeout handler
> > > > > > successfully.
> > > > > >
> > > > > > On the fuse daemon side, these are the logs I'm seeing from the above repro:
> > > > > > ./example/hello /tmp/fuse -f -d
> > > > > >
> > > > > > FUSE library version: 3.17.0
> > > > > > nullpath_ok: 0
> > > > > > unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
> > > > > > INIT: 7.40
> > > > > > flags=0x73fffffb
> > > > > > max_readahead=0x00020000
> > > > > >    INIT: 7.40
> > > > > >    flags=0x4040f039
> > > > > >    max_readahead=0x00020000
> > > > > >    max_write=0x00100000
> > > > > >    max_background=0
> > > > > >    congestion_threshold=0
> > > > > >    time_gran=1
> > > > > >    unique: 2, success, outsize: 80
> > > > > > unique: 4, opcode: LOOKUP (1), nodeid: 1, insize: 46, pid: 673
> > > > > > LOOKUP /hello
> > > > > > getattr[NULL] /hello
> > > > > >    NODEID: 2
> > > > > >    unique: 4, success, outsize: 144
> > > > > > unique: 6, opcode: OPEN (14), nodeid: 2, insize: 48, pid: 673
> > > > > > open flags: 0x8000 /hello
> > > > > >    open[0] flags: 0x8000 /hello
> > > > > >    unique: 6, success, outsize: 32
> > > > > > unique: 8, opcode: READ (15), nodeid: 2, insize: 80, pid: 673
> > > > > > read[0] 4096 bytes from 0 flags: 0x8000
> > > > > > unique: 10, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 673
> > > > > >    unique: 10, error: -38 (Function not implemented), outsize: 16
> > > > > > unique: 11, opcode: INTERRUPT (36), nodeid: 0, insize: 48, pid: 0
> > > > > > FUSE_INTERRUPT: reply to kernel to disable interrupt
> > > > > >    unique: 11, error: -38 (Function not implemented), outsize: 16
> > > > > >
> > > > > > unique: 12, opcode: RELEASE (18), nodeid: 2, insize: 64, pid: 0
> > > > > >    unique: 12, success, outsize: 16
> > > > > >
> > > > > > hello_read woke up from sleep
> > > > > >    read[0] 13 bytes from 0
> > > > > >    unique: 8, success, outsize: 29
> > > > > >
> > > > > >
> > > > > > Are these the debug logs you are seeing from the daemon side as well?
> > > > > >
> > > > > > Thanks,
> > > > > > Joanne
> > > > > > > >
> > > > > > > > From the crash logs you provided below, it looks like what's happening
> > > > > > > > is that if the process gets killed, the timer isn't getting deleted.
> > > > >
> > > > > When I looked at this log previously, I thought you were repro-ing by
> > > > > killing the fuse daemon process, not the cat process. When we kill the
> > > > > cat process, the timer shouldn't be getting deleted. (if the daemon
> > > > > itself is killed, the timers get deleted)
> > > > >
> > > > > > > > I'll look more into what happens in fuse when a process is killed and
> > > > > > > > get back to you on this.
> > > > >
> > > > > This is the flow of what is happening on the kernel side (verified by
> > > > > local printks) -
> > > > >
> > > > > `cat /tmp/fuse/hello`:
> > > > > Issues a FUSE_READ background request (via fuse_send_readpages(),
> > > > > fm->fc->async_read). This request will have a timeout of 10 seconds on
> > > > > it
> > > > >
> > > > > The cat process is killed:
> > > > > This does not clean up the request. The request is still on the fpq
> > > > > processing list.
> > > > >
> > > > > Timeout on request expires:
> > > > > The timeout handler runs and properly cleans up / frees the request.
> > > > >
> > > > > Fuse daemon wakes from sleep and replies to the request:
> > > > > In dev_do_write(), the kernel won't be able to find this request
> > > > > (since it timed out and was removed from the fpq processing list) and
> > > > > return with -ENOENT
> > > >
> > > > Thank you for your explanation.
> > > > I will verify if there are any issues with my test environment.
> > > >
> > > Hi Yafang,
> > >
> > > Would you mind adding these printks to your kernel when you run the
> > > repro and pasting what they show?
> > >
> > > --- a/fs/fuse/dev.c
> > > +++ b/fs/fuse/dev.c
> > > @@ -287,6 +287,9 @@ static void do_fuse_request_end(struct fuse_req
> > > *req, bool from_timer_callback)
> > >         struct fuse_conn *fc = fm->fc;
> > >         struct fuse_iqueue *fiq = &fc->iq;
> > >
> > > +       printk("do_fuse_request_end: req=%p, from_timer=%d,
> > > req->timer.func=%d\n",
> > > +              req, from_timer_callback, req->timer.function != NULL);
> > > +
> > >         if (from_timer_callback)
> > >                 req->out.h.error = -ETIME;
> > >
> > > @@ -415,6 +418,8 @@ static void fuse_request_timeout(struct timer_list *timer)
> > >  {
> > >         struct fuse_req *req = container_of(timer, struct fuse_req, timer);
> > >
> > > +       printk("fuse_request_timeout: req=%p\n", req);
> > > +
> > >         /*
> > >          * Request reply is being finished by the kernel right now.
> > >          * No need to time out the request.
> > > @@ -612,6 +617,7 @@ ssize_t fuse_simple_request(struct fuse_mount *fm,
> > > struct fuse_args *args)
> > >
> > >         if (!args->noreply)
> > >                 __set_bit(FR_ISREPLY, &req->flags);
> > > +       printk("fuse_simple_request: req=%p, op=%u\n", req, args->opcode);
> > >         __fuse_request_send(req);
> > >         ret = req->out.h.error;
> > >         if (!ret && args->out_argvar) {
> > > @@ -673,6 +679,7 @@ int fuse_simple_background(struct fuse_mount *fm,
> > > struct fuse_args *args,
> > >
> > >         fuse_args_to_req(req, args);
> > >
> > > +       printk("fuse_background_request: req=%p, op=%u\n", req, args->opcode);
> > >         if (!fuse_request_queue_background(req)) {
> > >                 fuse_put_request(req);
> > >
> > >
> > > When I run it on my side, I see
> > >
> > > [   68.117740] fuse_background_request: req=00000000874e2f14, op=26
> > > [   68.131440] do_fuse_request_end: req=00000000874e2f14,
> > > from_timer=0, req->timer.func=1
> > > [   71.558538] fuse_simple_request: req=00000000cf643ace, op=1
> > > [   71.559651] do_fuse_request_end: req=00000000cf643ace,
> > > from_timer=0, req->timer.func=1
> > > [   71.561044] fuse_simple_request: req=00000000f2c001f0, op=14
> > > [   71.562524] do_fuse_request_end: req=00000000f2c001f0,
> > > from_timer=0, req->timer.func=1
> > > [   71.563820] fuse_background_request: req=00000000584f2cc3, op=15
> > > [   78.580035] fuse_simple_request: req=00000000ecbee970, op=25
> > > [   78.582614] do_fuse_request_end: req=00000000ecbee970,
> > > from_timer=0, req->timer.func=1
> > > [   81.624722] fuse_request_timeout: req=00000000584f2cc3
> > > [   81.625443] do_fuse_request_end: req=00000000584f2cc3,
> > > from_timer=1, req->timer.func=1
> > > [   81.626377] fuse_background_request: req=00000000b2d792ed, op=18
> > > [   81.627623] do_fuse_request_end: req=00000000b2d792ed,
> > > from_timer=0, req->timer.func=1
> > >
> > > I'm seeing only one timer get called, on the read request (opcode=15),
> > > and I'm not seeing do_fuse_request_end having been called on that
> > > request before the timer is invoked.
> > > I'm curious to compare this against the logs on your end.
> >
> > The log on my side is as follows,
> Thank you Yafang. These logs are very helpful.
>
> >
> > [  283.329421] fuse_background_request: req=000000002b4f82d4, op=26
> > [  283.330043] do_fuse_request_end: req=000000002b4f82d4,
> > from_timer=0, req->timer.func=0
> > [  287.889844] fuse_simple_request: req=00000000865e85bf, op=3
> > [  287.889914] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  287.889933] fuse_simple_request: req=00000000865e85bf, op=22
> > [  287.889994] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  287.890096] fuse_simple_request: req=00000000865e85bf, op=27
> > [  287.890130] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  287.890142] fuse_simple_request: req=00000000865e85bf, op=28
> > [  287.890167] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  287.890178] fuse_simple_request: req=00000000865e85bf, op=1
> > [  287.890191] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  287.890209] fuse_simple_request: req=00000000865e85bf, op=28
> > [  287.890216] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  287.890222] fuse_background_request: req=00000000865e85bf, op=29
> > [  287.890230] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  312.311752] fuse_background_request: req=00000000a8da8b44, op=26
> > [  312.312249] do_fuse_request_end: req=00000000a8da8b44,
> > from_timer=0, req->timer.func=1
> > [  317.368786] fuse_simple_request: req=00000000bc4817dd, op=1
> > [  317.368871] do_fuse_request_end: req=00000000bc4817dd,
> > from_timer=0, req->timer.func=1
> > [  317.368910] fuse_simple_request: req=00000000bc4817dd, op=14
> > [  317.368942] do_fuse_request_end: req=00000000bc4817dd,
> > from_timer=0, req->timer.func=1
> > [  317.368967] fuse_simple_request: req=00000000bc4817dd, op=15
> > [  327.855189] fuse_request_timeout: req=00000000bc4817dd
> > [  327.855195] do_fuse_request_end: req=00000000bc4817dd,
> > from_timer=1, req->timer.func=1
> > [  327.855218] fuse_simple_request: req=00000000c34cc363, op=15
> > [  327.855328] fuse_simple_request: req=00000000c34cc363, op=25
> > [  327.855401] do_fuse_request_end: req=00000000c34cc363,
> > from_timer=0, req->timer.func=1
> > [  327.855496] fuse_background_request: req=00000000c34cc363, op=18
> > [  327.855508] do_fuse_request_end: req=00000000c34cc363,
> > from_timer=0, req->timer.func=1
> > [  338.095136] Oops: general protection fault, probably for
> > non-canonical address 0xdead00000000012a: 0000 [#1] PREEMPT SMP NOPTI
> > [  338.096415] CPU: 58 PID: 0 Comm: swapper/58 Kdump: loaded Not
> > tainted 6.10.0+ #8
> > [  338.098219] RIP: 0010:__run_timers+0x27e/0x360
> > [  338.098686] Code: 07 48 c7 43 08 00 00 00 00 48 85 c0 74 78 4d 8b
> > 2f 4c 89 6b 08 0f 1f 44 00 00 49 8b 45 00 49 8b 55 08 48 89 02 48 85
> > c0 74 04 <48> 89 50 08 4d 8b 65 18 49 c7 45 08 00 00 00 00 48 b8 22 01
> > 00 00
> > [  338.100381] RSP: 0018:ffffb4ef808bced8 EFLAGS: 00010086
> > [  338.100907] RAX: dead000000000122 RBX: ffff9827ffca13c0 RCX: 0000000000000001
> > [  338.101623] RDX: ffffb4ef808bcef8 RSI: 0000000000000000 RDI: ffff9827ffca13e8
> > [  338.102333] RBP: ffffb4ef808bcf70 R08: 000000000000008b R09: ffff9827ffca1430
> > [  338.103020] R10: ffffffff93e060c0 R11: 0000000000000089 R12: 0000000000000001
> > [  338.103726] R13: ffff97e9dc06a0a0 R14: 0000000100009200 R15: ffffb4ef808bcef8
> > [  338.104439] FS:  0000000000000000(0000) GS:ffff9827ffc80000(0000)
> > knlGS:0000000000000000
> > [  338.105229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  338.105795] CR2: 000000c002f99340 CR3: 0000000148254001 CR4: 0000000000370ef0
> > [  338.106502] Call Trace:
> > [  338.106836]  <IRQ>
> > [  338.107175]  ? show_regs+0x69/0x80
> > [  338.107603]  ? die_addr+0x38/0x90
> > [  338.108005]  ? exc_general_protection+0x236/0x490
> > [  338.108557]  ? asm_exc_general_protection+0x27/0x30
> > [  338.109095]  ? __run_timers+0x27e/0x360
> > [  338.109563]  ? __run_timers+0x1b4/0x360
> > [  338.110009]  ? kvm_sched_clock_read+0x11/0x20
> > [  338.110528]  ? sched_clock_noinstr+0x9/0x10
> > [  338.111002]  ? sched_clock+0x10/0x30
> > [  338.111447]  ? sched_clock_cpu+0x10/0x190
> > [  338.111914]  run_timer_softirq+0x3a/0x60
> > [  338.112406]  handle_softirqs+0x118/0x350
> > [  338.112859]  irq_exit_rcu+0x60/0x80
> > [  338.113295]  sysvec_apic_timer_interrupt+0x7f/0x90
> > [  338.113823]  </IRQ>
> > [  338.114147]  <TASK>
> > [  338.114447]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
> > [  338.115002] RIP: 0010:default_idle+0xb/0x20
> > [  338.115498] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90
> > 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33
> > 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> > 00 90
> > [  338.117337] RSP: 0018:ffffb4ef8028fe18 EFLAGS: 00000246
> > [  338.117894] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0001b48ebb3a1032
> > [  338.118673] RDX: 0000000000000001 RSI: ffffffff9412e060 RDI: ffff9827ffcbc8e0
> > [  338.119415] RBP: ffffb4ef8028fe20 R08: 0000004eb7fb01b4 R09: 0000000000000001
> > [  338.120151] R10: ffffffff93e56080 R11: 0000000000000001 R12: 0000000000000001
> > [  338.120872] R13: ffffffff9412e060 R14: ffffffff9412e0e0 R15: 0000000000000001
> > [  338.121615]  ? ct_kernel_exit.constprop.0+0x79/0x90
> > [  338.122171]  ? arch_cpu_idle+0x9/0x10
> > [  338.122602]  default_enter_idle+0x22/0x2f
> > [  338.123064]  cpuidle_enter_state+0x88/0x430
> > [  338.123556]  cpuidle_enter+0x34/0x50
> > [  338.123978]  call_cpuidle+0x22/0x50
> > [  338.124449]  cpuidle_idle_call+0xd2/0x120
> > [  338.124909]  do_idle+0x77/0xd0
> > [  338.125313]  cpu_startup_entry+0x2c/0x30
> > [  338.125763]  start_secondary+0x117/0x140
> > [  338.126240]  common_startup_64+0x13e/0x141
> > [  338.126711]  </TASK>
> >
> > In addition to the hello-fuse, there is another FUSE daemon, lxcfs,
> > running on my test server. After disabling lxcfs, the system no longer
> > panics, but there are still error logs:
> >
> > [  285.804534] fuse_background_request: req=0000000063502a93, op=26
> > [  285.805041] do_fuse_request_end: req=0000000063502a93,
> > from_timer=0, req->timer.func=1
> > [  290.967412] fuse_simple_request: req=000000003f362e4b, op=1
> > [  290.967480] do_fuse_request_end: req=000000003f362e4b,
> > from_timer=0, req->timer.func=1
> > [  290.967517] fuse_simple_request: req=000000003f362e4b, op=14
> > [  290.967585] do_fuse_request_end: req=000000003f362e4b,
> > from_timer=0, req->timer.func=1
> > [  290.967655] fuse_simple_request: req=000000003f362e4b, op=15
> > [  300.996023] fuse_request_timeout: req=000000003f362e4b
> > [  300.996030] do_fuse_request_end: req=000000003f362e4b,
> > from_timer=1, req->timer.func=1
> > [  300.996066] fuse_simple_request: req=00000000b4182f02, op=15
> > [  300.996180] fuse_simple_request: req=000000003f362e4b, op=25
> > [  300.996185] ==================================================================
> > [  300.996980] BUG: KFENCE: use-after-free write in enqueue_timer+0x24/0xb0
> >
> > [  300.997788] Use-after-free write at 0x0000000022312cb7 (in kfence-#156):
> > [  300.998476]  enqueue_timer+0x24/0xb0
> > [  300.998479]  __mod_timer+0x23b/0x360
> > [  300.998481]  add_timer+0x20/0x30
> > [  300.998483]  fuse_simple_request+0x1bc/0x2f0 [fuse]
> > [  300.998506]  fuse_flush+0x1ac/0x1f0 [fuse]
> > [  300.998511]  filp_flush+0x39/0x90
> > [  300.998517]  filp_close+0x15/0x30
> > [  300.998519]  put_files_struct+0x77/0xe0
> > [  300.998522]  exit_files+0x47/0x60
> > [  300.998524]  do_exit+0x262/0x480
> > [  300.998528]  do_group_exit+0x34/0x90
> > [  300.998531]  get_signal+0x92f/0x980
> > [  300.998534]  arch_do_signal_or_restart+0x2a/0x100
> > [  300.998537]  syscall_exit_to_user_mode+0xe3/0x1a0
> > [  300.998541]  do_syscall_64+0x71/0x170
> > [  300.998545]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > [  300.998759] kfence-#156: 0x00000000b4182f02-0x0000000084fc5c46,
> > size=200, cache=ip4-frags
> >
> > [  300.998761] allocated by task 15064 on cpu 26 at 300.996061s:
> > [  300.998766]  fuse_request_alloc+0x21/0xb0 [fuse]
> > [  300.998771]  fuse_get_req+0xde/0x270 [fuse]
> > [  300.998775]  fuse_simple_request+0x33/0x2f0 [fuse]
> > [  300.998779]  fuse_do_readpage+0x15e/0x200 [fuse]
> > [  300.998783]  fuse_read_folio+0x29/0x60 [fuse]
> > [  300.998787]  filemap_read_folio+0x3b/0xe0
> > [  300.998791]  filemap_update_page+0x236/0x2d0
> > [  300.998792]  filemap_get_pages+0x225/0x390
> > [  300.998794]  filemap_read+0xed/0x3a0
> > [  300.998796]  generic_file_read_iter+0xb8/0x100
> > [  300.998798]  fuse_file_read_iter+0xd8/0x150 [fuse]
> > [  300.998804]  vfs_read+0x25e/0x340
> > [  300.998806]  ksys_read+0x67/0xf0
> > [  300.998808]  __x64_sys_read+0x19/0x20
> > [  300.998810]  x64_sys_call+0x1709/0x20b0
> > [  300.998813]  do_syscall_64+0x65/0x170
> > [  300.998815]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > [  300.998817] freed by task 15064 on cpu 26 at 300.996084s:
> > [  300.998822]  fuse_put_request+0x89/0xf0 [fuse]
> > [  300.998826]  fuse_simple_request+0xe1/0x2f0 [fuse]
> > [  300.998830]  fuse_do_readpage+0x15e/0x200 [fuse]
> > [  300.998835]  fuse_read_folio+0x29/0x60 [fuse]
> > [  300.998839]  filemap_read_folio+0x3b/0xe0
> > [  300.998840]  filemap_update_page+0x236/0x2d0
> > [  300.998842]  filemap_get_pages+0x225/0x390
> > [  300.998844]  filemap_read+0xed/0x3a0
> > [  300.998846]  generic_file_read_iter+0xb8/0x100
> > [  300.998848]  fuse_file_read_iter+0xd8/0x150 [fuse]
> > [  300.998852]  vfs_read+0x25e/0x340
> > [  300.998854]  ksys_read+0x67/0xf0
> > [  300.998856]  __x64_sys_read+0x19/0x20
> > [  300.998857]  x64_sys_call+0x1709/0x20b0
> > [  300.998859]  do_syscall_64+0x65/0x170
> > [  300.998860]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> This is very interesting. These logs (and the ones above with the
> lxcfs server running concurrently) are showing that the read request
> was freed but not through the do_fuse_request_end path. It's weird
> that fuse_simple_request reached fuse_put_request without
> do_fuse_request_end having been called (which is the only place where
> FR_FINISHED gets set and wakes up the wait events in
> request_wait_answer).
>
> I'll take a deeper look tomorrow and try to make more sense of it.

Finally realized what's happening!
When we kill the cat program, the fatal signal interrupts the
wait_event_interruptible and wait_event_killable in
request_wait_answer(). If the request hasn't been sent out to
userspace yet at that point, request_wait_answer() cleans up the
request manually (not through the fuse_request_end() path), and that
path doesn't delete the timer.
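
The failure mode described above can be modelled in plain userspace C.
This is only a sketch under stated assumptions: every name below
(model_req, model_arm_timer, and so on) is hypothetical and stands in
for the kernel structures, and the "fixed" variant shows the general
shape of disarming before freeing, not the actual v3 change.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct fuse_req; not the kernel's real layout. */
struct model_req {
	bool timer_armed;	/* true while the timeout timer is queued */
	bool freed;		/* set instead of freeing, so callers can inspect it */
};

/* Models add_timer(): the timeout is now pending. */
static void model_arm_timer(struct model_req *req)
{
	req->timer_armed = true;
}

/* Models timer_delete_sync(): once it returns, the callback cannot run. */
static void model_delete_timer(struct model_req *req)
{
	req->timer_armed = false;
}

/*
 * Models the interrupted-before-send cleanup as described in this
 * thread: the request is released without touching its timer.
 */
static void model_interrupt_cleanup_buggy(struct model_req *req)
{
	req->freed = true;
}

/* Same path, but the timer is disarmed before the request goes away. */
static void model_interrupt_cleanup_fixed(struct model_req *req)
{
	model_delete_timer(req);
	req->freed = true;
}

/* True if a still-pending timer would later fire on a freed request. */
static bool model_timer_fires_on_freed(const struct model_req *req)
{
	return req->timer_armed && req->freed;
}
```

With the buggy cleanup, model_timer_fires_on_freed() reports true for an
armed request, which corresponds to the use-after-free and list_del
corruption reports in the logs; with the fixed cleanup it reports false.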

I'll fix this for v3.

Thank you for surfacing this and it would be much appreciated if you
could test out v3 when it's submitted to make sure.

Thanks!
Joanne

>
> >
> > [  300.999115] CPU: 26 PID: 15064 Comm: cat Kdump: loaded Not tainted 6.10.0+ #8
> > [  301.000803] ==================================================================
> > [  301.001695] do_fuse_request_end: req=000000003f362e4b,
> > from_timer=0, req->timer.func=1
> > [  301.001723] fuse_background_request: req=000000003f362e4b, op=18
> > [  301.001767] do_fuse_request_end: req=000000003f362e4b,
> > from_timer=0, req->timer.func=1
> > [  311.235964] fuse_request_timeout: req=00000000b4182f02
> > [  311.235969] ------------[ cut here ]------------
> > [  311.235970] list_del corruption, ffff9a8072d3a000->next is
> > LIST_POISON1 (dead000000000100)
> > [  311.235982] WARNING: CPU: 26 PID: 0 at lib/list_debug.c:56
> > __list_del_entry_valid_or_report+0x8a/0xf0
> > [  311.236036] CPU: 26 PID: 0 Comm: swapper/26 Kdump: loaded Tainted:
> > G    B              6.10.0+ #8
> > [  311.236040] RIP: 0010:__list_del_entry_valid_or_report+0x8a/0xf0
> > [  311.236043] Code: 31 c0 5d c3 cc cc cc cc 48 c7 c7 60 7a 5e b0 e8
> > cc ea a4 ff 0f 0b 31 c0 5d c3 cc cc cc cc 48 c7 c7 88 7a 5e b0 e8 b6
> > ea a4 ff <0f> 0b 31 c0 5d c3 cc cc cc cc 48 89 ca 48 c7 c7 c0 7a 5e b0
> > e8 9d
> > [  311.236045] RSP: 0018:ffffb6364056ce60 EFLAGS: 00010282
> > [  311.236047] RAX: 0000000000000000 RBX: ffff9a8072d3a0a0 RCX: 0000000000000027
> > [  311.236048] RDX: ffff9a807f4a0848 RSI: 0000000000000001 RDI: ffff9a807f4a0840
> > [  311.236049] RBP: ffffb6364056ce60 R08: 0000000000000000 R09: ffffb6364056cce0
> > [  311.236050] R10: ffffb6364056ccd8 R11: ffffffffb1017ee8 R12: ffff9a8072d3a000
> > [  311.236051] R13: ffff9a420d5af000 R14: 0000000100002800 R15: ffff9a420d5af054
> > [  311.236054] FS:  0000000000000000(0000) GS:ffff9a807f480000(0000)
> > knlGS:0000000000000000
> > [  311.236056] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  311.236057] CR2: 000000c000dc5000 CR3: 000000010cc38003 CR4: 0000000000370ef0
> > [  311.236058] Call Trace:
> > [  311.236059]  <IRQ>
> > [  311.236061]  ? show_regs+0x69/0x80
> > [  311.236065]  ? __warn+0x88/0x130
> > [  311.236068]  ? __list_del_entry_valid_or_report+0x8a/0xf0
> > [  311.236070]  ? report_bug+0x18f/0x1a0
> > [  311.236074]  ? handle_bug+0x40/0x70
> > [  311.236077]  ? exc_invalid_op+0x19/0x70
> > [  311.236079]  ? asm_exc_invalid_op+0x1b/0x20
> > [  311.236083]  ? __list_del_entry_valid_or_report+0x8a/0xf0
> > [  311.236086]  fuse_request_timeout+0x15c/0x1a0 [fuse]
> > [  311.236094]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
> > [  311.236099]  call_timer_fn+0x2c/0x130
> > [  311.236102]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
> > [  311.236106]  __run_timers+0x2c2/0x360
> > [  311.236108]  ? kvm_sched_clock_read+0x11/0x20
> > [  311.236110]  ? sched_clock_noinstr+0x9/0x10
> > [  311.236111]  ? sched_clock+0x10/0x30
> > [  311.236114]  ? sched_clock_cpu+0x10/0x190
> > [  311.236116]  run_timer_softirq+0x3a/0x60
> > [  311.236118]  handle_softirqs+0x118/0x350
> > [  311.236121]  irq_exit_rcu+0x60/0x80
> > [  311.236123]  sysvec_apic_timer_interrupt+0x7f/0x90
> > [  311.236124]  </IRQ>
> > [  311.236125]  <TASK>
> > [  311.236126]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
> > [  311.236128] RIP: 0010:default_idle+0xb/0x20
> > [  311.236130] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90
> > 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33
> > 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> > 00 90
> > [  311.236131] RSP: 0018:ffffb6364018fe18 EFLAGS: 00000246
> > [  311.236133] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0001c0582ca6ada0
> > [  311.236134] RDX: 0000000000000001 RSI: ffffffffb112e060 RDI: ffff9a807f4bc8e0
> > [  311.236135] RBP: ffffb6364018fe20 R08: 00000048770ca8ec R09: 0000000000000001
> > [  311.236135] R10: ffffffffb0e56080 R11: 0000000000000001 R12: 0000000000000001
> > [  311.236136] R13: ffffffffb112e060 R14: ffffffffb112e0e0 R15: 0000000000000001
> > [  311.236138]  ? ct_kernel_exit.constprop.0+0x79/0x90
> > [  311.236140]  ? arch_cpu_idle+0x9/0x10
> > [  311.236142]  default_enter_idle+0x22/0x2f
> > [  311.236144]  cpuidle_enter_state+0x88/0x430
> > [  311.236146]  cpuidle_enter+0x34/0x50
> > [  311.236150]  call_cpuidle+0x22/0x50
> > [  311.236151]  cpuidle_idle_call+0xd2/0x120
> > [  311.236154]  do_idle+0x77/0xd0
> > [  311.236156]  cpu_startup_entry+0x2c/0x30
> > [  311.236158]  start_secondary+0x117/0x140
> > [  311.236160]  common_startup_64+0x13e/0x141
> > [  311.236163]  </TASK>
> > [  311.236163] ---[ end trace 0000000000000000 ]---
> > [  311.236165] do_fuse_request_end: req=00000000b4182f02,
> > from_timer=1, req->timer.func=1
> > [  311.236166] ------------[ cut here ]------------
> > [  311.236167] refcount_t: underflow; use-after-free.
> > [  311.236174] WARNING: CPU: 26 PID: 0 at lib/refcount.c:28
> > refcount_warn_saturate+0xc2/0x110
> > [  311.236207] CPU: 26 PID: 0 Comm: swapper/26 Kdump: loaded Tainted:
> > G    B   W          6.10.0+ #8
> > [  311.236209] RIP: 0010:refcount_warn_saturate+0xc2/0x110
> > [  311.236211] Code: 01 e8 d2 72 a6 ff 0f 0b 5d c3 cc cc cc cc 80 3d
> > 33 d4 b1 01 00 75 81 48 c7 c7 30 69 5e b0 c6 05 23 d4 b1 01 01 e8 ae
> > 72 a6 ff <0f> 0b 5d c3 cc cc cc cc 80 3d 0d d4 b1 01 00 0f 85 59 ff ff
> > ff 48
> > [  311.236212] RSP: 0018:ffffb6364056cdf8 EFLAGS: 00010286
> > [  311.236213] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000027
> > [  311.236214] RDX: ffff9a807f4a0848 RSI: 0000000000000001 RDI: ffff9a807f4a0840
> > [  311.236215] RBP: ffffb6364056cdf8 R08: 0000000000000000 R09: ffffb6364056cc78
> > [  311.236216] R10: ffffb6364056cc70 R11: ffffffffb1017ee8 R12: ffff9a8072d3a000
> > [  311.236217] R13: ffff9a420d5af000 R14: ffff9a42426a6ec0 R15: ffff9a8072d3a010
> > [  311.236219] FS:  0000000000000000(0000) GS:ffff9a807f480000(0000)
> > knlGS:0000000000000000
> > [  311.236221] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  311.236222] CR2: 000000c000dc5000 CR3: 000000010cc38003 CR4: 0000000000370ef0
> > [  311.236223] Call Trace:
> > [  311.236223]  <IRQ>
> > [  311.236224]  ? show_regs+0x69/0x80
> > [  311.236233]  ? __warn+0x88/0x130
> > [  311.236235]  ? refcount_warn_saturate+0xc2/0x110
> > [  311.236236]  ? report_bug+0x18f/0x1a0
> > [  311.236238]  ? handle_bug+0x40/0x70
> > [  311.236240]  ? exc_invalid_op+0x19/0x70
> > [  311.236242]  ? asm_exc_invalid_op+0x1b/0x20
> > [  311.236244]  ? refcount_warn_saturate+0xc2/0x110
> > [  311.236246]  ? refcount_warn_saturate+0xc2/0x110
> > [  311.236247]  fuse_put_request+0xc6/0xf0 [fuse]
> > [  311.236253]  do_fuse_request_end+0xcc/0x1e0 [fuse]
> > [  311.236258]  fuse_request_timeout+0xac/0x1a0 [fuse]
> > [  311.236263]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
> > [  311.236267]  call_timer_fn+0x2c/0x130
> > [  311.236269]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
> > [  311.236274]  __run_timers+0x2c2/0x360
> > [  311.236275]  ? kvm_sched_clock_read+0x11/0x20
> > [  311.236277]  ? sched_clock_noinstr+0x9/0x10
> > [  311.236278]  ? sched_clock+0x10/0x30
> > [  311.236280]  ? sched_clock_cpu+0x10/0x190
> > [  311.236281]  run_timer_softirq+0x3a/0x60
> > [  311.236283]  handle_softirqs+0x118/0x350
> > [  311.236285]  irq_exit_rcu+0x60/0x80
> > [  311.236286]  sysvec_apic_timer_interrupt+0x7f/0x90
> > [  311.236288]  </IRQ>
> > [  311.236288]  <TASK>
> > [  311.236289]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
> > [  311.236291] RIP: 0010:default_idle+0xb/0x20
> > [  311.236293] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90
> > 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33
> > 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> > 00 90
> > [  311.236294] RSP: 0018:ffffb6364018fe18 EFLAGS: 00000246
> > [  311.236295] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0001c0582ca6ada0
> > [  311.236296] RDX: 0000000000000001 RSI: ffffffffb112e060 RDI: ffff9a807f4bc8e0
> > [  311.236297] RBP: ffffb6364018fe20 R08: 00000048770ca8ec R09: 0000000000000001
> > [  311.236298] R10: ffffffffb0e56080 R11: 0000000000000001 R12: 0000000000000001
> > [  311.236299] R13: ffffffffb112e060 R14: ffffffffb112e0e0 R15: 0000000000000001
> > [  311.236300]  ? ct_kernel_exit.constprop.0+0x79/0x90
> > [  311.236302]  ? arch_cpu_idle+0x9/0x10
> > [  311.236304]  default_enter_idle+0x22/0x2f
> > [  311.236306]  cpuidle_enter_state+0x88/0x430
> > [  311.236308]  cpuidle_enter+0x34/0x50
> > [  311.236310]  call_cpuidle+0x22/0x50
> > [  311.236311]  cpuidle_idle_call+0xd2/0x120
> > [  311.236313]  do_idle+0x77/0xd0
> > [  311.236315]  cpu_startup_entry+0x2c/0x30
> > [  311.236317]  start_secondary+0x117/0x140
> > [  311.236318]  common_startup_64+0x13e/0x141
> > [  311.236320]  </TASK>
> > [  311.236321] ---[ end trace 0000000000000000 ]---
> >
> > I wish I could provide you with a clear explanation of what happened
> > in my test environment, but I haven't had the time to delve into the
> > details yet.
> >
> >
> > --
> > Regards
> > Yafang

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-08-06  2:45       ` Jingbo Xu
@ 2024-08-06 16:43         ` Joanne Koong
  0 siblings, 0 replies; 34+ messages in thread
From: Joanne Koong @ 2024-08-06 16:43 UTC (permalink / raw)
  To: Jingbo Xu
  Cc: miklos, linux-fsdevel, josef, bernd.schubert, laoar.shao,
	kernel-team

On Mon, Aug 5, 2024 at 7:45 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>
>
>
> On 8/6/24 6:53 AM, Joanne Koong wrote:
> > On Mon, Aug 5, 2024 at 12:32 AM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
> >>
> >>
> >>
> >> On 7/30/24 8:23 AM, Joanne Koong wrote:
> >>> There are situations where fuse servers can become unresponsive or take
> >>> too long to reply to a request. Currently there is no upper bound on
> >>> how long a request may take, which may be frustrating to users who get
> >>> stuck waiting for a request to complete.
> >>>
> >>> This commit adds a timeout option (in seconds) for requests. If the
> >>> timeout elapses before the server replies to the request, the request
> >>> will fail with -ETIME.
> >>>
> >>> There are 3 possibilities for a request that times out:
> >>> a) The request times out before the request has been sent to userspace
> >>> b) The request times out after the request has been sent to userspace
> >>> and before it receives a reply from the server
> >>> c) The request times out after the request has been sent to userspace
> >>> and the server replies while the kernel is timing out the request
> >>>
> >>> While a request timeout is being handled, there may be other handlers
> >>> running at the same time if:
> >>> a) the kernel is forwarding the request to the server
> >>> b) the kernel is processing the server's reply to the request
> >>> c) the request is being re-sent
> >>> d) the connection is aborting
> >>> e) the device is getting released
> >>>
> >>> Proper synchronization must be added to ensure that the request is
> >>> handled correctly in all of these cases. To this effect, there is a new
> >>> FR_FINISHING bit added to the request flags, which is set atomically by
> >>> either the timeout handler (see fuse_request_timeout()) which is invoked
> >>> after the request timeout elapses or set by the request reply handler
> >>> (see dev_do_write()), whichever gets there first. If the reply handler
> >>> and the timeout handler are executing simultaneously and the reply handler
> >>> sets FR_FINISHING before the timeout handler, then the request will be
> >>> handled as if the timeout did not elapse. If the timeout handler sets
> >>> FR_FINISHING before the reply handler, then the request will fail with
> >>> -ETIME and the request will be cleaned up.
> >>>
> >>> Currently, this is the refcount lifecycle of a request:
> >>>
> >>> Synchronous request is created:
> >>> fuse_simple_request -> allocates request, sets refcount to 1
> >>>   __fuse_request_send -> acquires refcount
> >>>     queues request and waits for reply...
> >>> fuse_simple_request -> drops refcount
> >>>
> >>> Background request is created:
> >>> fuse_simple_background -> allocates request, sets refcount to 1
> >>>
> >>> Request is replied to:
> >>> fuse_dev_do_write
> >>>   fuse_request_end -> drops refcount on request
> >>>
> >>> Proper acquires on the request reference must be added to ensure that the
> >>> timeout handler does not drop the last refcount on the request while
> >>> other handlers may be operating on the request. Please note that the
> >>> timeout handler may get invoked at any phase of the request's
> >>> lifetime (eg before the request has been forwarded to userspace, etc).
> >>>
> >>> It is always guaranteed that there is a refcount on the request when the
> >>> timeout handler is executing. The timeout handler will be either
> >>> deactivated by the reply/abort/release handlers, or if the timeout
> >>> handler is concurrently executing on another CPU, the reply/abort/release
> >>> handlers will wait for the timeout handler to finish executing first before
> >>> it drops the final refcount on the request.
> >>>
> >>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >>> ---
> >>>  fs/fuse/dev.c    | 187 +++++++++++++++++++++++++++++++++++++++++++++--
> >>>  fs/fuse/fuse_i.h |  14 ++++
> >>>  fs/fuse/inode.c  |   7 ++
> >>>  3 files changed, 200 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> >>> index 9eb191b5c4de..9992bc5f4469 100644
> >>> --- a/fs/fuse/dev.c
> >>> +++ b/fs/fuse/dev.c
> >>> @@ -31,6 +31,8 @@ MODULE_ALIAS("devname:fuse");
> >>>
> >>>  static struct kmem_cache *fuse_req_cachep;
> >>>
> >>> +static void fuse_request_timeout(struct timer_list *timer);
> >>> +
> >>>  static struct fuse_dev *fuse_get_dev(struct file *file)
> >>>  {
> >>>       /*
> >>> @@ -48,6 +50,8 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
> >>>       refcount_set(&req->count, 1);
> >>>       __set_bit(FR_PENDING, &req->flags);
> >>>       req->fm = fm;
> >>> +     if (fm->fc->req_timeout)
> >>> +             timer_setup(&req->timer, fuse_request_timeout, 0);
> >>>  }
> >>>
> >>>  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> >>> @@ -277,12 +281,15 @@ static void flush_bg_queue(struct fuse_conn *fc)
> >>>   * the 'end' callback is called if given, else the reference to the
> >>>   * request is released
> >>>   */
> >>> -void fuse_request_end(struct fuse_req *req)
> >>> +static void do_fuse_request_end(struct fuse_req *req, bool from_timer_callback)
> >>>  {
> >>>       struct fuse_mount *fm = req->fm;
> >>>       struct fuse_conn *fc = fm->fc;
> >>>       struct fuse_iqueue *fiq = &fc->iq;
> >>>
> >>> +     if (from_timer_callback)
> >>> +             req->out.h.error = -ETIME;
> >>> +
> >>
> >> IMHO, could we move the above error assignment up to the caller to make
> >> do_fuse_request_end() look cleaner?
> >
> > Sure, I was thinking that it looks cleaner setting this in
> > do_fuse_request_end() instead of having to set it in both
> > timeout_pending_req() and timeout_inflight_req(), but I see your point
> > as well.
> > I'll make this change in v3.
> >
> >>
> >>
> >>>       if (test_and_set_bit(FR_FINISHED, &req->flags))
> >>>               goto put_request;
> >>>
> >>> @@ -296,8 +303,6 @@ void fuse_request_end(struct fuse_req *req)
> >>>               list_del_init(&req->intr_entry);
> >>>               spin_unlock(&fiq->lock);
> >>>       }
> >>> -     WARN_ON(test_bit(FR_PENDING, &req->flags));
> >>> -     WARN_ON(test_bit(FR_SENT, &req->flags));
> >>>       if (test_bit(FR_BACKGROUND, &req->flags)) {
> >>>               spin_lock(&fc->bg_lock);
> >>>               clear_bit(FR_BACKGROUND, &req->flags);
> >>> @@ -324,13 +329,105 @@ void fuse_request_end(struct fuse_req *req)
> >>>               wake_up(&req->waitq);
> >>>       }
> >>>
> >>> +     if (!from_timer_callback && req->timer.function)
> >>> +             timer_delete_sync(&req->timer);
> >>> +
> >>
> >> Similarly, could the caller, i.e. fuse_request_end(), call
> >> timer_delete_sync() instead?
> >
> > I don't think we can do that because the fuse_put_request() at the end
> > of this function often holds the last refcount on the request which
> > frees the request when it releases the ref.
>
> Initially I meant timer_delete_sync() could be called before
> do_fuse_request_end() inside fuse_request_end().  But anyway it's a
> rough idea just for making the code look cleaner, without thinking if
> this logic change is right or not.

Ahh I see now what you were saying. Great suggestion! I'll change this for v3.
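
A rough userspace sketch of the ordering being discussed, assuming the
two suggestions are combined (the caller sets the error, and the
non-timer wrapper stops the timer before the common end logic). All
model_* names are hypothetical and the bodies are placeholders, not the
real fuse code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct fuse_req. */
struct model_req {
	bool timer_armed;	/* timeout timer is pending */
	bool finished;		/* common end logic has run */
	int error;		/* models req->out.h.error */
};

/* Common end logic, with no timer or error handling of its own. */
static void model_do_request_end(struct model_req *req)
{
	req->finished = true;
}

/* Non-timer callers: stop the timer first, then end the request. */
static void model_request_end(struct model_req *req)
{
	req->timer_armed = false;	/* stands in for timer_delete_sync() */
	model_do_request_end(req);
}

/* Timer callback: sets -ETIME itself, then ends the request. */
static void model_request_timeout(struct model_req *req)
{
	req->timer_armed = false;	/* a fired timer is no longer pending */
	req->error = -ETIME;
	model_do_request_end(req);
}
```

The point of the split is that model_do_request_end() no longer needs a
from_timer_callback argument; each entry point does its own timer and
error handling before calling it.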

>
>
> >>> +static void timeout_pending_req(struct fuse_req *req)
> >>> +{
> >>> +     struct fuse_conn *fc = req->fm->fc;
> >>> +     struct fuse_iqueue *fiq = &fc->iq;
> >>> +     bool background = test_bit(FR_BACKGROUND, &req->flags);
> >>> +
> >>> +     if (background)
> >>> +             spin_lock(&fc->bg_lock);
> >>
> >> Just out of curiosity, why is fc->bg_lock needed here, which makes the
> >> code look less clean?
> >
> > The fc->bg_lock is needed because the background request may still be
> > on the fc->bg_queue when the request times out (eg the request hasn't
> > been flushed yet). We need to acquire the fc->bg_lock so that we can
> > delete it from the queue, in case something else is modifying the
> > queue at the same time.
>
> I can understand now.  Thanks!
>
> >
> >>
> >>
> >>> +     spin_lock(&fiq->lock);
> >>> +
> >>> +     if (!test_bit(FR_PENDING, &req->flags)) {
> >>> +             spin_unlock(&fiq->lock);
> >>> +             if (background)
> >>> +                     spin_unlock(&fc->bg_lock);
> >>> +             timeout_inflight_req(req);
> >>> +             return;
> >>> +     }
> >>> +
> >>> +     if (!test_bit(FR_PRIVATE, &req->flags))
> >>> +             list_del_init(&req->list);
> >>> +
> >>> +     spin_unlock(&fiq->lock);
> >>> +     if (background)
> >>> +             spin_unlock(&fc->bg_lock);
> >>> +
> >>> +     do_fuse_request_end(req, true);
> >>
> >> I'm not sure if special handling for requests in fpq->io list is needed
> >> here.  At least when connection is aborted, those LOCKED requests in
> >> fpq->io list won't be finished instantly until they are unlocked.
> >>
> >
> > The places where FR_LOCKED gets set on the request are in
> > fuse_dev_do_write and fuse_dev_do_read when we do some of the page
> > copying stuff. In both those functions, this timeout_pending_req()
> > path isn't hit while we have the lock obtained - in fuse_dev_do_write,
> > we test and set FR_FINISHING first before doing the page copying (the
> > timeout handler will return before calling timeout_pending_req()), and
> > in fuse_dev_do_read, the locking is called after the FR_PENDING flag
> > has been cleared.
> >
> > I think there is a possibility that the timeout handler executes
> > timeout_inflight_req() while the lock is obtained in fuse_dev_do_read
> > during the page copying, but this patch added an extra
> > __fuse_get_request() on the request before doing the page copying,
> > which means the timeout handler will not free the request while
> > the lock is held and the page copying is being done.
> >
>
> Yes, this is the only possible place where the timeout handler could
> concurrently run while the request is in copying state (i.e. LOCKED).
> As I described above, when connection is aborted, the LOCKED requests
> will be left there and won't be finished until they are unlocked.  I'm
> not sure why this special handling is needed for LOCKED requests, but I
> guess it's not because of UAF issue.  From the comment of
> lock_request(), "Up to the next unlock_request() there mustn't be
> anything that could cause a page-fault.", though I can't understand in
> which case there will be a page fault and it will be an issue.

It's not clear to me either what in the end_requests() logic that
abort_conn would call could cause a page fault. It would be great to
get some light on this.

For v3 I'm going to integrate Bernd's suggestion and disarm the timer
before dev_do_read's main logic and rearm it after, which will also
obviate this possible scenario of the timeout handler executing while
the request is in locked copying state.
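
A minimal user-space sketch of why disarming around the copy closes this
window. It is a model only: struct model_req, model_tick() and
model_dev_do_read() are hypothetical names invented for illustration, not
the fuse code or the v3 patch. The model timer can only run its handler
while it is armed, so a deadline that passes during the "locked" copy is
not acted on until the timer has been rearmed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_req {
	bool timer_armed;       /* stands in for a pending req->timer */
	unsigned long expires;  /* absolute deadline, in model "jiffies" */
	bool locked;            /* stands in for FR_LOCKED during copying */
	bool timed_out;         /* set once the model timeout handler ran */
};

/* Model timer tick: runs the handler only if armed and due. */
static void model_tick(struct model_req *req, unsigned long now)
{
	if (!req->timer_armed || now < req->expires)
		return;
	/* The property under discussion: never expire a request mid-copy. */
	assert(!req->locked);
	req->timer_armed = false;
	req->timed_out = true;
}

/* Model read path: disarm, copy while "locked", then rearm. */
static void model_dev_do_read(struct model_req *req,
			      unsigned long now_during_copy)
{
	req->timer_armed = false;          /* disarm before the main logic */
	req->locked = true;
	model_tick(req, now_during_copy);  /* a tick during the copy is a no-op */
	req->locked = false;
	req->timer_armed = true;           /* rearm with the unchanged deadline */
}

/* Returns true if the request timed out only after the copy finished. */
static bool model_run(void)
{
	struct model_req req = { .timer_armed = true, .expires = 100 };

	model_dev_do_read(&req, 150);      /* deadline passes mid-copy */
	if (req.timed_out)
		return false;
	model_tick(&req, 151);             /* first tick after rearming */
	return req.timed_out;
}
```

Calling model_run() exercises the case where the deadline falls inside the
copy; the request is expired by the first tick after rearming instead.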

Thanks,
Joanne

>
> Maybe I'm wrong.  It would be helpful if someone could shed light on this.
>
>
> --
> Thanks,
> Jingbo


* Re: [PATCH v2 1/2] fuse: add optional kernel-enforced timeout for requests
  2024-08-06 15:43         ` Bernd Schubert
@ 2024-08-06 17:08           ` Joanne Koong
  0 siblings, 0 replies; 34+ messages in thread
From: Joanne Koong @ 2024-08-06 17:08 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: miklos, linux-fsdevel, josef, laoar.shao, kernel-team

On Tue, Aug 6, 2024 at 8:43 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 8/6/24 00:10, Joanne Koong wrote:
> > On Mon, Aug 5, 2024 at 6:26 AM Bernd Schubert
> > <bernd.schubert@fastmail.fm> wrote:
> >>>> @@ -1280,6 +1389,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >>>>                  if (args->opcode == FUSE_SETXATTR)
> >>>>                          req->out.h.error = -E2BIG;
> >>>>                  fuse_request_end(req);
> >>>> +               fuse_put_request(req);
> >>>>                  goto restart;
> >>>
> >>> While rereading through fuse_dev_do_read, I just realized we also need
> >>> to handle the race condition for the error edge cases (here and in the
> >>> "goto out_end;"), since the timeout handler could have finished
> >>> executing by the time we hit the error edge case. We need to
> >>> test_and_set_bit(FR_FINISHING) so that either the timeout_handler or
> >>> dev_do_read cleans up the request, but not both. I'll fix this for v3.
> >>
> >> I know it would change semantics a bit, but wouldn't it be much easier /
> >> less racy if fuse_dev_do_read() would delete the timer when it takes a
> >> request from fiq->pending and add it back in (with new timeouts) before
> >> it returns the request?
> >>
> >
> > Ooo I really like this idea! I'm worried though that this might allow
> > potential scenarios where the fuse_dev_do_read gets descheduled after
> > disarming the timer and a non-trivial amount of time elapses before it
> > gets scheduled back (eg on a system where the CPU is starved), in
> > which case the fuse req_timeout value will be (somewhat of) a lie. If
> > you and others think this is likely fine though, then I'll incorporate
> > this into v3 which will make this logic a lot simpler :)
> >
>
> In my opinion we only need to worry about fuse server getting stuck. I
> think we would have a grave issue if fuse_dev_do_read() gets descheduled
> for a long time - the timer might not run either in that case. Main
> issue I see with removing/re-adding the timer - it might double the
> timeout in worst case. In my personal opinion acceptable as it reduces
> code complexity.
>

Awesome, thanks for this suggestion Bernd! I'll make this change for
v3, this will get rid of having to handle all the possible races
between dev_do_read and the timeout handler.
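
For reference, the test_and_set_bit(FR_FINISHING) guard from the quoted
text above (the approach being dropped here) can be modelled in user space
with a C11 atomic_flag. This is a sketch under assumptions: struct fin_req,
fin_req_init() and fin_try_finish() are hypothetical names, and the flag
only plays the role of FR_FINISHING. Whichever path sets the flag first
owns the cleanup; the other path backs off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model request; finishing plays the role of FR_FINISHING. */
struct fin_req {
	atomic_flag finishing;
	int cleanups;           /* how many paths actually ran the cleanup */
};

static void fin_req_init(struct fin_req *req)
{
	atomic_flag_clear(&req->finishing);
	req->cleanups = 0;
}

/* Returns true only for the single caller that wins the flag. */
static bool fin_try_finish(struct fin_req *req)
{
	if (atomic_flag_test_and_set(&req->finishing))
		return false;   /* another path already owns the cleanup */
	req->cleanups++;        /* only the winner reaches this point */
	return true;
}
```

In this model both the timeout handler and the read error path would call
fin_try_finish() and clean up only when it returns true.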

I'm planning to rearm the timer with its original req->timer.expires
(which was set to "jiffies + fc->req_timeout" at the time the timer
was started the first time), so I think this will retain the original
timeout and won't add any extra time to it. And according to the timer
docs, "if @timer->expires is already in the past @timer will be queued
to expire at the next timer tick".
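
The arithmetic behind the two rearm choices can be sketched as follows.
These helpers are hypothetical and written for illustration only; they
ignore jiffies wraparound and model "the next timer tick" as now + 1.
Rearming with the saved deadline keeps the total at the configured timeout,
while rearming relative to the current time is what could nearly double it.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not kernel functions. */

/* Deadline if the timer is rearmed with the value saved at first arm. */
static unsigned long rearm_keep_original(unsigned long start,
					 unsigned long timeout,
					 unsigned long now)
{
	(void)now;              /* the current time does not matter here */
	return start + timeout;
}

/* Deadline if the timer is rearmed relative to the current time. */
static unsigned long rearm_from_now(unsigned long start,
				    unsigned long timeout,
				    unsigned long now)
{
	(void)start;
	return now + timeout;
}

/*
 * Earliest time a timer with this deadline can fire: a deadline that is
 * already in the past is modelled as firing at the next tick.
 */
static unsigned long earliest_fire(unsigned long expires, unsigned long now)
{
	return expires > now ? expires : now + 1;
}
```

With start = 1000 and a timeout of 30, a rearm at 1029 keeps the deadline
at 1030 in the first variant and pushes it to 1059 in the second.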

Thanks,
Joanne
>
> Thanks
> Bernd


* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-08-06 16:23                   ` Joanne Koong
@ 2024-08-06 17:11                     ` Bernd Schubert
  2024-08-06 18:26                       ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Bernd Schubert @ 2024-08-06 17:11 UTC (permalink / raw)
  To: Joanne Koong, Yafang Shao; +Cc: miklos, linux-fsdevel, josef, kernel-team



On 8/6/24 18:23, Joanne Koong wrote:

>>
>> This is very interesting. These logs (and the ones above with the
>> lxcfs server running concurrently) are showing that the read request
>> was freed but not through the do_fuse_request_end path. It's weird
>> that fuse_simple_request reached fuse_put_request without
>> do_fuse_request_end having been called (which is the only place where
>> FR_FINISHED gets set and wakes up the wait events in
>> request_wait_answer).
>>
>> I'll take a deeper look tomorrow and try to make more sense of it.
> 
> Finally realized what's happening!
> When we kill the cat program, if the request hasn't been sent out to
> userspace yet when the fatal signal interrupts the
> wait_event_interruptible and wait_event_killable in
> request_wait_answer(), this will clean up the request manually (not
> through the fuse_request_end() path), which doesn't delete the timer.
> 
> I'll fix this for v3.
> 
> Thank you for surfacing this and it would be much appreciated if you
> could test out v3 when it's submitted to make sure.

It is still just a suggestion, but if the timer would have its own ref,
any oversight of another fuse_put_request wouldn't be fatal.


Thanks,
Bernd


* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-08-06 17:11                     ` Bernd Schubert
@ 2024-08-06 18:26                       ` Joanne Koong
  2024-08-06 18:37                         ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2024-08-06 18:26 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: Yafang Shao, miklos, linux-fsdevel, josef, kernel-team

On Tue, Aug 6, 2024 at 10:11 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> On 8/6/24 18:23, Joanne Koong wrote:
>
> >>
> >> This is very interesting. These logs (and the ones above with the
> >> lxcfs server running concurrently) are showing that the read request
> >> was freed but not through the do_fuse_request_end path. It's weird
> >> that fuse_simple_request reached fuse_put_request without
> >> do_fuse_request_end having been called (which is the only place where
> >> FR_FINISHED gets set and wakes up the wait events in
> >> request_wait_answer).
> >>
> >> I'll take a deeper look tomorrow and try to make more sense of it.
> >
> > Finally realized what's happening!
> > When we kill the cat program, if the request hasn't been sent out to
> > userspace yet when the fatal signal interrupts the
> > wait_event_interruptible and wait_event_killable in
> > request_wait_answer(), this will clean up the request manually (not
> > through the fuse_request_end() path), which doesn't delete the timer.
> >
> > I'll fix this for v3.
> >
> > Thank you for surfacing this and it would be much appreciated if you
> > could test out v3 when it's submitted to make sure.
>
> It is still just a suggestion, but if the timer would have its own ref,
> any oversight of another fuse_put_request wouldn't be fatal.
>

Thanks for the suggestion. My main concerns are whether it's worth the
extra (minimal?) performance penalty for something that's not strictly
needed and whether it ends up adding more of a burden to keep track of
the timer ref (eg in error handling like the case above where the
fatal signal is for a request that hasn't been sent to userspace yet,
having to account for the extra timer ref if the timer callback didn't
execute). I don't think adding a timer ref would prevent fatal crashes
on fuse_put_request oversights (unless we also mess up not releasing a
corresponding timer ref  :))

I don't feel that strongly about this though so if you do, I can add
this in for v3.

Thanks,
Joanne

>
> Thanks,
> Bernd


* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-08-06 18:26                       ` Joanne Koong
@ 2024-08-06 18:37                         ` Joanne Koong
  2024-08-06 20:08                           ` Bernd Schubert
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2024-08-06 18:37 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: Yafang Shao, miklos, linux-fsdevel, josef, kernel-team

On Tue, Aug 6, 2024 at 11:26 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Tue, Aug 6, 2024 at 10:11 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> > On 8/6/24 18:23, Joanne Koong wrote:
> >
> > >>
> > >> This is very interesting. These logs (and the ones above with the
> > >> lxcfs server running concurrently) are showing that the read request
> > >> was freed but not through the do_fuse_request_end path. It's weird
> > >> that fuse_simple_request reached fuse_put_request without
> > >> do_fuse_request_end having been called (which is the only place where
> > >> FR_FINISHED gets set and wakes up the wait events in
> > >> request_wait_answer).
> > >>
> > >> I'll take a deeper look tomorrow and try to make more sense of it.
> > >
> > > Finally realized what's happening!
> > > When we kill the cat program, if the request hasn't been sent out to
> > > userspace yet when the fatal signal interrupts the
> > > wait_event_interruptible and wait_event_killable in
> > > request_wait_answer(), this will clean up the request manually (not
> > > through the fuse_request_end() path), which doesn't delete the timer.
> > >
> > > I'll fix this for v3.
> > >
> > > Thank you for surfacing this and it would be much appreciated if you
> > > could test out v3 when it's submitted to make sure.
> >
> > It is still just a suggestion, but if the timer would have its own ref,
> > any oversight of another fuse_put_request wouldn't be fatal.
> >
>
> Thanks for the suggestion. My main concerns are whether it's worth the
> extra (minimal?) performance penalty for something that's not strictly
> needed and whether it ends up adding more of a burden to keep track of
> the timer ref (eg in error handling like the case above where the
> fatal signal is for a request that hasn't been sent to userspace yet,
> having to account for the extra timer ref if the timer callback didn't
> execute). I don't think adding a timer ref would prevent fatal crashes
> on fuse_put_request oversights (unless we also mess up not releasing a
> corresponding timer ref  :))

I amend this last sentence - I just realized your point about the
fatal crashes is that if we accidentally miss a fuse_put_request
altogether, we'd also miss releasing the timer ref in that path, which
means the timer callback would be the one releasing the last ref.
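
A small user-space model of that reading, assuming a plain integer
refcount. struct ref_req, ref_put(), ref_arm_timer() and ref_timer_fire()
are hypothetical names and do not correspond to the fuse helpers. Because
the armed timer holds its own reference, a path that drops the caller's
reference without deleting the timer leaves the request alive until the
callback has run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; refs stands in for the request refcount. */
struct ref_req {
	int refs;
	bool timer_armed;
	bool freed;
};

static void ref_put(struct ref_req *req)
{
	assert(req->refs > 0);
	if (--req->refs == 0)
		req->freed = true;      /* the real code would free here */
}

/* Arm the model timer and let it hold a reference of its own. */
static void ref_arm_timer(struct ref_req *req)
{
	req->refs++;
	req->timer_armed = true;
}

/* Model timer callback: touches the request, then drops the timer's ref. */
static void ref_timer_fire(struct ref_req *req)
{
	assert(!req->freed);            /* would be a use-after-free otherwise */
	req->timer_armed = false;
	ref_put(req);
}
```

The cost raised earlier in the thread still applies: every path that
deletes the timer before it fires would also have to drop the timer's
reference.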

>
> I don't feel that strongly about this though so if you do, I can add
> this in for v3.
>
> Thanks,
> Joanne
>
> >
> > Thanks,
> > Bernd


* Re: [PATCH v2 0/2] fuse: add timeout option for requests
  2024-08-06 18:37                         ` Joanne Koong
@ 2024-08-06 20:08                           ` Bernd Schubert
  0 siblings, 0 replies; 34+ messages in thread
From: Bernd Schubert @ 2024-08-06 20:08 UTC (permalink / raw)
  To: Joanne Koong; +Cc: Yafang Shao, miklos, linux-fsdevel, josef, kernel-team



On 8/6/24 20:37, Joanne Koong wrote:
> On Tue, Aug 6, 2024 at 11:26 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>>
>> On Tue, Aug 6, 2024 at 10:11 AM Bernd Schubert
>> <bernd.schubert@fastmail.fm> wrote:
>>>
>>> On 8/6/24 18:23, Joanne Koong wrote:
>>>
>>>>>
>>>>> This is very interesting. These logs (and the ones above with the
>>>>> lxcfs server running concurrently) are showing that the read request
>>>>> was freed but not through the do_fuse_request_end path. It's weird
>>>>> that fuse_simple_request reached fuse_put_request without
>>>>> do_fuse_request_end having been called (which is the only place where
>>>>> FR_FINISHED gets set and wakes up the wait events in
>>>>> request_wait_answer).
>>>>>
>>>>> I'll take a deeper look tomorrow and try to make more sense of it.
>>>>
>>>> Finally realized what's happening!
>>>> When we kill the cat program, if the request hasn't been sent out to
>>>> userspace yet when the fatal signal interrupts the
>>>> wait_event_interruptible and wait_event_killable in
>>>> request_wait_answer(), this will clean up the request manually (not
>>>> through the fuse_request_end() path), which doesn't delete the timer.
>>>>
>>>> I'll fix this for v3.
>>>>
>>>> Thank you for surfacing this and it would be much appreciated if you
>>>> could test out v3 when it's submitted to make sure.
>>>
>>> It is still just a suggestion, but if the timer would have its own ref,
>>> any oversight of another fuse_put_request wouldn't be fatal.
>>>
>>
>> Thanks for the suggestion. My main concerns are whether it's worth the
>> extra (minimal?) performance penalty for something that's not strictly
>> needed and whether it ends up adding more of a burden to keep track of
>> the timer ref (eg in error handling like the case above where the
>> fatal signal is for a request that hasn't been sent to userspace yet,
>> having to account for the extra timer ref if the timer callback didn't
>> execute). I don't think adding a timer ref would prevent fatal crashes
>> on fuse_put_request oversights (unless we also mess up not releasing a
>> corresponding timer ref  :))
> 
> I amend this last sentence - I just realized your point about the
> fatal crashes is that if we accidentally miss a fuse_put_request
> altogether, we'd also miss releasing the timer ref in that path, which
> means the timer callback would be the one releasing the last ref.
> 

Yeah, that is what I meant. It is a bit of defensive coding, but I don't
have a strong opinion about it.


Thanks,
Bernd


Thread overview: 34+ messages
2024-07-30  0:23 [PATCH v2 0/2] fuse: add timeout option for requests Joanne Koong
2024-07-30  0:23 ` [PATCH v2 1/2] fuse: add optional kernel-enforced timeout " Joanne Koong
2024-08-04 22:46   ` Bernd Schubert
2024-08-05  4:45     ` Joanne Koong
2024-08-05 13:05       ` Bernd Schubert
2024-08-05  4:52   ` Joanne Koong
2024-08-05 13:26     ` Bernd Schubert
2024-08-05 22:10       ` Joanne Koong
2024-08-06 15:43         ` Bernd Schubert
2024-08-06 17:08           ` Joanne Koong
2024-08-05  7:32   ` Jingbo Xu
2024-08-05 22:53     ` Joanne Koong
2024-08-06  2:45       ` Jingbo Xu
2024-08-06 16:43         ` Joanne Koong
2024-08-06 15:50       ` Bernd Schubert
2024-07-30  0:23 ` [PATCH v2 2/2] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
2024-07-30  7:49   ` kernel test robot
2024-07-30  9:14   ` kernel test robot
2024-08-05  7:38   ` Jingbo Xu
2024-08-06  1:26     ` Joanne Koong
2024-07-30  5:59 ` [PATCH v2 0/2] fuse: add timeout option for requests Yafang Shao
2024-07-30 18:16   ` Joanne Koong
2024-07-31  2:13     ` Yafang Shao
2024-07-31 17:52       ` Joanne Koong
2024-07-31 18:46         ` Joanne Koong
2024-08-01  2:47           ` Yafang Shao
2024-08-02 19:05             ` Joanne Koong
2024-08-04  7:46               ` Yafang Shao
2024-08-05  5:05                 ` Joanne Koong
2024-08-06 16:23                   ` Joanne Koong
2024-08-06 17:11                     ` Bernd Schubert
2024-08-06 18:26                       ` Joanne Koong
2024-08-06 18:37                         ` Joanne Koong
2024-08-06 20:08                           ` Bernd Schubert
