linux-fsdevel.vger.kernel.org archive mirror
* [RFC PATCH 0/2] fuse: Add timeout support for fuse connection
@ 2024-07-24  7:11 Yafang Shao
  2024-07-24  7:11 ` [RFC PATCH 1/2] fuse: Add "timeout" sysfs attribute for each " Yafang Shao
  2024-07-24  7:11 ` [RFC PATCH 2/2] fuse: Enhance each fuse connection with timeout support Yafang Shao
  0 siblings, 2 replies; 6+ messages in thread
From: Yafang Shao @ 2024-07-24  7:11 UTC
  To: miklos; +Cc: linux-fsdevel, Yafang Shao

Currently, when the FUSE daemon hangs, requests on the FUSE connection can
block indefinitely. The only remedy available today is to abort the
connection manually using the abort file under /sys/fs/fuse/connections,
which is not a reliable basis for automatic error handling. This series
introduces a per-connection timeout instead. If the FUSE daemon does not
reply before the timeout expires, the FUSE request fails with an error
code (-EAGAIN in patch 2) that is returned to user space, so that user
space can handle the situation appropriately.

Furthermore, the timeout value is configurable by the user, allowing for
customization based on specific workload requirements.

Yafang Shao (2):
  fuse: Add "timeout" sysfs attribute for each fuse connection
  fuse: Enhance each fuse connection with timeout support

 fs/fuse/control.c | 50 ++++++++++++++++++++++++++++++++++++++++-
 fs/fuse/dev.c     | 57 ++++++++++++++++++++++++++++++++++++++++-------
 fs/fuse/fuse_i.h  |  7 +++++-
 3 files changed, 104 insertions(+), 10 deletions(-)

-- 
2.43.5


* [RFC PATCH 1/2] fuse: Add "timeout" sysfs attribute for each fuse connection
  2024-07-24  7:11 [RFC PATCH 0/2] fuse: Add timeout support for fuse connection Yafang Shao
@ 2024-07-24  7:11 ` Yafang Shao
  2024-07-24  7:11 ` [RFC PATCH 2/2] fuse: Enhance each fuse connection with timeout support Yafang Shao
  1 sibling, 0 replies; 6+ messages in thread
From: Yafang Shao @ 2024-07-24  7:11 UTC
  To: miklos; +Cc: linux-fsdevel, Yafang Shao

Add a dedicated "timeout" sysfs attribute for each fuse connection so that
users can set the timeout, in seconds (at most 3600), for individual
connections.

This is preparation for the follow-up patch.
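
For illustration only, a user-space sketch of setting the attribute might
look like the following. The helper fuse_timeout_set() is hypothetical.
The path passed to it is assumed to be the "timeout" file this patch adds
in the connection's directory under /sys/fs/fuse/connections, and the
3600 bound mirrors the limit passed to fuse_conn_limit_write() in the
diff below.

```c
#include <errno.h>
#include <stdio.h>

/* Upper bound passed to fuse_conn_limit_write() in this patch. */
#define FUSE_TIMEOUT_MAX_SEC 3600u

/*
 * Hypothetical helper: write a timeout in seconds to a control file.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int fuse_timeout_set(const char *path, unsigned int seconds)
{
	FILE *f;
	int ok;

	/* Reject values the kernel side would refuse anyway. */
	if (seconds > FUSE_TIMEOUT_MAX_SEC) {
		errno = EINVAL;
		return -1;
	}

	f = fopen(path, "w");
	if (!f)
		return -1;

	/* The value is written as a decimal string. */
	ok = fprintf(f, "%u\n", seconds) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

With the follow-up patch applied, writing 0 leaves requests waiting
without a timeout, since the timed wait is only used when fc->timeout is
non-zero.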

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/fuse/control.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++-
 fs/fuse/fuse_i.h  |  5 ++++-
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/control.c b/fs/fuse/control.c
index 97ac994ff78f..247fbec72cda 100644
--- a/fs/fuse/control.c
+++ b/fs/fuse/control.c
@@ -180,6 +180,44 @@ static ssize_t fuse_conn_congestion_threshold_write(struct file *file,
 	return ret;
 }
 
+static ssize_t fuse_conn_timeout_read(struct file *file,
+				      char __user *buf, size_t len,
+				      loff_t *ppos)
+{
+	struct fuse_conn *fc;
+	u32 val;
+
+	fc = fuse_ctl_file_conn_get(file);
+	if (!fc)
+		return 0;
+
+	val = READ_ONCE(fc->timeout);
+	fuse_conn_put(fc);
+	return fuse_conn_limit_read(file, buf, len, ppos, val);
+}
+
+static ssize_t fuse_conn_timeout_write(struct file *file,
+				       const char __user *buf,
+				       size_t count, loff_t *ppos)
+{
+	struct fuse_conn *fc;
+	ssize_t ret;
+	u32 val;
+
+	ret = fuse_conn_limit_write(file, buf, count, ppos, &val,
+				    3600);
+	if (ret <= 0)
+		goto out;
+	fc = fuse_ctl_file_conn_get(file);
+	if (!fc)
+		goto out;
+
+	WRITE_ONCE(fc->timeout, val);
+	fuse_conn_put(fc);
+out:
+	return ret;
+}
+
 static const struct file_operations fuse_ctl_abort_ops = {
 	.open = nonseekable_open,
 	.write = fuse_conn_abort_write,
@@ -206,6 +244,13 @@ static const struct file_operations fuse_conn_congestion_threshold_ops = {
 	.llseek = no_llseek,
 };
 
+static const struct file_operations fuse_conn_timeout_ops = {
+	.open = nonseekable_open,
+	.read = fuse_conn_timeout_read,
+	.write = fuse_conn_timeout_write,
+	.llseek = no_llseek,
+};
+
 static struct dentry *fuse_ctl_add_dentry(struct dentry *parent,
 					  struct fuse_conn *fc,
 					  const char *name,
@@ -274,7 +319,10 @@ int fuse_ctl_add_conn(struct fuse_conn *fc)
 				 1, NULL, &fuse_conn_max_background_ops) ||
 	    !fuse_ctl_add_dentry(parent, fc, "congestion_threshold",
 				 S_IFREG | 0600, 1, NULL,
-				 &fuse_conn_congestion_threshold_ops))
+				 &fuse_conn_congestion_threshold_ops) ||
+	    !fuse_ctl_add_dentry(parent, fc, "timeout",
+				 S_IFREG | 0600, 1, NULL,
+				 &fuse_conn_timeout_ops))
 		goto err;
 
 	return 0;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f23919610313..367601bf7285 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -45,7 +45,7 @@
 #define FUSE_NAME_MAX 1024
 
 /** Number of dentries for each connection in the control filesystem */
-#define FUSE_CTL_NUM_DENTRIES 5
+#define FUSE_CTL_NUM_DENTRIES 6
 
 /** List of active connections */
 extern struct list_head fuse_conn_list;
@@ -917,6 +917,9 @@ struct fuse_conn {
 	/** IDR for backing files ids */
 	struct idr backing_files_map;
 #endif
+
+	/* fuse request timeout */
+	u32 timeout;
 };
 
 /*
-- 
2.43.5


* [RFC PATCH 2/2] fuse: Enhance each fuse connection with timeout support
  2024-07-24  7:11 [RFC PATCH 0/2] fuse: Add timeout support for fuse connection Yafang Shao
  2024-07-24  7:11 ` [RFC PATCH 1/2] fuse: Add "timeout" sysfs attribute for each " Yafang Shao
@ 2024-07-24  7:11 ` Yafang Shao
  2024-07-24 17:09   ` Joanne Koong
  1 sibling, 1 reply; 6+ messages in thread
From: Yafang Shao @ 2024-07-24  7:11 UTC
  To: miklos; +Cc: linux-fsdevel, Yafang Shao

In our experience with fuse.hdfs, we encountered a challenge where, if the
HDFS server encounters an issue, the fuse.hdfs daemon—responsible for
sending requests to the HDFS server—can get stuck indefinitely.
Consequently, access to the fuse.hdfs directory becomes unresponsive.
The current workaround involves manually aborting the fuse connection,
which is unreliable in automatically addressing the abnormal connection
issue. To alleviate this pain point, we have implemented a timeout
mechanism that automatically handles such abnormal cases, thereby
streamlining the process and enhancing reliability.

The timeout value is configurable by the user, allowing them to tailor it
according to their specific workload requirements.
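
The diff below distinguishes three outcomes of the timed wait by the sign
of its return value. As a reading aid, here is a small user-space model
of that dispatch and of the seconds-to-jiffies conversion. The names
classify_wait() and timeout_to_jiffies() are made up for this sketch, and
the tick rate is passed in as a plain parameter rather than taken from
the kernel's HZ.

```c
/* Outcomes of wait_event_{interruptible,killable}_timeout() as used below. */
enum wait_outcome {
	WAIT_FINISHED,	/* ret > 0: condition became true, request answered */
	WAIT_TIMED_OUT,	/* ret == 0: timeout elapsed, condition still false */
	WAIT_SIGNALLED,	/* ret < 0: wait was interrupted by a signal */
};

/* Hypothetical helper: map the wait's return value to an outcome. */
static enum wait_outcome classify_wait(long ret)
{
	if (ret > 0)
		return WAIT_FINISHED;
	if (ret == 0)
		return WAIT_TIMED_OUT;
	return WAIT_SIGNALLED;
}

/*
 * Hypothetical helper: the conversion the patch performs with
 * (long)fc->timeout * HZ. With seconds capped at 3600 by patch 1 and
 * a tick rate of at most a few thousand, the product fits in a long.
 */
static long timeout_to_jiffies(unsigned int seconds, long hz)
{
	return (long)seconds * hz;
}
```

In the patch, the first outcome returns normally, the second sets
-EAGAIN and FR_TIMEOUT, and the third falls through to the existing
interrupt handling.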

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/fuse/dev.c    | 57 +++++++++++++++++++++++++++++++++++++++++-------
 fs/fuse/fuse_i.h |  2 ++
 2 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9eb191b5c4de..ff9c55bcfb3d 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -369,10 +369,27 @@ static void request_wait_answer(struct fuse_req *req)
 
 	if (!fc->no_interrupt) {
 		/* Any signal may interrupt this */
-		err = wait_event_interruptible(req->waitq,
-					test_bit(FR_FINISHED, &req->flags));
-		if (!err)
-			return;
+		if (!fc->timeout) {
+			err = wait_event_interruptible(req->waitq,
+						       test_bit(FR_FINISHED, &req->flags));
+			if (!err)
+				return;
+		} else {
+			err = wait_event_interruptible_timeout(req->waitq,
+							       test_bit(FR_FINISHED, &req->flags),
+							       (long)fc->timeout * HZ);
+			if (err > 0)
+				return;
+
+			/* timeout */
+			if (!err) {
+				req->out.h.error = -EAGAIN;
+				set_bit(FR_TIMEOUT, &req->flags);
+				/* matches barrier in fuse_dev_do_write() */
+				smp_mb__after_atomic();
+				return;
+			}
+		}
 
 		set_bit(FR_INTERRUPTED, &req->flags);
 		/* matches barrier in fuse_dev_do_read() */
@@ -383,10 +400,27 @@ static void request_wait_answer(struct fuse_req *req)
 
 	if (!test_bit(FR_FORCE, &req->flags)) {
 		/* Only fatal signals may interrupt this */
-		err = wait_event_killable(req->waitq,
-					test_bit(FR_FINISHED, &req->flags));
-		if (!err)
-			return;
+		if (!fc->timeout) {
+			err = wait_event_killable(req->waitq,
+						  test_bit(FR_FINISHED, &req->flags));
+			if (!err)
+				return;
+		} else {
+			err = wait_event_killable_timeout(req->waitq,
+							  test_bit(FR_FINISHED, &req->flags),
+							  (long)fc->timeout * HZ);
+			if (err > 0)
+				return;
+
+			/* timeout */
+			if (!err) {
+				req->out.h.error = -EAGAIN;
+				set_bit(FR_TIMEOUT, &req->flags);
+				/* matches barrier in fuse_dev_do_write() */
+				smp_mb__after_atomic();
+				return;
+			}
+		}
 
 		spin_lock(&fiq->lock);
 		/* Request is not yet in userspace, bail out */
@@ -1951,6 +1985,13 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 		goto copy_finish;
 	}
 
+	/* matches barrier in request_wait_answer() */
+	smp_mb__after_atomic();
+	if (test_and_clear_bit(FR_TIMEOUT, &req->flags)) {
+		spin_unlock(&fpq->lock);
+		goto copy_finish;
+	}
+
 	/* Is it an interrupt reply ID? */
 	if (oh.unique & FUSE_INT_REQ_BIT) {
 		__fuse_get_request(req);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 367601bf7285..c1467eb8c2e9 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -375,6 +375,7 @@ struct fuse_io_priv {
  * FR_FINISHED:		request is finished
  * FR_PRIVATE:		request is on private list
  * FR_ASYNC:		request is asynchronous
+ * FR_TIMEOUT:		request has timed out
  */
 enum fuse_req_flag {
 	FR_ISREPLY,
@@ -389,6 +390,7 @@ enum fuse_req_flag {
 	FR_FINISHED,
 	FR_PRIVATE,
 	FR_ASYNC,
+	FR_TIMEOUT,
 };
 
 /**
-- 
2.43.5


* Re: [RFC PATCH 2/2] fuse: Enhance each fuse connection with timeout support
  2024-07-24  7:11 ` [RFC PATCH 2/2] fuse: Enhance each fuse connection with timeout support Yafang Shao
@ 2024-07-24 17:09   ` Joanne Koong
  2024-07-25  2:06     ` Yafang Shao
  0 siblings, 1 reply; 6+ messages in thread
From: Joanne Koong @ 2024-07-24 17:09 UTC
  To: Yafang Shao; +Cc: miklos, linux-fsdevel

On Wed, Jul 24, 2024 at 12:12 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> In our experience with fuse.hdfs, we encountered a challenge where, if the
> HDFS server encounters an issue, the fuse.hdfs daemon—responsible for
> sending requests to the HDFS server—can get stuck indefinitely.
> Consequently, access to the fuse.hdfs directory becomes unresponsive.
> The current workaround involves manually aborting the fuse connection,
> which is unreliable in automatically addressing the abnormal connection
> issue. To alleviate this pain point, we have implemented a timeout
> mechanism that automatically handles such abnormal cases, thereby
> streamlining the process and enhancing reliability.
>
> The timeout value is configurable by the user, allowing them to tailor it
> according to their specific workload requirements.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

Hi Yafang,

There was a similar thread/conversation about timeouts started in this
link from last week
https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/#t

The core idea is the same but also handles cleanup for requests that
time out, to avoid memory leaks in cases where the server never
replies to the request. For v2, I am going to add timeouts for
background requests as well.


Thanks,
Joanne

> ---
>  fs/fuse/dev.c    | 57 +++++++++++++++++++++++++++++++++++++++++-------
>  fs/fuse/fuse_i.h |  2 ++
>  2 files changed, 51 insertions(+), 8 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 9eb191b5c4de..ff9c55bcfb3d 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -369,10 +369,27 @@ static void request_wait_answer(struct fuse_req *req)
>
>         if (!fc->no_interrupt) {
>                 /* Any signal may interrupt this */
> -               err = wait_event_interruptible(req->waitq,
> -                                       test_bit(FR_FINISHED, &req->flags));
> -               if (!err)
> -                       return;
> +               if (!fc->timeout) {
> +                       err = wait_event_interruptible(req->waitq,
> +                                                      test_bit(FR_FINISHED, &req->flags));
> +                       if (!err)
> +                               return;
> +               } else {
> +                       err = wait_event_interruptible_timeout(req->waitq,
> +                                                              test_bit(FR_FINISHED, &req->flags),
> +                                                              (long)fc->timeout * HZ);
> +                       if (err > 0)
> +                               return;
> +
> +                       /* timeout */
> +                       if (!err) {
> +                               req->out.h.error = -EAGAIN;
> +                               set_bit(FR_TIMEOUT, &req->flags);
> +                               /* matches barrier in fuse_dev_do_write() */
> +                               smp_mb__after_atomic();
> +                               return;
> +                       }
> +               }
>
>                 set_bit(FR_INTERRUPTED, &req->flags);
>                 /* matches barrier in fuse_dev_do_read() */
> @@ -383,10 +400,27 @@ static void request_wait_answer(struct fuse_req *req)
>
>         if (!test_bit(FR_FORCE, &req->flags)) {
>                 /* Only fatal signals may interrupt this */
> -               err = wait_event_killable(req->waitq,
> -                                       test_bit(FR_FINISHED, &req->flags));
> -               if (!err)
> -                       return;
> +               if (!fc->timeout) {
> +                       err = wait_event_killable(req->waitq,
> +                                                 test_bit(FR_FINISHED, &req->flags));
> +                       if (!err)
> +                               return;
> +               } else {
> +                       err = wait_event_killable_timeout(req->waitq,
> +                                                         test_bit(FR_FINISHED, &req->flags),
> +                                                         (long)fc->timeout * HZ);
> +                       if (err > 0)
> +                               return;
> +
> +                       /* timeout */
> +                       if (!err) {
> +                               req->out.h.error = -EAGAIN;
> +                               set_bit(FR_TIMEOUT, &req->flags);
> +                               /* matches barrier in fuse_dev_do_write() */
> +                               smp_mb__after_atomic();
> +                               return;
> +                       }
> +               }
>
>                 spin_lock(&fiq->lock);
>                 /* Request is not yet in userspace, bail out */
> @@ -1951,6 +1985,13 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
>                 goto copy_finish;
>         }
>
> +       /* matches barrier in request_wait_answer() */
> +       smp_mb__after_atomic();
> +       if (test_and_clear_bit(FR_TIMEOUT, &req->flags)) {
> +               spin_unlock(&fpq->lock);
> +               goto copy_finish;
> +       }
> +
>         /* Is it an interrupt reply ID? */
>         if (oh.unique & FUSE_INT_REQ_BIT) {
>                 __fuse_get_request(req);
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 367601bf7285..c1467eb8c2e9 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -375,6 +375,7 @@ struct fuse_io_priv {
>   * FR_FINISHED:                request is finished
>   * FR_PRIVATE:         request is on private list
>   * FR_ASYNC:           request is asynchronous
> + * FR_TIMEOUT:         request has timed out
>   */
>  enum fuse_req_flag {
>         FR_ISREPLY,
> @@ -389,6 +390,7 @@ enum fuse_req_flag {
>         FR_FINISHED,
>         FR_PRIVATE,
>         FR_ASYNC,
> +       FR_TIMEOUT,
>  };
>
>  /**
> --
> 2.43.5
>
>

* Re: [RFC PATCH 2/2] fuse: Enhance each fuse connection with timeout support
  2024-07-24 17:09   ` Joanne Koong
@ 2024-07-25  2:06     ` Yafang Shao
  2024-07-25 17:56       ` Joanne Koong
  0 siblings, 1 reply; 6+ messages in thread
From: Yafang Shao @ 2024-07-25  2:06 UTC
  To: Joanne Koong; +Cc: miklos, linux-fsdevel

On Thu, Jul 25, 2024 at 1:09 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Wed, Jul 24, 2024 at 12:12 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > In our experience with fuse.hdfs, we encountered a challenge where, if the
> > HDFS server encounters an issue, the fuse.hdfs daemon—responsible for
> > sending requests to the HDFS server—can get stuck indefinitely.
> > Consequently, access to the fuse.hdfs directory becomes unresponsive.
> > The current workaround involves manually aborting the fuse connection,
> > which is unreliable in automatically addressing the abnormal connection
> > issue. To alleviate this pain point, we have implemented a timeout
> > mechanism that automatically handles such abnormal cases, thereby
> > streamlining the process and enhancing reliability.
> >
> > The timeout value is configurable by the user, allowing them to tailor it
> > according to their specific workload requirements.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>
> Hi Yafang,
>
> There was a similar thread/conversation about timeouts started in this
> link from last week
> https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/#t
>

I am not currently subscribed to linux-fsdevel, so I missed your patch.
Thanks for your information. I will test your patch.

> The core idea is the same but also handles cleanup for requests that
> time out, to avoid memory leaks in cases where the server never
> replies to the request. For v2, I am going to add timeouts for
> background requests as well.

Please CC me if you send new versions.

-- 
Regards
Yafang

* Re: [RFC PATCH 2/2] fuse: Enhance each fuse connection with timeout support
  2024-07-25  2:06     ` Yafang Shao
@ 2024-07-25 17:56       ` Joanne Koong
  0 siblings, 0 replies; 6+ messages in thread
From: Joanne Koong @ 2024-07-25 17:56 UTC
  To: Yafang Shao; +Cc: miklos, linux-fsdevel

On Wed, Jul 24, 2024 at 7:07 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Thu, Jul 25, 2024 at 1:09 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Wed, Jul 24, 2024 at 12:12 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > In our experience with fuse.hdfs, we encountered a challenge where, if the
> > > HDFS server encounters an issue, the fuse.hdfs daemon—responsible for
> > > sending requests to the HDFS server—can get stuck indefinitely.
> > > Consequently, access to the fuse.hdfs directory becomes unresponsive.
> > > The current workaround involves manually aborting the fuse connection,
> > > which is unreliable in automatically addressing the abnormal connection
> > > issue. To alleviate this pain point, we have implemented a timeout
> > > mechanism that automatically handles such abnormal cases, thereby
> > > streamlining the process and enhancing reliability.
> > >
> > > The timeout value is configurable by the user, allowing them to tailor it
> > > according to their specific workload requirements.
> > >
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> >
> > Hi Yafang,
> >
> > There was a similar thread/conversation about timeouts started in this
> > link from last week
> > https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/#t
> >
>
> I am not currently subscribed to linux-fsdevel, so I missed your patch.
> Thanks for your information. I will test your patch.
>
> > The core idea is the same but also handles cleanup for requests that
> > time out, to avoid memory leaks in cases where the server never
> > replies to the request. For v2, I am going to add timeouts for
> > background requests as well.
>
> Please CC me if you send new versions.

Will do. I'll make sure you are cc-ed.

Thanks,
Joanne
>
> --
> Regards
> Yafang
