From: Josef Bacik <josef@toxicpanda.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
osandov@osandov.com, kernel-team@meta.com
Subject: Re: [PATCH] fuse: add optional kernel-enforced timeout for fuse requests
Date: Thu, 18 Jul 2024 10:44:30 -0400 [thread overview]
Message-ID: <20240718144430.GA2099026@perftesting> (raw)
In-Reply-To: <20240717213458.1613347-1-joannelkoong@gmail.com>
On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or take
> too long to reply to a request. Currently there is no upper bound on
> how long a request may take, which may be frustrating to users who get
> stuck waiting for a request to complete.
>
> This commit adds a daemon timeout option (in seconds) for fuse requests.
> If the timeout elapses before the request is replied to, the request will
> fail with -ETIME.
>
> There are 3 possibilities for a request that times out:
> a) The request times out before the request has been sent to userspace
> b) The request times out after the request has been sent to userspace
> and before it receives a reply from the server
> c) The request times out after the request has been sent to userspace
> and the server replies while the kernel is timing out the request
>
> Proper synchronization must be added to ensure that the request is
> handled correctly in all of these cases. To this effect, there is a new
> FR_PROCESSING bit added to the request flags, which is set atomically by
> either the timeout handler (see fuse_request_timeout()) which is invoked
> after the request timeout elapses or set by the request reply handler
> (see dev_do_write()), whichever gets there first.
>
> If the reply handler and the timeout handler are executing simultaneously
> and the reply handler sets FR_PROCESSING before the timeout handler, then
> the request is re-queued onto the waitqueue and the kernel will process the
> reply as though the timeout did not elapse. If the timeout handler sets
> FR_PROCESSING before the reply handler, then the request will fail with
> -ETIME and the request will be cleaned up.
>
> Proper acquires on the request reference must be added to ensure that the
> timeout handler does not drop the last refcount on the request while the
> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
> still accessing the request. (By "forwarder handler", this is the handler
> that forwards the request to userspace).
>
> Currently, this is the lifecycle of the request refcount:
>
> Request is created:
> fuse_simple_request -> allocates request, sets refcount to 1
> __fuse_request_send -> acquires refcount
> queues request and waits for reply...
> fuse_simple_request -> drops refcount
>
> Request is freed:
> fuse_dev_do_write
> fuse_request_end -> drops refcount on request
>
> The timeout handler drops the refcount on the request so that the
> request is properly cleaned up if a reply is never received. Because of
> this, both the forwarder handler and the reply handler must acquire a refcount
> on the request while it accesses the request, and the refcount must be
> acquired while the lock of the list the request is on is held.
>
> There is a potential race if the request is being forwarded to
> userspace while the timeout handler is executing (eg FR_PENDING has
> already been cleared but dev_do_read() hasn't finished executing). This
> is a problem because this would free the request but the request has not
> been removed from the fpq list it's on. To prevent this, dev_do_read()
> must check FR_PROCESSING at the end of its logic and remove the request
> from the fpq list if the timeout occurred.
>
> There is also the case where the connection may be aborted or the
> device may be released while the timeout handler is running. To protect
> against an extra refcount drop on the request, the timeout handler
> checks the connected state of the list and lets the abort handler drop the
> last reference if the abort is running simultaneously. Similarly, the
> timeout handler also needs to check if the req->out.h.error is set to
> -ESTALE, which indicates that the device release is cleaning up the
> request. In both these cases, the timeout handler will return without
> dropping the refcount.
>
> Please also note that background requests are not applicable for timeouts
> since they are asynchronous.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/fuse/dev.c | 177 ++++++++++++++++++++++++++++++++++++++++++++---
> fs/fuse/fuse_i.h | 12 ++++
> fs/fuse/inode.c | 7 ++
> 3 files changed, 188 insertions(+), 8 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 9eb191b5c4de..7dd7b244951b 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req)
> }
> EXPORT_SYMBOL_GPL(fuse_request_end);
>
> +/* fuse_request_end for requests that timeout */
> +static void fuse_request_timeout(struct fuse_req *req)
> +{
> + struct fuse_conn *fc = req->fm->fc;
> + struct fuse_iqueue *fiq = &fc->iq;
> + struct fuse_pqueue *fpq;
> +
> + spin_lock(&fiq->lock);
> + if (!fiq->connected) {
> + spin_unlock(&fiq->lock);
> + /*
> + * Connection is being aborted. The abort will release
> + * the refcount on the request
> + */
> + req->out.h.error = -ECONNABORTED;
> + return;
> + }
> + if (test_bit(FR_PENDING, &req->flags)) {
> + /* Request is not yet in userspace, bail out */
> + list_del(&req->list);
> + spin_unlock(&fiq->lock);
> + req->out.h.error = -ETIME;
> + __fuse_put_request(req);
Why is this safe? We could be the last holder of the reference on this request
correct? The only places using __fuse_put_request() would be where we are in a
path where the caller already holds a reference on the request. Since this is
async it may not be the case right?
If it is safe then it's just confusing and warrants a comment.
> + return;
> + }
> + if (test_bit(FR_INTERRUPTED, &req->flags))
> + list_del_init(&req->intr_entry);
> +
> + fpq = req->fpq;
> + spin_unlock(&fiq->lock);
> +
> + if (fpq) {
> + spin_lock(&fpq->lock);
> + if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) {
^^
You don't need the extra () there.
> + spin_unlock(&fpq->lock);
> + /*
> + * Connection is being aborted. The abort will release
> + * the refcount on the request
> + */
> + req->out.h.error = -ECONNABORTED;
> + return;
> + }
> + if (req->out.h.error == -ESTALE) {
> + /*
> + * Device is being released. The fuse_dev_release call
> + * will drop the refcount on the request
> + */
> + spin_unlock(&fpq->lock);
> + return;
> + }
> + if (!test_bit(FR_PRIVATE, &req->flags))
> + list_del_init(&req->list);
> + spin_unlock(&fpq->lock);
> + }
> +
> + req->out.h.error = -ETIME;
> +
> + if (test_bit(FR_ASYNC, &req->flags))
> + req->args->end(req->fm, req->args, req->out.h.error);
> +
> + fuse_put_request(req);
> +}
Just a general styling thing, we have two different states for requests here,
PENDING and !PENDING correct? I think it may be better to do something like
if (test_bit(FR_PENDING, &req->flags))
timeout_pending_req();
else
timeout_inflight_req();
and then in timeout_pending_req() you do
spin_lock(&fiq->lock);
if (!test_bit(FR_PENDING, &req->flags)) {
spin_unlock(&fiq_lock);
timeout_inflight_req();
return;
}
This will keep the two different state cleanup functions separate and a little
cleaner to grok.
> +
> static int queue_interrupt(struct fuse_req *req)
> {
> struct fuse_iqueue *fiq = &req->fm->fc->iq;
> @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req)
> return 0;
> }
>
> +enum wait_type {
> + WAIT_TYPE_INTERRUPTIBLE,
> + WAIT_TYPE_KILLABLE,
> + WAIT_TYPE_NONINTERRUPTIBLE,
> +};
> +
> +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req)
> +{
> + struct fuse_conn *fc = req->fm->fc;
> +
> + return wait_event_interruptible_timeout(req->waitq,
> + test_bit(FR_FINISHED,
> + &req->flags),
> + fc->daemon_timeout);
> +}
> +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO);
> +
> +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type)
> +{
> + struct fuse_conn *fc = req->fm->fc;
> + int err;
> +
> +wait_answer_start:
> + if (type == WAIT_TYPE_INTERRUPTIBLE)
> + err = fuse_wait_event_interruptible_timeout(req);
> + else if (type == WAIT_TYPE_KILLABLE)
> + err = wait_event_killable_timeout(req->waitq,
> + test_bit(FR_FINISHED, &req->flags),
> + fc->daemon_timeout);
> +
> + else if (type == WAIT_TYPE_NONINTERRUPTIBLE)
> + err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags),
> + fc->daemon_timeout);
> + else
> + WARN_ON(1);
This will leak some random value for err, so initialize err to something that
will be dealt with, like -EINVAL;
> +
> + /* request was answered */
> + if (err > 0)
> + return 0;
> +
> + /* request was not answered in time */
> + if (err == 0) {
> + if (test_and_set_bit(FR_PROCESSING, &req->flags))
> + /* request reply is being processed by kernel right now.
> + * we should wait for the answer.
> + */
Format for multiline comments is
/*
* blah
* blah
*/
and since this is a 1 line if statement put it above the if statement.
> + goto wait_answer_start;
> +
> + fuse_request_timeout(req);
> + return 0;
> + }
> +
> + /* else request was interrupted */
> + return err;
> +}
> +
> static void request_wait_answer(struct fuse_req *req)
> {
> struct fuse_conn *fc = req->fm->fc;
> @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req)
>
> if (!fc->no_interrupt) {
> /* Any signal may interrupt this */
> - err = wait_event_interruptible(req->waitq,
> - test_bit(FR_FINISHED, &req->flags));
> + if (fc->daemon_timeout)
> + err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE);
> + else
> + err = wait_event_interruptible(req->waitq,
> + test_bit(FR_FINISHED, &req->flags));
> if (!err)
> return;
>
> @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req)
>
> if (!test_bit(FR_FORCE, &req->flags)) {
> /* Only fatal signals may interrupt this */
> - err = wait_event_killable(req->waitq,
> - test_bit(FR_FINISHED, &req->flags));
> + if (fc->daemon_timeout)
> + err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE);
> + else
> + err = wait_event_killable(req->waitq,
> + test_bit(FR_FINISHED, &req->flags));
> if (!err)
> return;
>
> @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req)
> * Either request is already in userspace, or it was forced.
> * Wait it out.
> */
> - wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> + if (fc->daemon_timeout)
> + wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE);
> + else
> + wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> }
>
> static void __fuse_request_send(struct fuse_req *req)
> @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> req = list_entry(fiq->pending.next, struct fuse_req, list);
> clear_bit(FR_PENDING, &req->flags);
> list_del_init(&req->list);
> + /* Acquire a reference since fuse_request_timeout may also be executing */
> + __fuse_get_request(req);
> + req->fpq = fpq;
> spin_unlock(&fiq->lock);
>
There's a race here with completion. If we timeout a request right here, we can
end up sending that same request below.
You are going to need to check
test_bit(FR_PROCESSING)
after you take the fpq->lock just below here to make sure you didn't race with
the timeout handler and time the request out already.
> args = req->args;
> @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> if (args->opcode == FUSE_SETXATTR)
> req->out.h.error = -E2BIG;
> fuse_request_end(req);
> + fuse_put_request(req);
> goto restart;
> }
> spin_lock(&fpq->lock);
> @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> }
> hash = fuse_req_hash(req->in.h.unique);
> list_move_tail(&req->list, &fpq->processing[hash]);
> - __fuse_get_request(req);
> set_bit(FR_SENT, &req->flags);
> spin_unlock(&fpq->lock);
> /* matches barrier in request_wait_answer() */
> smp_mb__after_atomic();
> if (test_bit(FR_INTERRUPTED, &req->flags))
> queue_interrupt(req);
> +
> + /* Check if request timed out */
> + if (test_bit(FR_PROCESSING, &req->flags)) {
> + spin_lock(&fpq->lock);
> + if (!test_bit(FR_PRIVATE, &req->flags))
> + list_del_init(&req->list);
> + spin_unlock(&fpq->lock);
> + fuse_put_request(req);
> + return -ETIME;
> + }
This isn't quite right, we could have FR_PROCESSING set because we completed the
request before we got here. If you put a schedule_timeout(HZ); right above this
you could easily see where a request gets completed by userspace, but now you've
fimed it out.
Additionally if we have FR_PROCESSING set from the timeout, shouldn't this
cleanup have been done already? I don't understand why we need to handle this
here, we should just return and whoever is waiting on the request will get the
error.
Thanks,
Josef
next prev parent reply other threads:[~2024-07-18 14:44 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-17 21:34 [PATCH] fuse: add optional kernel-enforced timeout for fuse requests Joanne Koong
2024-07-17 22:02 ` Joanne Koong
2024-07-17 22:23 ` Bernd Schubert
2024-07-18 5:24 ` Joanne Koong
2024-07-18 22:11 ` Bernd Schubert
2024-07-19 0:37 ` Joanne Koong
2024-07-19 13:06 ` Josef Bacik
2024-07-22 18:58 ` Joanne Koong
2024-07-22 20:52 ` Josef Bacik
2024-07-22 21:15 ` Joanne Koong
2024-07-18 14:44 ` Josef Bacik [this message]
2024-07-18 16:58 ` Joanne Koong
2024-07-24 20:44 ` Joanne Koong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240718144430.GA2099026@perftesting \
--to=josef@toxicpanda.com \
--cc=joannelkoong@gmail.com \
--cc=kernel-team@meta.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=osandov@osandov.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).