From: Bernd Schubert <bernd.schubert@fastmail.fm>
To: Joanne Koong <joannelkoong@gmail.com>,
Miklos Szeredi <miklos@szeredi.hu>
Cc: linux-fsdevel@vger.kernel.org, josef@toxicpanda.com,
jefflexu@linux.alibaba.com, laoar.shao@gmail.com,
kernel-team@meta.com
Subject: Re: [PATCH v6 1/2] fuse: add optional kernel-enforced timeout for requests
Date: Wed, 4 Sep 2024 00:37:58 +0200 [thread overview]
Message-ID: <02b45c36-b64c-4b7c-9148-55cbd06cc07b@fastmail.fm> (raw)
In-Reply-To: <CAJnrk1ZSEk+GuC1kvNS_Cu9u7UsoFW+vd2xOsrbL5i_GNAoEkQ@mail.gmail.com>
On 9/3/24 19:25, Joanne Koong wrote:
> On Mon, Sep 2, 2024 at 3:38 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@gmail.com> wrote:
>>>
>>> There are situations where fuse servers can become unresponsive or
>>> stuck, for example if the server is in a deadlock. Currently, there's
>>> no good way to detect if a server is stuck and needs to be killed
>>> manually.
>>>
>>> This commit adds an option for enforcing a timeout (in seconds) on
>>> requests where if the timeout elapses without a reply from the server,
>>> the connection will be automatically aborted.
>>
>> Okay.
>>
>> I'm not sure what the overhead (scheduling and memory) of timers, but
>> starting one for each request seems excessive.
>>
>> Can we make the timeout per-connection instead of per request?
>>
>> I.e. When the first request is sent, the timer is started. When a
>> reply is received but there are still outstanding requests, the timer
>> is reset. When the last reply is received, the timer is stopped.
>>
>> This should handle the frozen server case just as well. It may not
>> perfectly handle the case when the server is still alive but for some
>> reason one or more requests get stuck, while others are still being
>> processed. The latter case is unlikely to be an issue in practice,
>> IMO.
>
> In that case, if the timeout is per-connection instead of per-request
> and we're not stringent about some requests getting stuck, maybe it
> makes more sense to just do this in userspace (libfuse) then? That
> seems pretty simple with having a watchdog thread that periodically
> (according to whatever specified timeout) checks if the number of
> requests serviced is increasing when
> /sys/fs/fuse/connections/*/waiting is non-zero.
>
> If there are multiple server threads (eg libfuse's fuse_loop_mt
> interface) and say, all of them are deadlocked except for 1 that is
> actively servicing requests, then this wouldn't catch that case, but
> even if this per-connection timeout was enforced in the kernel
> instead, it wouldn't catch that case either.
>
> So maybe this logic should just be moved to libfuse then? For this
> we'd need to pass the connection's device id (fc->dev) to userspace
> which i don't think we currently do, but that seems pretty simple. The
> one downside I see is that this doesn't let sysadmins enforce an
> automatic system-wide "max timeout" against malicious fuse servers but
> if we are having the timeout be per-connection instead of per-request,
> then a malicious server could still be malicious anyways (eg
> deadlocking all threads except for 1).
>
> Curious to hear what your and Bernrd's thoughts on this are.
I have question here, does it need to be an exact timeout or could it be
an interval/epoch? Let's say you timeout based on epoch lists? Example
1) epoch-a starts, requests are added to epoch-a list.
2) epoch-b starts, epoch-a list should get empty
3) epoch-c starts, epoch-b list should get empty, kill the connection if
epoch-a list is not empty (epoch-c list should not be needed, as epoch-a
list can be used, once confirmed it is empty.)
Here timeout would be epoch-a + epoch-b, i.e.
max-timeout <= 2 * epoch-time.
We could have more epochs/list-heads to make it more fine grained.
From my point of view that should be a rather cheap, as it just
adding/removing requests from list and checking for timeout if a list is
empty. With the caveat that it is not precise anymore.
That could be implemented in kernel and/or libfuse?
Thanks,
Bernd
next prev parent reply other threads:[~2024-09-03 22:38 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-30 16:26 [PATCH v6 0/2] fuse: add timeout option for requests Joanne Koong
2024-08-30 16:26 ` [PATCH v6 1/2] fuse: add optional kernel-enforced timeout " Joanne Koong
2024-09-02 10:37 ` Miklos Szeredi
2024-09-02 10:50 ` Bernd Schubert
2024-09-02 11:11 ` Miklos Szeredi
2024-09-03 17:25 ` Joanne Koong
2024-09-03 22:37 ` Bernd Schubert [this message]
2024-09-04 17:23 ` Joanne Koong
2024-09-17 22:00 ` Bernd Schubert
2024-09-18 7:29 ` Miklos Szeredi
2024-09-18 9:12 ` Bernd Schubert
2024-09-27 19:36 ` Joanne Koong
2024-09-28 8:43 ` Bernd Schubert
2024-10-01 17:03 ` Joanne Koong
2024-10-01 17:12 ` Joanne Koong
2024-10-07 18:39 ` Joanne Koong
2024-10-07 20:02 ` Bernd Schubert
2024-10-08 16:26 ` Joanne Koong
2024-10-08 19:00 ` Joanne Koong
2024-08-30 16:26 ` [PATCH v6 2/2] fuse: add default_request_timeout and max_request_timeout sysctls Joanne Koong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=02b45c36-b64c-4b7c-9148-55cbd06cc07b@fastmail.fm \
--to=bernd.schubert@fastmail.fm \
--cc=jefflexu@linux.alibaba.com \
--cc=joannelkoong@gmail.com \
--cc=josef@toxicpanda.com \
--cc=kernel-team@meta.com \
--cc=laoar.shao@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).