* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
@ 2024-11-28 10:44 ` Sergey Senozhatsky
2024-11-28 11:00 ` Bernd Schubert
2024-12-03 11:01 ` Sergey Senozhatsky
` (6 subsequent siblings)
7 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-11-28 10:44 UTC (permalink / raw)
To: Joanne Koong
Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
laoar.shao, kernel-team, Bernd Schubert
Hi Joanne,
On (24/11/14 11:13), Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is deadlocked. Currently, there's no
> good way to detect if a server is stuck and needs to be killed manually.
>
> This commit adds an option for enforcing a timeout (in minutes) for
> requests where if the timeout elapses without the server responding to
> the request, the connection will be automatically aborted.
Does it make sense to configure timeout in seconds? hung-task watchdog
operates in seconds and can be set to anything, e.g. 45 seconds, so it
panic the system before fuse timeout has a chance to trigger.
Another question is: this will terminate the connection. Does it
make sense to run timeout per request and just "abort" individual
requests? What I'm currently playing with here on our side is
something like this:
----
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8573d79ef29c..82e071cecafd 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -21,6 +21,7 @@
#include <linux/swap.h>
#include <linux/splice.h>
#include <linux/sched.h>
+#include <linux/sched/sysctl.h>
MODULE_ALIAS_MISCDEV(FUSE_MINOR);
MODULE_ALIAS("devname:fuse");
@@ -368,11 +369,24 @@ static void request_wait_answer(struct fuse_req *req)
int err;
if (!fc->no_interrupt) {
- /* Any signal may interrupt this */
- err = wait_event_interruptible(req->waitq,
+ /* We can use CONFIG_DEFAULT_HUNG_TASK_TIMEOUT here */
+ unsigned long hang_check = sysctl_hung_task_timeout_secs;
+
+ if (hang_check) {
+ /* Any signal or timeout may interrupt this */
+ err = wait_event_interruptible_timeout(req->waitq,
+ test_bit(FR_FINISHED, &req->flags),
+ hang_check * (HZ / 2));
+ if (err > 0)
+ return;
+ } else {
+ /* Any signal may interrupt this */
+ err = wait_event_interruptible(req->waitq,
test_bit(FR_FINISHED, &req->flags));
- if (!err)
- return;
+
+ if (!err)
+ return;
+ }
set_bit(FR_INTERRUPTED, &req->flags);
/* matches barrier in fuse_dev_do_read() */
^ permalink raw reply related [flat|nested] 39+ messages in thread* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-28 10:44 ` Sergey Senozhatsky
@ 2024-11-28 11:00 ` Bernd Schubert
2024-11-28 11:09 ` Sergey Senozhatsky
0 siblings, 1 reply; 39+ messages in thread
From: Bernd Schubert @ 2024-11-28 11:00 UTC (permalink / raw)
To: Sergey Senozhatsky, Joanne Koong
Cc: miklos, linux-fsdevel, josef, jefflexu, laoar.shao, kernel-team,
Bernd Schubert
On 11/28/24 11:44, Sergey Senozhatsky wrote:
> Hi Joanne,
>
> On (24/11/14 11:13), Joanne Koong wrote:
>> There are situations where fuse servers can become unresponsive or
>> stuck, for example if the server is deadlocked. Currently, there's no
>> good way to detect if a server is stuck and needs to be killed manually.
>>
>> This commit adds an option for enforcing a timeout (in minutes) for
>> requests where if the timeout elapses without the server responding to
>> the request, the connection will be automatically aborted.
>
> Does it make sense to configure timeout in seconds? hung-task watchdog
> operates in seconds and can be set to anything, e.g. 45 seconds, so it
> panic the system before fuse timeout has a chance to trigger.
>
> Another question is: this will terminate the connection. Does it
> make sense to run timeout per request and just "abort" individual
> requests? What I'm currently playing with here on our side is
> something like this:
Miklos had asked for to abort the connection in v4
https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
Thanks,
Bernd
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-28 11:00 ` Bernd Schubert
@ 2024-11-28 11:09 ` Sergey Senozhatsky
2024-11-28 11:23 ` Bernd Schubert
0 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-11-28 11:09 UTC (permalink / raw)
To: Bernd Schubert
Cc: Sergey Senozhatsky, Joanne Koong, miklos, linux-fsdevel, josef,
jefflexu, laoar.shao, kernel-team, Bernd Schubert
On (24/11/28 12:00), Bernd Schubert wrote:
> On 11/28/24 11:44, Sergey Senozhatsky wrote:
> > Hi Joanne,
> >
> > On (24/11/14 11:13), Joanne Koong wrote:
> >> There are situations where fuse servers can become unresponsive or
> >> stuck, for example if the server is deadlocked. Currently, there's no
> >> good way to detect if a server is stuck and needs to be killed manually.
> >>
> >> This commit adds an option for enforcing a timeout (in minutes) for
> >> requests where if the timeout elapses without the server responding to
> >> the request, the connection will be automatically aborted.
> >
> > Does it make sense to configure timeout in seconds? hung-task watchdog
> > operates in seconds and can be set to anything, e.g. 45 seconds, so it
> > panic the system before fuse timeout has a chance to trigger.
> >
> > Another question is: this will terminate the connection. Does it
> > make sense to run timeout per request and just "abort" individual
> > requests? What I'm currently playing with here on our side is
> > something like this:
Thanks for the pointers again, Bernd.
> Miklos had asked for to abort the connection in v4
> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
OK, sounds reasonable. I'll try to give the series some testing in the
coming days.
// I still would probably prefer "seconds" timeout granularity.
// Unless this also has been discussed already and Bernd has a link ;)
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-28 11:09 ` Sergey Senozhatsky
@ 2024-11-28 11:23 ` Bernd Schubert
2024-11-28 11:54 ` Sergey Senozhatsky
0 siblings, 1 reply; 39+ messages in thread
From: Bernd Schubert @ 2024-11-28 11:23 UTC (permalink / raw)
To: Sergey Senozhatsky, Bernd Schubert
Cc: Joanne Koong, miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
josef@toxicpanda.com, jefflexu@linux.alibaba.com,
laoar.shao@gmail.com, kernel-team@meta.com
On 11/28/24 12:09, Sergey Senozhatsky wrote:
> [You don't often get email from senozhatsky@chromium.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> On (24/11/28 12:00), Bernd Schubert wrote:
>> On 11/28/24 11:44, Sergey Senozhatsky wrote:
>>> Hi Joanne,
>>>
>>> On (24/11/14 11:13), Joanne Koong wrote:
>>>> There are situations where fuse servers can become unresponsive or
>>>> stuck, for example if the server is deadlocked. Currently, there's no
>>>> good way to detect if a server is stuck and needs to be killed manually.
>>>>
>>>> This commit adds an option for enforcing a timeout (in minutes) for
>>>> requests where if the timeout elapses without the server responding to
>>>> the request, the connection will be automatically aborted.
>>>
>>> Does it make sense to configure timeout in seconds? hung-task watchdog
>>> operates in seconds and can be set to anything, e.g. 45 seconds, so it
>>> panic the system before fuse timeout has a chance to trigger.
>>>
>>> Another question is: this will terminate the connection. Does it
>>> make sense to run timeout per request and just "abort" individual
>>> requests? What I'm currently playing with here on our side is
>>> something like this:
>
> Thanks for the pointers again, Bernd.
>
>> Miklos had asked for to abort the connection in v4
>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
>
> OK, sounds reasonable. I'll try to give the series some testing in the
> coming days.
>
> // I still would probably prefer "seconds" timeout granularity.
> // Unless this also has been discussed already and Bernd has a link ;)
The issue is that is currently iterating through 256 hash lists +
pending + bg.
https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
Personally I would prefer a second list to avoid the check spike and latency
https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
What is your opinion about that? I guess android and chromium have an
interest low latencies and avoiding cpu spikes?
Thanks,
Bernd
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-28 11:23 ` Bernd Schubert
@ 2024-11-28 11:54 ` Sergey Senozhatsky
2024-12-02 9:45 ` Tomasz Figa
0 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-11-28 11:54 UTC (permalink / raw)
To: Bernd Schubert
Cc: Sergey Senozhatsky, Bernd Schubert, Joanne Koong,
miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
josef@toxicpanda.com, jefflexu@linux.alibaba.com,
laoar.shao@gmail.com, kernel-team@meta.com, Tomasz Figa
Cc-ing Tomasz
On (24/11/28 11:23), Bernd Schubert wrote:
> > Thanks for the pointers again, Bernd.
> >
> >> Miklos had asked for to abort the connection in v4
> >> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
> >
> > OK, sounds reasonable. I'll try to give the series some testing in the
> > coming days.
> >
> > // I still would probably prefer "seconds" timeout granularity.
> > // Unless this also has been discussed already and Bernd has a link ;)
>
>
> The issue is that is currently iterating through 256 hash lists +
> pending + bg.
>
> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
Oh, I see.
> Personally I would prefer a second list to avoid the check spike and latency
> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
That's good to know. I like the idea of less CPU usage in general,
our devices a battery powered so everything counts, to some extent.
> What is your opinion about that? I guess android and chromium have an
> interest low latencies and avoiding cpu spikes?
Good question.
Can't speak for android, in chromeos we probably will keep it at 1 minute,
but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
use default value of 120 sec). There are setups that might use lower
values, or even re-define default value, e.g.:
arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
and then the question is whether HUNG_TASK_PANIC is set.
On the other hand, setups that set much lower timeout than
DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
just because watchdogs will run more often.
Tomasz, any opinions?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-28 11:54 ` Sergey Senozhatsky
@ 2024-12-02 9:45 ` Tomasz Figa
2024-12-02 14:43 ` Bernd Schubert
0 siblings, 1 reply; 39+ messages in thread
From: Tomasz Figa @ 2024-12-02 9:45 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Bernd Schubert, Bernd Schubert, Joanne Koong, miklos@szeredi.hu,
linux-fsdevel@vger.kernel.org, josef@toxicpanda.com,
jefflexu@linux.alibaba.com, laoar.shao@gmail.com,
kernel-team@meta.com
Hi everyone,
On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> Cc-ing Tomasz
>
> On (24/11/28 11:23), Bernd Schubert wrote:
> > > Thanks for the pointers again, Bernd.
> > >
> > >> Miklos had asked for to abort the connection in v4
> > >> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
> > >
> > > OK, sounds reasonable. I'll try to give the series some testing in the
> > > coming days.
> > >
> > > // I still would probably prefer "seconds" timeout granularity.
> > > // Unless this also has been discussed already and Bernd has a link ;)
> >
> >
> > The issue is that is currently iterating through 256 hash lists +
> > pending + bg.
> >
> > https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
>
> Oh, I see.
>
> > Personally I would prefer a second list to avoid the check spike and latency
> > https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
>
> That's good to know. I like the idea of less CPU usage in general,
> our devices a battery powered so everything counts, to some extent.
>
> > What is your opinion about that? I guess android and chromium have an
> > interest low latencies and avoiding cpu spikes?
>
> Good question.
>
> Can't speak for android, in chromeos we probably will keep it at 1 minute,
> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
> use default value of 120 sec). There are setups that might use lower
> values, or even re-define default value, e.g.:
>
> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
>
> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> and then the question is whether HUNG_TASK_PANIC is set.
>
> On the other hand, setups that set much lower timeout than
> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> just because watchdogs will run more often.
>
> Tomasz, any opinions?
First of all, thanks everyone for looking into this.
How about keeping a list of requests in the FIFO order (in other
words: first entry is the first to timeout) and whenever the first
entry is being removed from the list (aka the request actually
completes), re-arming the timer to the timeout of the next request in
the list? This way we don't really have any timer firing unless there
is really a request that timed out.
(In fact, we could optimize it even further by opportunistically
scheduling a timer slightly later and opportunistically handling timed
out requests when other requests are being completed, but this would
be optimizing for the slow path, so probably an overkill.)
As for the length of the request timeout vs the hung task watchdog
timeout, my opinion is that we should make sure that the hung task
watchdog doesn't hit in any case, simply because a misbehaving
userspace process must not be able to panic the kernel. In the
blk-core, the blk_io_schedule() function [1] uses
sysctl_hung_task_timeout_secs to determine the maximum length of a
single uninterruptible sleep. I suppose we could use the same
calculation to obtain our timeout number. What does everyone think?
[1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232
Best regards,
Tomasz
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-02 9:45 ` Tomasz Figa
@ 2024-12-02 14:43 ` Bernd Schubert
2024-12-02 19:29 ` Joanne Koong
0 siblings, 1 reply; 39+ messages in thread
From: Bernd Schubert @ 2024-12-02 14:43 UTC (permalink / raw)
To: Tomasz Figa, Sergey Senozhatsky
Cc: Bernd Schubert, Joanne Koong, miklos@szeredi.hu,
linux-fsdevel@vger.kernel.org, josef@toxicpanda.com,
jefflexu@linux.alibaba.com, laoar.shao@gmail.com,
kernel-team@meta.com
On 12/2/24 10:45, Tomasz Figa wrote:
> Hi everyone,
>
> On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
> <senozhatsky@chromium.org> wrote:
>>
>> Cc-ing Tomasz
>>
>> On (24/11/28 11:23), Bernd Schubert wrote:
>>>> Thanks for the pointers again, Bernd.
>>>>
>>>>> Miklos had asked for to abort the connection in v4
>>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
>>>>
>>>> OK, sounds reasonable. I'll try to give the series some testing in the
>>>> coming days.
>>>>
>>>> // I still would probably prefer "seconds" timeout granularity.
>>>> // Unless this also has been discussed already and Bernd has a link ;)
>>>
>>>
>>> The issue is that is currently iterating through 256 hash lists +
>>> pending + bg.
>>>
>>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
>>
>> Oh, I see.
>>
>>> Personally I would prefer a second list to avoid the check spike and latency
>>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
>>
>> That's good to know. I like the idea of less CPU usage in general,
>> our devices a battery powered so everything counts, to some extent.
>>
>>> What is your opinion about that? I guess android and chromium have an
>>> interest low latencies and avoiding cpu spikes?
>>
>> Good question.
>>
>> Can't speak for android, in chromeos we probably will keep it at 1 minute,
>> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
>> use default value of 120 sec). There are setups that might use lower
>> values, or even re-define default value, e.g.:
>>
>> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
>>
>> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
>> and then the question is whether HUNG_TASK_PANIC is set.
>>
>> On the other hand, setups that set much lower timeout than
>> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
>> just because watchdogs will run more often.
>>
>> Tomasz, any opinions?
>
> First of all, thanks everyone for looking into this.
>
> How about keeping a list of requests in the FIFO order (in other
> words: first entry is the first to timeout) and whenever the first
> entry is being removed from the list (aka the request actually
> completes), re-arming the timer to the timeout of the next request in
> the list? This way we don't really have any timer firing unless there
> is really a request that timed out.
Requests are in FIFO order on the list and only head is checked.
There are 256 hash lists per fuse device for requests currently
in user space, though.
>
> (In fact, we could optimize it even further by opportunistically
> scheduling a timer slightly later and opportunistically handling timed
> out requests when other requests are being completed, but this would
> be optimizing for the slow path, so probably an overkill.)
>
> As for the length of the request timeout vs the hung task watchdog
> timeout, my opinion is that we should make sure that the hung task
> watchdog doesn't hit in any case, simply because a misbehaving
> userspace process must not be able to panic the kernel. In the
> blk-core, the blk_io_schedule() function [1] uses
> sysctl_hung_task_timeout_secs to determine the maximum length of a
> single uninterruptible sleep. I suppose we could use the same
> calculation to obtain our timeout number. What does everyone think?
>
> [1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232
I think that is a good idea.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-02 14:43 ` Bernd Schubert
@ 2024-12-02 19:29 ` Joanne Koong
2024-12-03 4:31 ` Sergey Senozhatsky
2024-12-04 14:40 ` Tomasz Figa
0 siblings, 2 replies; 39+ messages in thread
From: Joanne Koong @ 2024-12-02 19:29 UTC (permalink / raw)
To: Bernd Schubert
Cc: Tomasz Figa, Sergey Senozhatsky, Bernd Schubert,
miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
josef@toxicpanda.com, jefflexu@linux.alibaba.com,
laoar.shao@gmail.com, kernel-team@meta.com
On Mon, Dec 2, 2024 at 6:43 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> On 12/2/24 10:45, Tomasz Figa wrote:
> > Hi everyone,
> >
> > On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
> > <senozhatsky@chromium.org> wrote:
> >>
> >> Cc-ing Tomasz
> >>
> >> On (24/11/28 11:23), Bernd Schubert wrote:
> >>>> Thanks for the pointers again, Bernd.
> >>>>
> >>>>> Miklos had asked for to abort the connection in v4
> >>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
> >>>>
> >>>> OK, sounds reasonable. I'll try to give the series some testing in the
> >>>> coming days.
> >>>>
> >>>> // I still would probably prefer "seconds" timeout granularity.
> >>>> // Unless this also has been discussed already and Bernd has a link ;)
> >>>
> >>>
> >>> The issue is that is currently iterating through 256 hash lists +
> >>> pending + bg.
> >>>
> >>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
> >>
> >> Oh, I see.
> >>
> >>> Personally I would prefer a second list to avoid the check spike and latency
> >>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
> >>
> >> That's good to know. I like the idea of less CPU usage in general,
> >> our devices a battery powered so everything counts, to some extent.
> >>
> >>> What is your opinion about that? I guess android and chromium have an
> >>> interest low latencies and avoiding cpu spikes?
> >>
> >> Good question.
> >>
> >> Can't speak for android, in chromeos we probably will keep it at 1 minute,
> >> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
> >> use default value of 120 sec). There are setups that might use lower
> >> values, or even re-define default value, e.g.:
> >>
> >> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> >> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> >> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> >> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> >> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> >> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> >> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
> >>
> >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> >> and then the question is whether HUNG_TASK_PANIC is set.
> >>
> >> On the other hand, setups that set much lower timeout than
> >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> >> just because watchdogs will run more often.
> >>
> >> Tomasz, any opinions?
> >
> > First of all, thanks everyone for looking into this.
Hi Sergey and Tomasz,
Sorry for the late reply - I was out the last couple of days. Thanks
Bernd for weighing in and answering the questions!
> >
> > How about keeping a list of requests in the FIFO order (in other
> > words: first entry is the first to timeout) and whenever the first
> > entry is being removed from the list (aka the request actually
> > completes), re-arming the timer to the timeout of the next request in
> > the list? This way we don't really have any timer firing unless there
> > is really a request that timed out.
I think the issue with this is that we likely would end up wasting
more cpu cycles. For a busy FUSE server, there could be hundreds
(thousands?) of requests that happen within the span of
FUSE_TIMEOUT_TIMER_FREQ seconds.
While working on the patch, one thing I considered was disarming the
timer in the timeout handler fuse_check_timeout() if no requests are
on the list, in order to accomodate for "quiet periods" (eg if the
FUSE server is inactive for a few minutes or hours) but ultimately
decided against it because of the overhead it'd incur per request (eg
check if the timer is disarmed, would most likely need to grab the
fc->lock as well since timer rearming would need to be synchronized
between background and non-background requests, etc.).
All in all, imo I don't think having the timer trigger every 60
seconds (what FUSE_TIMEOUT_TIMER_FREQ is set to) is too costly.
>
> Requests are in FIFO order on the list and only head is checked.
> There are 256 hash lists per fuse device for requests currently
> in user space, though.
>
> >
> > (In fact, we could optimize it even further by opportunistically
> > scheduling a timer slightly later and opportunistically handling timed
> > out requests when other requests are being completed, but this would
> > be optimizing for the slow path, so probably an overkill.)
> >
> > As for the length of the request timeout vs the hung task watchdog
> > timeout, my opinion is that we should make sure that the hung task
> > watchdog doesn't hit in any case, simply because a misbehaving
> > userspace process must not be able to panic the kernel. In the
> > blk-core, the blk_io_schedule() function [1] uses
> > sysctl_hung_task_timeout_secs to determine the maximum length of a
> > single uninterruptible sleep. I suppose we could use the same
> > calculation to obtain our timeout number. What does everyone think?
> >
> > [1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232
>
> I think that is a good idea.
Btw, just something to note, the fuse request timeout has an upper
margin of error associated with it.
Copying over from the commit message -
"Please note that these timeouts are not 100% precise. The request may
take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
timeout due to how it's internally implemented."
For example, if a server sets the request timeout config to 10
minutes, the server could be aborted after 11 minutes
(FUSE_TIMEOUT_TIMER_FREQ is set to 60 seconds internally) instead of
10 minutes.
Thanks,
Joanne
>
>
> Thanks,
> Bernd
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-02 19:29 ` Joanne Koong
@ 2024-12-03 4:31 ` Sergey Senozhatsky
2024-12-03 5:11 ` Joanne Koong
2024-12-04 14:40 ` Tomasz Figa
1 sibling, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-03 4:31 UTC (permalink / raw)
To: Joanne Koong
Cc: Bernd Schubert, Tomasz Figa, Sergey Senozhatsky, Bernd Schubert,
miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
josef@toxicpanda.com, jefflexu@linux.alibaba.com,
laoar.shao@gmail.com, kernel-team@meta.com
On (24/12/02 11:29), Joanne Koong wrote:
> > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > >> and then the question is whether HUNG_TASK_PANIC is set.
> > >>
> > >> On the other hand, setups that set much lower timeout than
> > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> > >> just because watchdogs will run more often.
> > >>
> > >> Tomasz, any opinions?
> > >
> > > First of all, thanks everyone for looking into this.
>
> Hi Sergey and Tomasz,
>
> Sorry for the late reply - I was out the last couple of days. Thanks
> Bernd for weighing in and answering the questions!
>
> > >
> > > How about keeping a list of requests in the FIFO order (in other
> > > words: first entry is the first to timeout) and whenever the first
> > > entry is being removed from the list (aka the request actually
> > > completes), re-arming the timer to the timeout of the next request in
> > > the list? This way we don't really have any timer firing unless there
> > > is really a request that timed out.
>
> I think the issue with this is that we likely would end up wasting
> more cpu cycles. For a busy FUSE server, there could be hundreds
> (thousands?) of requests that happen within the span of
> FUSE_TIMEOUT_TIMER_FREQ seconds.
So, a silly question - can we not do that maybe?
What I'm thinking about is what if instead of implementing fuse-watchdog
and tracking jiffies per request we'd switch to timeout aware operations
and use what's already in the kernel? E.g. instead of wait_event() we'd
use wait_event_timeout() and would configure timeout per connection
(also bringing in current hung-task-watchdog timeout value into the
equation), using MAX_SCHEDULE_TIMEOUT as a default (similarly to what
core kernel does). The first req that timeouts kills its siblings and
the connection.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-03 4:31 ` Sergey Senozhatsky
@ 2024-12-03 5:11 ` Joanne Koong
0 siblings, 0 replies; 39+ messages in thread
From: Joanne Koong @ 2024-12-03 5:11 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Bernd Schubert, Tomasz Figa, Bernd Schubert, miklos@szeredi.hu,
linux-fsdevel@vger.kernel.org, josef@toxicpanda.com,
jefflexu@linux.alibaba.com, laoar.shao@gmail.com,
kernel-team@meta.com
On Mon, Dec 2, 2024 at 8:31 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/12/02 11:29), Joanne Koong wrote:
> > > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > > >> and then the question is whether HUNG_TASK_PANIC is set.
> > > >>
> > > >> On the other hand, setups that set much lower timeout than
> > > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> > > >> just because watchdogs will run more often.
> > > >>
> > > >> Tomasz, any opinions?
> > > >
> > > > First of all, thanks everyone for looking into this.
> >
> > Hi Sergey and Tomasz,
> >
> > Sorry for the late reply - I was out the last couple of days. Thanks
> > Bernd for weighing in and answering the questions!
> >
> > > >
> > > > How about keeping a list of requests in the FIFO order (in other
> > > > words: first entry is the first to timeout) and whenever the first
> > > > entry is being removed from the list (aka the request actually
> > > > completes), re-arming the timer to the timeout of the next request in
> > > > the list? This way we don't really have any timer firing unless there
> > > > is really a request that timed out.
> >
> > I think the issue with this is that we likely would end up wasting
> > more cpu cycles. For a busy FUSE server, there could be hundreds
> > (thousands?) of requests that happen within the span of
> > FUSE_TIMEOUT_TIMER_FREQ seconds.
>
> So, a silly question - can we not do that maybe?
>
> What I'm thinking about is what if instead of implementing fuse-watchdog
> and tracking jiffies per request we'd switch to timeout aware operations
> and use what's already in the kernel? E.g. instead of wait_event() we'd
> use wait_event_timeout() and would configure timeout per connection
> (also bringing in current hung-task-watchdog timeout value into the
> equation), using MAX_SCHEDULE_TIMEOUT as a default (similarly to what
> core kernel does). The first req that timeouts kills its siblings and
> the connection.
Using timeout aware operations like wait_event_timeout() associates a
timer per request (see schedule_timeout()) and this approach was tried
in v6 [1] but the overhead of having a timer per request showed about
a 1.5% drop in throughput [1], which is why we ended up pivoting to a
periodic watchdog timer that triggers at set intervals.
Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1bdyDq+4jo29ZbyjdcbFiU2qyCGGbYbqQc_G23+B_Xe_Q@mail.gmail.com/
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-02 19:29 ` Joanne Koong
2024-12-03 4:31 ` Sergey Senozhatsky
@ 2024-12-04 14:40 ` Tomasz Figa
2024-12-04 14:51 ` Brian Geffon
1 sibling, 1 reply; 39+ messages in thread
From: Tomasz Figa @ 2024-12-04 14:40 UTC (permalink / raw)
To: Joanne Koong, Brian Geffon
Cc: Bernd Schubert, Sergey Senozhatsky, Bernd Schubert,
miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
josef@toxicpanda.com, jefflexu@linux.alibaba.com,
laoar.shao@gmail.com, kernel-team@meta.com
On Tue, Dec 3, 2024 at 4:29 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Mon, Dec 2, 2024 at 6:43 AM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> > On 12/2/24 10:45, Tomasz Figa wrote:
> > > Hi everyone,
> > >
> > > On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
> > > <senozhatsky@chromium.org> wrote:
> > >>
> > >> Cc-ing Tomasz
> > >>
> > >> On (24/11/28 11:23), Bernd Schubert wrote:
> > >>>> Thanks for the pointers again, Bernd.
> > >>>>
> > >>>>> Miklos had asked for to abort the connection in v4
> > >>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
> > >>>>
> > >>>> OK, sounds reasonable. I'll try to give the series some testing in the
> > >>>> coming days.
> > >>>>
> > >>>> // I still would probably prefer "seconds" timeout granularity.
> > >>>> // Unless this also has been discussed already and Bernd has a link ;)
> > >>>
> > >>>
> > >>> The issue is that is currently iterating through 256 hash lists +
> > >>> pending + bg.
> > >>>
> > >>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
> > >>
> > >> Oh, I see.
> > >>
> > >>> Personally I would prefer a second list to avoid the check spike and latency
> > >>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
> > >>
> > >> That's good to know. I like the idea of less CPU usage in general,
> > >> our devices a battery powered so everything counts, to some extent.
> > >>
> > >>> What is your opinion about that? I guess android and chromium have an
> > >>> interest low latencies and avoiding cpu spikes?
> > >>
> > >> Good question.
> > >>
> > >> Can't speak for android, in chromeos we probably will keep it at 1 minute,
> > >> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
> > >> use default value of 120 sec). There are setups that might use lower
> > >> values, or even re-define default value, e.g.:
> > >>
> > >> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
> > >>
> > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > >> and then the question is whether HUNG_TASK_PANIC is set.
> > >>
> > >> On the other hand, setups that set much lower timeout than
> > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> > >> just because watchdogs will run more often.
> > >>
> > >> Tomasz, any opinions?
> > >
> > > First of all, thanks everyone for looking into this.
>
> Hi Sergey and Tomasz,
>
> Sorry for the late reply - I was out the last couple of days. Thanks
> Bernd for weighing in and answering the questions!
>
> > >
> > > How about keeping a list of requests in the FIFO order (in other
> > > words: first entry is the first to timeout) and whenever the first
> > > entry is being removed from the list (aka the request actually
> > > completes), re-arming the timer to the timeout of the next request in
> > > the list? This way we don't really have any timer firing unless there
> > > is really a request that timed out.
>
> I think the issue with this is that we likely would end up wasting
> more cpu cycles. For a busy FUSE server, there could be hundreds
> (thousands?) of requests that happen within the span of
> FUSE_TIMEOUT_TIMER_FREQ seconds.
>
> While working on the patch, one thing I considered was disarming the
> timer in the timeout handler fuse_check_timeout() if no requests are
> on the list, in order to accomodate for "quiet periods" (eg if the
> FUSE server is inactive for a few minutes or hours) but ultimately
> decided against it because of the overhead it'd incur per request (eg
> check if the timer is disarmed, would most likely need to grab the
> fc->lock as well since timer rearming would need to be synchronized
> between background and non-background requests, etc.).
>
> All in all, imo I don't think having the timer trigger every 60
> seconds (what FUSE_TIMEOUT_TIMER_FREQ is set to) is too costly.
>
> >
> > Requests are in FIFO order on the list and only head is checked.
> > There are 256 hash lists per fuse device for requests currently
> > in user space, though.
> >
> > >
> > > (In fact, we could optimize it even further by opportunistically
> > > scheduling a timer slightly later and opportunistically handling timed
> > > out requests when other requests are being completed, but this would
> > > be optimizing for the slow path, so probably an overkill.)
> > >
> > > As for the length of the request timeout vs the hung task watchdog
> > > timeout, my opinion is that we should make sure that the hung task
> > > watchdog doesn't hit in any case, simply because a misbehaving
> > > userspace process must not be able to panic the kernel. In the
> > > blk-core, the blk_io_schedule() function [1] uses
> > > sysctl_hung_task_timeout_secs to determine the maximum length of a
> > > single uninterruptible sleep. I suppose we could use the same
> > > calculation to obtain our timeout number. What does everyone think?
> > >
> > > [1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232
> >
> > I think that is a good idea.
>
> Btw, just something to note, the fuse request timeout has an upper
> margin of error associated with it.
>
> Copying over from the commit message -
>
> "Please note that these timeouts are not 100% precise. The request may
> take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> timeout due to how it's internally implemented."
>
> For example, if a server sets the request timeout config to 10
> minutes, the server could be aborted after 11 minutes
> (FUSE_TIMEOUT_TIMER_FREQ is set to 60 seconds internally) instead of
> 10 minutes.
>
Let me add +Brian Geffon who also was thinking about the right timeout value.
>
> Thanks,
> Joanne
> >
> >
> > Thanks,
> > Bernd
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-04 14:40 ` Tomasz Figa
@ 2024-12-04 14:51 ` Brian Geffon
2024-12-04 15:09 ` Bernd Schubert
2024-12-05 3:23 ` Sergey Senozhatsky
0 siblings, 2 replies; 39+ messages in thread
From: Brian Geffon @ 2024-12-04 14:51 UTC (permalink / raw)
To: Tomasz Figa
Cc: Joanne Koong, Bernd Schubert, Sergey Senozhatsky, Bernd Schubert,
miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
josef@toxicpanda.com, jefflexu@linux.alibaba.com,
laoar.shao@gmail.com, kernel-team@meta.com
On Wed, Dec 4, 2024 at 9:40 AM Tomasz Figa <tfiga@chromium.org> wrote:
>
> On Tue, Dec 3, 2024 at 4:29 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Mon, Dec 2, 2024 at 6:43 AM Bernd Schubert
> > <bernd.schubert@fastmail.fm> wrote:
> > >
> > > On 12/2/24 10:45, Tomasz Figa wrote:
> > > > Hi everyone,
> > > >
> > > > On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
> > > > <senozhatsky@chromium.org> wrote:
> > > >>
> > > >> Cc-ing Tomasz
> > > >>
> > > >> On (24/11/28 11:23), Bernd Schubert wrote:
> > > >>>> Thanks for the pointers again, Bernd.
> > > >>>>
> > > >>>>> Miklos had asked for to abort the connection in v4
> > > >>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
> > > >>>>
> > > >>>> OK, sounds reasonable. I'll try to give the series some testing in the
> > > >>>> coming days.
> > > >>>>
> > > >>>> // I still would probably prefer "seconds" timeout granularity.
> > > >>>> // Unless this also has been discussed already and Bernd has a link ;)
> > > >>>
> > > >>>
> > > >>> The issue is that is currently iterating through 256 hash lists +
> > > >>> pending + bg.
> > > >>>
> > > >>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
> > > >>
> > > >> Oh, I see.
> > > >>
> > > >>> Personally I would prefer a second list to avoid the check spike and latency
> > > >>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
> > > >>
> > > >> That's good to know. I like the idea of less CPU usage in general,
> > > >> our devices a battery powered so everything counts, to some extent.
> > > >>
> > > >>> What is your opinion about that? I guess android and chromium have an
> > > >>> interest low latencies and avoiding cpu spikes?
> > > >>
> > > >> Good question.
> > > >>
> > > >> Can't speak for android, in chromeos we probably will keep it at 1 minute,
> > > >> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
> > > >> use default value of 120 sec). There are setups that might use lower
> > > >> values, or even re-define default value, e.g.:
> > > >>
> > > >> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > > >> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > > >> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > > >> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > > >> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > > >> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > > >> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
> > > >>
> > > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > > >> and then the question is whether HUNG_TASK_PANIC is set.
In my opinion this is a good argument for having the hung task timeout
and a fuse timeout independent. The hung task timeout is for hung
kernel threads, in this situation we're potentially taking too long in
userspace but that doesn't necessarily mean the system is hung. I
think a loop which does an interruptible wait with a timeout of 1/2
the hung task timeout would make sense to ensure the hung task timeout
doesn't hit. There might be situations where we want a fuse timeout
which is larger than the hung task timeout, perhaps a file system
being read over a satellite internet connection?
> > > >>
> > > >> On the other hand, setups that set much lower timeout than
> > > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> > > >> just because watchdogs will run more often.
> > > >>
> > > >> Tomasz, any opinions?
> > > >
> > > > First of all, thanks everyone for looking into this.
> >
> > Hi Sergey and Tomasz,
> >
> > Sorry for the late reply - I was out the last couple of days. Thanks
> > Bernd for weighing in and answering the questions!
> >
> > > >
> > > > How about keeping a list of requests in the FIFO order (in other
> > > > words: first entry is the first to timeout) and whenever the first
> > > > entry is being removed from the list (aka the request actually
> > > > completes), re-arming the timer to the timeout of the next request in
> > > > the list? This way we don't really have any timer firing unless there
> > > > is really a request that timed out.
> >
> > I think the issue with this is that we likely would end up wasting
> > more cpu cycles. For a busy FUSE server, there could be hundreds
> > (thousands?) of requests that happen within the span of
> > FUSE_TIMEOUT_TIMER_FREQ seconds.
> >
> > While working on the patch, one thing I considered was disarming the
> > timer in the timeout handler fuse_check_timeout() if no requests are
> > on the list, in order to accomodate for "quiet periods" (eg if the
> > FUSE server is inactive for a few minutes or hours) but ultimately
> > decided against it because of the overhead it'd incur per request (eg
> > check if the timer is disarmed, would most likely need to grab the
> > fc->lock as well since timer rearming would need to be synchronized
> > between background and non-background requests, etc.).
> >
> > All in all, imo I don't think having the timer trigger every 60
> > seconds (what FUSE_TIMEOUT_TIMER_FREQ is set to) is too costly.
> >
> > >
> > > Requests are in FIFO order on the list and only head is checked.
> > > There are 256 hash lists per fuse device for requests currently
> > > in user space, though.
> > >
> > > >
> > > > (In fact, we could optimize it even further by opportunistically
> > > > scheduling a timer slightly later and opportunistically handling timed
> > > > out requests when other requests are being completed, but this would
> > > > be optimizing for the slow path, so probably an overkill.)
> > > >
> > > > As for the length of the request timeout vs the hung task watchdog
> > > > timeout, my opinion is that we should make sure that the hung task
> > > > watchdog doesn't hit in any case, simply because a misbehaving
> > > > userspace process must not be able to panic the kernel. In the
> > > > blk-core, the blk_io_schedule() function [1] uses
> > > > sysctl_hung_task_timeout_secs to determine the maximum length of a
> > > > single uninterruptible sleep. I suppose we could use the same
> > > > calculation to obtain our timeout number. What does everyone think?
> > > >
> > > > [1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232
> > >
> > > I think that is a good idea.
> >
> > Btw, just something to note, the fuse request timeout has an upper
> > margin of error associated with it.
> >
> > Copying over from the commit message -
> >
> > "Please note that these timeouts are not 100% precise. The request may
> > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> > timeout due to how it's internally implemented."
> >
> > For example, if a server sets the request timeout config to 10
> > minutes, the server could be aborted after 11 minutes
> > (FUSE_TIMEOUT_TIMER_FREQ is set to 60 seconds internally) instead of
> > 10 minutes.
> >
>
> Let me add +Brian Geffon who also was thinking about the right timeout value.
>
> >
> > Thanks,
> > Joanne
> > >
> > >
> > > Thanks,
> > > Bernd
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-04 14:51 ` Brian Geffon
@ 2024-12-04 15:09 ` Bernd Schubert
2024-12-05 3:23 ` Sergey Senozhatsky
1 sibling, 0 replies; 39+ messages in thread
From: Bernd Schubert @ 2024-12-04 15:09 UTC (permalink / raw)
To: Brian Geffon, Tomasz Figa
Cc: Joanne Koong, Sergey Senozhatsky, Bernd Schubert,
miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
josef@toxicpanda.com, jefflexu@linux.alibaba.com,
laoar.shao@gmail.com, kernel-team@meta.com
On 12/4/24 15:51, Brian Geffon wrote:
> On Wed, Dec 4, 2024 at 9:40 AM Tomasz Figa <tfiga@chromium.org> wrote:
>>
>> On Tue, Dec 3, 2024 at 4:29 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>>>
>>> On Mon, Dec 2, 2024 at 6:43 AM Bernd Schubert
>>> <bernd.schubert@fastmail.fm> wrote:
>>>>
>>>> On 12/2/24 10:45, Tomasz Figa wrote:
>>>>> Hi everyone,
>>>>>
>>>>> On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
>>>>> <senozhatsky@chromium.org> wrote:
>>>>>>
>>>>>> Cc-ing Tomasz
>>>>>>
>>>>>> On (24/11/28 11:23), Bernd Schubert wrote:
>>>>>>>> Thanks for the pointers again, Bernd.
>>>>>>>>
>>>>>>>>> Miklos had asked for to abort the connection in v4
>>>>>>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@mail.gmail.com/raw
>>>>>>>>
>>>>>>>> OK, sounds reasonable. I'll try to give the series some testing in the
>>>>>>>> coming days.
>>>>>>>>
>>>>>>>> // I still would probably prefer "seconds" timeout granularity.
>>>>>>>> // Unless this also has been discussed already and Bernd has a link ;)
>>>>>>>
>>>>>>>
>>>>>>> The issue is that is currently iterating through 256 hash lists +
>>>>>>> pending + bg.
>>>>>>>
>>>>>>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@mail.gmail.com/raw
>>>>>>
>>>>>> Oh, I see.
>>>>>>
>>>>>>> Personally I would prefer a second list to avoid the check spike and latency
>>>>>>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@fastmail.fm/raw
>>>>>>
>>>>>> That's good to know. I like the idea of less CPU usage in general,
>>>>>> our devices a battery powered so everything counts, to some extent.
>>>>>>
>>>>>>> What is your opinion about that? I guess android and chromium have an
>>>>>>> interest low latencies and avoiding cpu spikes?
>>>>>>
>>>>>> Good question.
>>>>>>
>>>>>> Can't speak for android, in chromeos we probably will keep it at 1 minute,
>>>>>> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
>>>>>> use default value of 120 sec). There are setups that might use lower
>>>>>> values, or even re-define default value, e.g.:
>>>>>>
>>>>>> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
>>>>>>
>>>>>> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
>>>>>> and then the question is whether HUNG_TASK_PANIC is set.
>
> In my opinion this is a good argument for having the hung task timeout
> and a fuse timeout independent. The hung task timeout is for hung
> kernel threads, in this situation we're potentially taking too long in
> userspace but that doesn't necessarily mean the system is hung. I
> think a loop which does an interruptible wait with a timeout of 1/2
> the hung task timeout would make sense to ensure the hung task timeout
> doesn't hit. There might be situations where we want a fuse timeout
> which is larger than the hung task timeout, perhaps a file system
> being read over a satellite internet connection?
For a network file system the remote server also might just hang and
one might want to wait much longer than 1/2 hung task timeout for
recovery.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-04 14:51 ` Brian Geffon
2024-12-04 15:09 ` Bernd Schubert
@ 2024-12-05 3:23 ` Sergey Senozhatsky
2024-12-05 11:07 ` Brian Geffon
1 sibling, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-05 3:23 UTC (permalink / raw)
To: Brian Geffon
Cc: Tomasz Figa, Joanne Koong, Bernd Schubert, Sergey Senozhatsky,
Bernd Schubert, miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
josef@toxicpanda.com, jefflexu@linux.alibaba.com,
laoar.shao@gmail.com, kernel-team@meta.com
On (24/12/04 09:51), Brian Geffon wrote:
> > > > >>
> > > > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > > > >> and then the question is whether HUNG_TASK_PANIC is set.
>
> In my opinion this is a good argument for having the hung task timeout
> and a fuse timeout independent. The hung task timeout is for hung
> kernel threads
Sorry, but no, it's not only for kernel threads.
> in this situation we're potentially taking too long in
> userspace but that doesn't necessarily mean the system is hung.
And it's not for hung system. It's for tasks that stuck unreasonably
long waiting for a particular event or resource. And those tasks very
well can be user-space processes, being stuck in syscall waiting for
something.
> I think a loop which does an interruptible wait with a timeout of 1/2
> the hung task timeout would make sense to ensure the hung task timeout
> doesn't hit.
The point here is not to silent watchdog, we might as well just disable
it and call it a day. The point here is that fuse can be used (and IS
in our particular case) as a network filesystem, and the problem can be
way outside of your system, so spinning in a wait loop doesn't fix any
problem on that remote system no matter how long we spin, and that's
what watchdog signals us.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-05 3:23 ` Sergey Senozhatsky
@ 2024-12-05 11:07 ` Brian Geffon
0 siblings, 0 replies; 39+ messages in thread
From: Brian Geffon @ 2024-12-05 11:07 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Tomasz Figa, Joanne Koong, Bernd Schubert, Bernd Schubert,
miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
josef@toxicpanda.com, jefflexu@linux.alibaba.com,
laoar.shao@gmail.com, kernel-team@meta.com
On Wed, Dec 4, 2024 at 10:23 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/12/04 09:51), Brian Geffon wrote:
> > > > > >>
> > > > > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > > > > >> and then the question is whether HUNG_TASK_PANIC is set.
> >
> > In my opinion this is a good argument for having the hung task timeout
> > and a fuse timeout independent. The hung task timeout is for hung
> > kernel threads
>
> Sorry, but no, it's not only for kernel threads.
>
> > in this situation we're potentially taking too long in
> > userspace but that doesn't necessarily mean the system is hung.
>
> And it's not for hung system. It's for tasks that stuck unreasonably
> long waiting for a particular event or resource. And those tasks very
> well can be user-space processes, being stuck in syscall waiting for
> something.
>
> > I think a loop which does an interruptible wait with a timeout of 1/2
> > the hung task timeout would make sense to ensure the hung task timeout
> > doesn't hit.
>
> The point here is not to silent watchdog, we might as well just disable
> it and call it a day. The point here is that fuse can be used (and IS
> in our particular case) as a network filesystem, and the problem can be
> way outside of your system, so spinning in a wait loop doesn't fix any
> problem on that remote system no matter how long we spin, and that's
> what watchdog signals us.
Sergey, to clarify again what I was suggesting, I was not suggesting
that no timeout is appropriate. I suggested that the hung task timeout
is not appropriate and instead we should be using a different timeout.
My suggestion about looping was simply to say: **once we've
transitioned to userspace to service the request the hung task timeout
should not apply and instead a different fuse timeout should.**
The loop is nothing more than to allow the task to break out of the
wait so it's not triggering the hung task timeout and frankly an
implementation detail that I should have probably omitted. There are
situations where a timeout less than the hung task timeout might be
appropriate and there are situations where a timeout longer than the
hung task timeout might be appropriate.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
2024-11-28 10:44 ` Sergey Senozhatsky
@ 2024-12-03 11:01 ` Sergey Senozhatsky
2024-12-06 0:06 ` Joanne Koong
2024-12-04 13:14 ` Sergey Senozhatsky
` (5 subsequent siblings)
7 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-03 11:01 UTC (permalink / raw)
To: Joanne Koong
Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa
On (24/11/14 11:13), Joanne Koong wrote:
[..]
> +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> +{
> + if (ctx->req_timeout) {
> + if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> + fc->timeout.req_timeout = ULONG_MAX;
> + timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> + mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
So I think this will require much bigger changes in the code.
fuse_check_timeout() is executed from IRQ context and it takes
the same locks that are acquired by preemptive task contexts.
So all of those now need to disable local IRQs before they
acquire: fc->bg_lock, fig->lock, fc->lock. Otherwise we are
in a deadlock scenario (in many places, unfortunately).
A simple example:
[ 91.466408] _raw_spin_lock+0x39/0x70
[ 91.466420] fuse_simple_background+0x902/0x1130 [fuse]
[ 91.466453] fuse_send_init+0x337/0x680 [fuse]
[ 91.466485] fuse_fill_super+0x1c8/0x200 [fuse]
[ 91.466516] get_tree_nodev+0xaf/0x140
[ 91.466527] fuse_get_tree+0x27e/0x450 [fuse]
[ 91.466559] vfs_get_tree+0x88/0x240
[ 91.466569] path_mount+0xf26/0x1ed0
[ 91.466579] __se_sys_mount+0x1c9/0x240
[ 91.466588] do_syscall_64+0x6f/0xa0
[ 91.466598] entry_SYSCALL_64_after_hwframe+0x73/0xdd
[ 91.466666]
other info that might help us debug this:
[ 91.466672] Possible unsafe locking scenario:
[ 91.466679] CPU0
[ 91.466684] ----
[ 91.466689] lock(&fiq->lock);
[ 91.466701] <Interrupt>
[ 91.466707] lock(&fiq->lock);
[ 91.466718]
*** DEADLOCK ***
[ 91.466724] 4 locks held by runtime_probe/5043:
[ 91.466732] #0: ffff888005812448 (sb_writers#3){.+.+}-{0:0}, at: filename_create+0xb2/0x320
[ 91.466762] #1: ffff8881499ea3f0 (&type->i_mutex_dir_key#5/1){+.+.}-{3:3}, at: filename_create+0x14c/0x320
[ 91.466791] #2: ffffffff9d864ce0 (rcu_read_lock){....}-{1:2}, at: security_sid_to_context_core+0xa4/0x4f0
[ 91.466817] #3: ffff88815c009ec0 ((&fc->timeout.timer)){+.-.}-{0:0}, at: run_timer_softirq+0x702/0x1700
[ 91.466841]
stack backtrace:
[ 91.466850] CPU: 2 PID: 5043 Comm: runtime_probe Tainted: G U 6.6.63-lockdep #1 f2b045305e587e03c4766ca818bb77742f953c87
[ 91.466864] Hardware name: HP Lantis/Lantis, BIOS Google_Lantis.13606.437.0 01/21/2022
[ 91.466872] Call Trace:
[ 91.466881] <IRQ>
[ 91.466889] dump_stack_lvl+0x6d/0xb0
[ 91.466904] print_usage_bug+0x8a4/0xbb0
[ 91.466917] mark_lock+0x13ca/0x1940
[ 91.466930] __lock_acquire+0xc10/0x2670
[ 91.466944] lock_acquire+0x119/0x3a0
[ 91.466955] ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
[ 91.466992] ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
[ 91.467025] _raw_spin_lock+0x39/0x70
[ 91.467036] ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
[ 91.467070] fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
[ 91.467104] ? run_timer_softirq+0x702/0x1700
[ 91.467114] ? run_timer_softirq+0x702/0x1700
[ 91.467123] ? __pfx_fuse_check_timeout+0x10/0x10 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
[ 91.467156] run_timer_softirq+0x763/0x1700
[ 91.467172] irq_exit_rcu+0x300/0x7d0
[ 91.467183] ? sysvec_apic_timer_interrupt+0x56/0x90
[ 91.467195] sysvec_apic_timer_interrupt+0x56/0x90
[ 91.467205] </IRQ>
Do we want to run fuse-watchdog as a preemptive task instead of
IRQ context? Simlar to the way the hung-task watchdog runs, for
example. Yes, it now may starve and not get scheduled in corner
cases (under some crazy pressure), but at least we don't need to
patch the entire fuse code to use irq_safe/irq_restore locking
variants.
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-03 11:01 ` Sergey Senozhatsky
@ 2024-12-06 0:06 ` Joanne Koong
2024-12-06 4:28 ` Sergey Senozhatsky
0 siblings, 1 reply; 39+ messages in thread
From: Joanne Koong @ 2024-12-06 0:06 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa
On Tue, Dec 3, 2024 at 3:01 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/11/14 11:13), Joanne Koong wrote:
> [..]
> > +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> > +{
> > + if (ctx->req_timeout) {
> > + if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> > + fc->timeout.req_timeout = ULONG_MAX;
> > + timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> > + mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
>
> So I think this will require much bigger changes in the code.
> fuse_check_timeout() is executed from IRQ context and it takes
> the same locks that are acquired by preemptive task contexts.
> So all of those now need to disable local IRQs before they
> acquire: fc->bg_lock, fig->lock, fc->lock. Otherwise we are
> in a deadlock scenario (in many places, unfortunately).
>
> A simple example:
>
> [ 91.466408] _raw_spin_lock+0x39/0x70
> [ 91.466420] fuse_simple_background+0x902/0x1130 [fuse]
> [ 91.466453] fuse_send_init+0x337/0x680 [fuse]
> [ 91.466485] fuse_fill_super+0x1c8/0x200 [fuse]
> [ 91.466516] get_tree_nodev+0xaf/0x140
> [ 91.466527] fuse_get_tree+0x27e/0x450 [fuse]
> [ 91.466559] vfs_get_tree+0x88/0x240
> [ 91.466569] path_mount+0xf26/0x1ed0
> [ 91.466579] __se_sys_mount+0x1c9/0x240
> [ 91.466588] do_syscall_64+0x6f/0xa0
> [ 91.466598] entry_SYSCALL_64_after_hwframe+0x73/0xdd
> [ 91.466666]
> other info that might help us debug this:
> [ 91.466672] Possible unsafe locking scenario:
>
> [ 91.466679] CPU0
> [ 91.466684] ----
> [ 91.466689] lock(&fiq->lock);
> [ 91.466701] <Interrupt>
> [ 91.466707] lock(&fiq->lock);
> [ 91.466718]
> *** DEADLOCK ***
>
> [ 91.466724] 4 locks held by runtime_probe/5043:
> [ 91.466732] #0: ffff888005812448 (sb_writers#3){.+.+}-{0:0}, at: filename_create+0xb2/0x320
> [ 91.466762] #1: ffff8881499ea3f0 (&type->i_mutex_dir_key#5/1){+.+.}-{3:3}, at: filename_create+0x14c/0x320
> [ 91.466791] #2: ffffffff9d864ce0 (rcu_read_lock){....}-{1:2}, at: security_sid_to_context_core+0xa4/0x4f0
> [ 91.466817] #3: ffff88815c009ec0 ((&fc->timeout.timer)){+.-.}-{0:0}, at: run_timer_softirq+0x702/0x1700
> [ 91.466841]
> stack backtrace:
> [ 91.466850] CPU: 2 PID: 5043 Comm: runtime_probe Tainted: G U 6.6.63-lockdep #1 f2b045305e587e03c4766ca818bb77742f953c87
> [ 91.466864] Hardware name: HP Lantis/Lantis, BIOS Google_Lantis.13606.437.0 01/21/2022
> [ 91.466872] Call Trace:
> [ 91.466881] <IRQ>
> [ 91.466889] dump_stack_lvl+0x6d/0xb0
> [ 91.466904] print_usage_bug+0x8a4/0xbb0
> [ 91.466917] mark_lock+0x13ca/0x1940
> [ 91.466930] __lock_acquire+0xc10/0x2670
> [ 91.466944] lock_acquire+0x119/0x3a0
> [ 91.466955] ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
> [ 91.466992] ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
> [ 91.467025] _raw_spin_lock+0x39/0x70
> [ 91.467036] ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
> [ 91.467070] fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
> [ 91.467104] ? run_timer_softirq+0x702/0x1700
> [ 91.467114] ? run_timer_softirq+0x702/0x1700
> [ 91.467123] ? __pfx_fuse_check_timeout+0x10/0x10 [fuse c290dfa1c114772028056af34b9662cba2d155b5]
> [ 91.467156] run_timer_softirq+0x763/0x1700
> [ 91.467172] irq_exit_rcu+0x300/0x7d0
> [ 91.467183] ? sysvec_apic_timer_interrupt+0x56/0x90
> [ 91.467195] sysvec_apic_timer_interrupt+0x56/0x90
> [ 91.467205] </IRQ>
>
> Do we want to run fuse-watchdog as a preemptive task instead of
> IRQ context? Simlar to the way the hung-task watchdog runs, for
> example. Yes, it now may starve and not get scheduled in corner
> cases (under some crazy pressure), but at least we don't need to
> patch the entire fuse code to use irq_safe/irq_restore locking
> variants.
Interesting! Thanks for noting this.
It looks like the choices we have here then are to either:
* have this run in a kthread like hung-task watchdog, as you mentioned above
* have all fuse code use irq_safe/irq_restore for locks
* use a separate spinlock/list for request timeouts as per the
implementation in v7 [1], though will need to benchmark this to make
sure performance doesn't degrade from lock contention if lots of
requests are submitted in parallel
Having 1 extra kthread per server as overhead doesn't seem too bad to
me. There's already an upper margin of imprecision with the timer
implementation, so I don't see the kthread scheduling imprecision as a
big deal.
I'll restructure this to use a kthread in v10. If anyone has thoughts
on a better or more preferred approach though, would love to hear
them.
Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/20241007184258.2837492-3-joannelkoong@gmail.com/
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-06 0:06 ` Joanne Koong
@ 2024-12-06 4:28 ` Sergey Senozhatsky
0 siblings, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-06 4:28 UTC (permalink / raw)
To: Joanne Koong
Cc: Sergey Senozhatsky, miklos, linux-fsdevel, josef, bernd.schubert,
jefflexu, laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa
On (24/12/05 16:06), Joanne Koong wrote:
> Interesting! Thanks for noting this.
>
> It looks like the choices we have here then are to either:
>
> * have this run in a kthread like hung-task watchdog, as you mentioned above
Works for me.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
2024-11-28 10:44 ` Sergey Senozhatsky
2024-12-03 11:01 ` Sergey Senozhatsky
@ 2024-12-04 13:14 ` Sergey Senozhatsky
2024-12-05 23:10 ` Joanne Koong
2024-12-04 13:47 ` Sergey Senozhatsky
` (4 subsequent siblings)
7 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-04 13:14 UTC (permalink / raw)
To: Joanne Koong
Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa
On (24/11/14 11:13), Joanne Koong wrote:
[..]
> @@ -920,6 +935,9 @@ struct fuse_conn {
> /** IDR for backing files ids */
> struct idr backing_files_map;
> #endif
> +
> + /** Only used if the connection enforces request timeouts */
> + struct fuse_timeout timeout;
> };
[..]
> @@ -749,6 +750,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
> fsparam_u32 ("max_read", OPT_MAX_READ),
> fsparam_u32 ("blksize", OPT_BLKSIZE),
> fsparam_string ("subtype", OPT_SUBTYPE),
> + fsparam_u16 ("request_timeout", OPT_REQUEST_TIMEOUT),
> {}
> };
>
> @@ -844,6 +846,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
> ctx->blksize = result.uint_32;
> break;
>
> + case OPT_REQUEST_TIMEOUT:
> + ctx->req_timeout = result.uint_16;
> + break;
> +
A quick question: so for this user-space should be updated
to request fuse-watchdog on particular connection? Would
it make sense to have a way to simply enforce watchdog on
all connections?
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-04 13:14 ` Sergey Senozhatsky
@ 2024-12-05 23:10 ` Joanne Koong
0 siblings, 0 replies; 39+ messages in thread
From: Joanne Koong @ 2024-12-05 23:10 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa
On Wed, Dec 4, 2024 at 5:14 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/11/14 11:13), Joanne Koong wrote:
> [..]
> > @@ -920,6 +935,9 @@ struct fuse_conn {
> > /** IDR for backing files ids */
> > struct idr backing_files_map;
> > #endif
> > +
> > + /** Only used if the connection enforces request timeouts */
> > + struct fuse_timeout timeout;
> > };
> [..]
> > @@ -749,6 +750,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
> > fsparam_u32 ("max_read", OPT_MAX_READ),
> > fsparam_u32 ("blksize", OPT_BLKSIZE),
> > fsparam_string ("subtype", OPT_SUBTYPE),
> > + fsparam_u16 ("request_timeout", OPT_REQUEST_TIMEOUT),
> > {}
> > };
> >
> > @@ -844,6 +846,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
> > ctx->blksize = result.uint_32;
> > break;
> >
> > + case OPT_REQUEST_TIMEOUT:
> > + ctx->req_timeout = result.uint_16;
> > + break;
> > +
>
> A quick question: so for this user-space should be updated
> to request fuse-watchdog on particular connection? Would
> it make sense to have a way to simply enforce watchdog on
> all connections?
Hi Sergey,
The third patch
(https://lore.kernel.org/linux-fsdevel/20241114191332.669127-4-joannelkoong@gmail.com/)
in this patchset adds this through the sysctl interface. The sys admin
can set /proc/sys/fs/fuse/max_request_timeout and this will ensure
requests don't take longer than this value.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
` (2 preceding siblings ...)
2024-12-04 13:14 ` Sergey Senozhatsky
@ 2024-12-04 13:47 ` Sergey Senozhatsky
2024-12-05 23:29 ` Joanne Koong
2024-12-06 0:37 ` Jeff Layton
` (3 subsequent siblings)
7 siblings, 1 reply; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-04 13:47 UTC (permalink / raw)
To: Joanne Koong
Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa
On (24/11/14 11:13), Joanne Koong wrote:
>
> +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> +{
> + return jiffies > req->create_time + fc->timeout.req_timeout;
> +}
With jiffies we need to use time_after() and such, so we'd deal
with jiffies wrap-around.
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-04 13:47 ` Sergey Senozhatsky
@ 2024-12-05 23:29 ` Joanne Koong
2024-12-06 3:26 ` Sergey Senozhatsky
0 siblings, 1 reply; 39+ messages in thread
From: Joanne Koong @ 2024-12-05 23:29 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa
On Wed, Dec 4, 2024 at 5:47 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/11/14 11:13), Joanne Koong wrote:
> >
> > +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> > +{
> > + return jiffies > req->create_time + fc->timeout.req_timeout;
> > +}
>
> With jiffies we need to use time_after() and such, so we'd deal
> with jiffies wrap-around.
Ohh I see, I guess this is because on 32-bit systems unsigned longs
are 4 bytes? Thanks for noting - I'll push out a new version with this
change.
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-05 23:29 ` Joanne Koong
@ 2024-12-06 3:26 ` Sergey Senozhatsky
0 siblings, 0 replies; 39+ messages in thread
From: Sergey Senozhatsky @ 2024-12-06 3:26 UTC (permalink / raw)
To: Joanne Koong
Cc: Sergey Senozhatsky, miklos, linux-fsdevel, josef, bernd.schubert,
jefflexu, laoar.shao, kernel-team, Bernd Schubert, Tomasz Figa
On (24/12/05 15:29), Joanne Koong wrote:
> On Wed, Dec 4, 2024 at 5:47 AM Sergey Senozhatsky
> <senozhatsky@chromium.org> wrote:
> >
> > On (24/11/14 11:13), Joanne Koong wrote:
> > >
> > > +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> > > +{
> > > + return jiffies > req->create_time + fc->timeout.req_timeout;
> > > +}
> >
> > With jiffies we need to use time_after() and such, so we'd deal
> > with jiffies wrap-around.
>
> Ohh I see, I guess this is because on 32-bit systems unsigned longs
> are 4 bytes?
Correct, IIRC on 32bit system jiffies wraparound every 47 or 49 days.
But I also recall that jiffies (at least in the past) were initialized
to -5 minutes so that they would wraparound during first 5 minutes
of uptime to reveal bugs. I don't know if it's still the case though.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
` (3 preceding siblings ...)
2024-12-04 13:47 ` Sergey Senozhatsky
@ 2024-12-06 0:37 ` Jeff Layton
2024-12-06 19:19 ` Joanne Koong
2024-12-06 16:30 ` Etienne Martineau
` (2 subsequent siblings)
7 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2024-12-06 0:37 UTC (permalink / raw)
To: Joanne Koong, miklos, linux-fsdevel, Sergey Senozhatsky
Cc: josef, bernd.schubert, jefflexu, laoar.shao, kernel-team,
Bernd Schubert
On Thu, 2024-11-14 at 11:13 -0800, Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is deadlocked. Currently, there's no
> good way to detect if a server is stuck and needs to be killed manually.
>
> This commit adds an option for enforcing a timeout (in minutes) for
> requests where if the timeout elapses without the server responding to
> the request, the connection will be automatically aborted.
>
I haven't been keeping up with the earlier series, but I think I agree
with Sergey that this timeout would be better expressed in seconds.
Most filesystems that deal with timeouts (NFS, CIFS, etc.) specify them
as a number of seconds, and expressing this in minutes goes against
that convention. It also seems rather coarse-grained. I could easily
see a situation where 5 minutes is too short, but 6 minutes is too
long.
With that too, you probably wouldn't need patch #1. You could treat it
as a 32-bit integer and just clamp the value as necessary.
> Please note that these timeouts are not 100% precise. The request may
> take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> timeout due to how it's internally implemented.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
> fs/fuse/dev.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
> fs/fuse/fuse_i.h | 21 +++++++++++++
> fs/fuse/inode.c | 21 +++++++++++++
> 3 files changed, 122 insertions(+)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 29fc61a072ba..536aa4525e8f 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -45,6 +45,82 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
> return READ_ONCE(file->private_data);
> }
>
> +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> +{
> + return jiffies > req->create_time + fc->timeout.req_timeout;
> +}
> +
> +/*
> + * Check if any requests aren't being completed by the specified request
> + * timeout. To do so, we:
> + * - check the fiq pending list
> + * - check the bg queue
> + * - check the fpq io and processing lists
> + *
> + * To make this fast, we only check against the head request on each list since
> + * these are generally queued in order of creation time (eg newer requests get
> + * queued to the tail). We might miss a few edge cases (eg requests transitioning
> + * between lists, re-sent requests at the head of the pending list having a
> + * later creation time than other requests on that list, etc.) but that is fine
> + * since if the request never gets fulfilled, it will eventually be caught.
> + */
> +void fuse_check_timeout(struct timer_list *timer)
> +{
> + struct fuse_conn *fc = container_of(timer, struct fuse_conn, timeout.timer);
> + struct fuse_iqueue *fiq = &fc->iq;
> + struct fuse_req *req;
> + struct fuse_dev *fud;
> + struct fuse_pqueue *fpq;
> + bool expired = false;
> + int i;
> +
> + spin_lock(&fiq->lock);
> + req = list_first_entry_or_null(&fiq->pending, struct fuse_req, list);
> + if (req)
> + expired = request_expired(fc, req);
> + spin_unlock(&fiq->lock);
> + if (expired)
> + goto abort_conn;
> +
> + spin_lock(&fc->bg_lock);
> + req = list_first_entry_or_null(&fc->bg_queue, struct fuse_req, list);
> + if (req)
> + expired = request_expired(fc, req);
> + spin_unlock(&fc->bg_lock);
> + if (expired)
> + goto abort_conn;
> +
> + spin_lock(&fc->lock);
> + if (!fc->connected) {
> + spin_unlock(&fc->lock);
> + return;
> + }
> + list_for_each_entry(fud, &fc->devices, entry) {
> + fpq = &fud->pq;
> + spin_lock(&fpq->lock);
> + req = list_first_entry_or_null(&fpq->io, struct fuse_req, list);
> + if (req && request_expired(fc, req))
> + goto fpq_abort;
> +
> + for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
> + req = list_first_entry_or_null(&fpq->processing[i], struct fuse_req, list);
> + if (req && request_expired(fc, req))
> + goto fpq_abort;
> + }
> + spin_unlock(&fpq->lock);
> + }
> + spin_unlock(&fc->lock);
> +
> + mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> + return;
> +
> +fpq_abort:
> + spin_unlock(&fpq->lock);
> + spin_unlock(&fc->lock);
> +abort_conn:
> + fuse_abort_conn(fc);
> +}
> +
> static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
> {
> INIT_LIST_HEAD(&req->list);
> @@ -53,6 +129,7 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
> refcount_set(&req->count, 1);
> __set_bit(FR_PENDING, &req->flags);
> req->fm = fm;
> + req->create_time = jiffies;
> }
>
> static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> @@ -2308,6 +2385,9 @@ void fuse_abort_conn(struct fuse_conn *fc)
> spin_unlock(&fc->lock);
>
> end_requests(&to_end);
> +
> + if (fc->timeout.req_timeout)
> + timer_delete(&fc->timeout.timer);
> } else {
> spin_unlock(&fc->lock);
> }
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index d35c37ccf9b5..9092201c4e0b 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -438,6 +438,9 @@ struct fuse_req {
>
> /** fuse_mount this request belongs to */
> struct fuse_mount *fm;
> +
> + /** When (in jiffies) the request was created */
> + unsigned long create_time;
> };
>
> struct fuse_iqueue;
> @@ -528,6 +531,16 @@ struct fuse_pqueue {
> struct list_head io;
> };
>
> +/* Frequency (in seconds) of request timeout checks, if opted into */
> +#define FUSE_TIMEOUT_TIMER_FREQ 60 * HZ
> +
> +struct fuse_timeout {
> + struct timer_list timer;
> +
> + /* Request timeout (in jiffies). 0 = no timeout */
> + unsigned long req_timeout;
> +};
> +
> /**
> * Fuse device instance
> */
> @@ -574,6 +587,8 @@ struct fuse_fs_context {
> enum fuse_dax_mode dax_mode;
> unsigned int max_read;
> unsigned int blksize;
> + /* Request timeout (in minutes). 0 = no timeout (infinite wait) */
> + unsigned int req_timeout;
> const char *subtype;
>
> /* DAX device, may be NULL */
> @@ -920,6 +935,9 @@ struct fuse_conn {
> /** IDR for backing files ids */
> struct idr backing_files_map;
> #endif
> +
> + /** Only used if the connection enforces request timeouts */
> + struct fuse_timeout timeout;
> };
>
> /*
> @@ -1181,6 +1199,9 @@ void fuse_request_end(struct fuse_req *req);
> void fuse_abort_conn(struct fuse_conn *fc);
> void fuse_wait_aborted(struct fuse_conn *fc);
>
> +/* Check if any requests timed out */
> +void fuse_check_timeout(struct timer_list *timer);
> +
> /**
> * Invalidate inode attributes
> */
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index f1779ff3f8d1..ee006f09cd04 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -735,6 +735,7 @@ enum {
> OPT_ALLOW_OTHER,
> OPT_MAX_READ,
> OPT_BLKSIZE,
> + OPT_REQUEST_TIMEOUT,
> OPT_ERR
> };
>
> @@ -749,6 +750,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
> fsparam_u32 ("max_read", OPT_MAX_READ),
> fsparam_u32 ("blksize", OPT_BLKSIZE),
> fsparam_string ("subtype", OPT_SUBTYPE),
> + fsparam_u16 ("request_timeout", OPT_REQUEST_TIMEOUT),
> {}
> };
>
> @@ -844,6 +846,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
> ctx->blksize = result.uint_32;
> break;
>
> + case OPT_REQUEST_TIMEOUT:
> + ctx->req_timeout = result.uint_16;
> + break;
> +
> default:
> return -EINVAL;
> }
> @@ -973,6 +979,8 @@ void fuse_conn_put(struct fuse_conn *fc)
>
> if (IS_ENABLED(CONFIG_FUSE_DAX))
> fuse_dax_conn_free(fc);
> + if (fc->timeout.req_timeout)
> + timer_shutdown_sync(&fc->timeout.timer);
> if (fiq->ops->release)
> fiq->ops->release(fiq);
> put_pid_ns(fc->pid_ns);
> @@ -1691,6 +1699,18 @@ int fuse_init_fs_context_submount(struct fs_context *fsc)
> }
> EXPORT_SYMBOL_GPL(fuse_init_fs_context_submount);
>
> +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> +{
> + if (ctx->req_timeout) {
> + if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> + fc->timeout.req_timeout = ULONG_MAX;
> + timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> + mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> + } else {
> + fc->timeout.req_timeout = 0;
> + }
> +}
> +
> int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> {
> struct fuse_dev *fud = NULL;
> @@ -1753,6 +1773,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> fc->destroy = ctx->destroy;
> fc->no_control = ctx->no_control;
> fc->no_force_umount = ctx->no_force_umount;
> + fuse_init_fc_timeout(fc, ctx);
>
> err = -ENOMEM;
> root = fuse_get_root_inode(sb, ctx->rootmode);
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-06 0:37 ` Jeff Layton
@ 2024-12-06 19:19 ` Joanne Koong
2024-12-06 20:06 ` Jeff Layton
0 siblings, 1 reply; 39+ messages in thread
From: Joanne Koong @ 2024-12-06 19:19 UTC (permalink / raw)
To: Jeff Layton
Cc: miklos, linux-fsdevel, Sergey Senozhatsky, josef, bernd.schubert,
jefflexu, laoar.shao, kernel-team, Bernd Schubert
On Thu, Dec 5, 2024 at 4:37 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2024-11-14 at 11:13 -0800, Joanne Koong wrote:
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is deadlocked. Currently, there's no
> > good way to detect if a server is stuck and needs to be killed manually.
> >
> > This commit adds an option for enforcing a timeout (in minutes) for
> > requests where if the timeout elapses without the server responding to
> > the request, the connection will be automatically aborted.
> >
>
> I haven't been keeping up with the earlier series, but I think I agree
> with Sergey that this timeout would be better expressed in seconds.
>
> Most filesystems that deal with timeouts (NFS, CIFS, etc.) specify them
> as a number of seconds, and expressing this in minutes goes against
> that convention. It also seems rather coarse-grained. I could easily
> see a situation where 5 minutes is too short, but 6 minutes is too
> long.
Sounds good, I'll change the timeout to seconds. The reason it was set
in minutes is because the timeouts have an upper margin of error
(right now, up to 1 minute) and I didn't want to give a misleading
illusion of precision.
It sounds like it'd be useful if the timer is run more frequently (eg
instead of every 1 minute, maybe running this every 15 or 30 secs) as
well. I'll make this change in the next version.
>
> With that too, you probably wouldn't need patch #1. You could treat it
> as a 32-bit integer and just clamp the value as necessary.
>
> > Please note that these timeouts are not 100% precise. The request may
> > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> > timeout due to how it's internally implemented.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> > ---
> > fs/fuse/dev.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
> > fs/fuse/fuse_i.h | 21 +++++++++++++
> > fs/fuse/inode.c | 21 +++++++++++++
> > 3 files changed, 122 insertions(+)
> >
...
>
> --
> Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-06 19:19 ` Joanne Koong
@ 2024-12-06 20:06 ` Jeff Layton
0 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2024-12-06 20:06 UTC (permalink / raw)
To: Joanne Koong
Cc: miklos, linux-fsdevel, Sergey Senozhatsky, josef, bernd.schubert,
jefflexu, laoar.shao, kernel-team, Bernd Schubert
On Fri, 2024-12-06 at 11:19 -0800, Joanne Koong wrote:
> On Thu, Dec 5, 2024 at 4:37 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Thu, 2024-11-14 at 11:13 -0800, Joanne Koong wrote:
> > > There are situations where fuse servers can become unresponsive or
> > > stuck, for example if the server is deadlocked. Currently, there's no
> > > good way to detect if a server is stuck and needs to be killed manually.
> > >
> > > This commit adds an option for enforcing a timeout (in minutes) for
> > > requests where if the timeout elapses without the server responding to
> > > the request, the connection will be automatically aborted.
> > >
> >
> > I haven't been keeping up with the earlier series, but I think I agree
> > with Sergey that this timeout would be better expressed in seconds.
> >
> > Most filesystems that deal with timeouts (NFS, CIFS, etc.) specify them
> > as a number of seconds, and expressing this in minutes goes against
> > that convention. It also seems rather coarse-grained. I could easily
> > see a situation where 5 minutes is too short, but 6 minutes is too
> > long.
>
> Sounds good, I'll change the timeout to seconds. The reason it was set
> in minutes is because the timeouts have an upper margin of error
> (right now, up to 1 minute) and I didn't want to give a misleading
> illusion of precision.
>
Understood. That is a concern, but I think you can mitigate it by
documenting that the timeout just makes it eligible to be reaped,
rather than a hard timeout guarantee. That would also allow you to
change the implementation in the future to reap more (or less)
aggressively, without breaking the interface.
> It sounds like it'd be useful if the timer is run more frequently (eg
> instead of every 1 minute, maybe running this every 15 or 30 secs) as
> well. I'll make this change in the next version.
>
Yeah, 1 minute seems like quite a long time. I don't know FUSE well
enough to get a feel for how heavyweight fuse_check_timeout() is, but
it might be good to check more often, especially if you don't expect it
to take long in the common case.
> >
> > With that too, you probably wouldn't need patch #1. You could treat it
> > as a 32-bit integer and just clamp the value as necessary.
> >
> > > Please note that these timeouts are not 100% precise. The request may
> > > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> > > timeout due to how it's internally implemented.
> > >
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> > > ---
> > > fs/fuse/dev.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > fs/fuse/fuse_i.h | 21 +++++++++++++
> > > fs/fuse/inode.c | 21 +++++++++++++
> > > 3 files changed, 122 insertions(+)
> > >
> ...
> >
> > --
> > Jeff Layton <jlayton@kernel.org>
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
` (4 preceding siblings ...)
2024-12-06 0:37 ` Jeff Layton
@ 2024-12-06 16:30 ` Etienne Martineau
2024-12-06 19:05 ` Joanne Koong
2024-12-06 20:12 ` Jeff Layton
7 siblings, 0 replies; 39+ messages in thread
From: Etienne Martineau @ 2024-12-06 16:30 UTC (permalink / raw)
To: joannelkoong, miklos, linux-fsdevel
Cc: bernd.schubert, bschubert, jefflexu, josef, kernel-team,
laoar.shao
From: Joanne Koong <joannelkoong@gmail.com>
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is deadlocked. Currently, there's no
> good way to detect if a server is stuck and needs to be killed manually.
I tried this V9 patchset and realized that upon max_request_timeout the connection will be dropped irrespectively if the task is in D state or not. I guess this is expected behavior.
To me the concerning aspect is when tasks are going in D state because of the consequence when running with hung_task_timeout_secs and hung_task_panic=1.
>
> This commit adds an option for enforcing a timeout (in minutes) for
> requests where if the timeout elapses without the server responding to
> the request, the connection will be automatically aborted.
>
> Please note that these timeouts are not 100% precise. The request may
> take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> timeout due to how it's internally implemented.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
> fs/fuse/dev.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
> fs/fuse/fuse_i.h | 21 +++++++++++++
> fs/fuse/inode.c | 21 +++++++++++++
> 3 files changed, 122 insertions(+)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 29fc61a072ba..536aa4525e8f 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -45,6 +45,82 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
> return READ_ONCE(file->private_data);
> }
>
> +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> +{
> + return jiffies > req->create_time + fc->timeout.req_timeout;
> +}
> +
> +/*
> + * Check if any requests aren't being completed by the specified request
> + * timeout. To do so, we:
> + * - check the fiq pending list
> + * - check the bg queue
> + * - check the fpq io and processing lists
> + *
> + * To make this fast, we only check against the head request on each list since
> + * these are generally queued in order of creation time (eg newer requests get
> + * queued to the tail). We might miss a few edge cases (eg requests transitioning
> + * between lists, re-sent requests at the head of the pending list having a
> + * later creation time than other requests on that list, etc.) but that is fine
> + * since if the request never gets fulfilled, it will eventually be caught.
> + */
> +void fuse_check_timeout(struct timer_list *timer)
> +{
> + struct fuse_conn *fc = container_of(timer, struct fuse_conn, timeout.timer);
Here this timer may get queued and if echo 1 > /sys/fs/fuse/connections/'nn'/abort is done at more or less the same time over the same connection I'm wondering what will happen?
At least I think we may need timer_delete_sync() instead of timer_delete() in fuse_abort_conn() and potentially call it from the top of fuse_abort_conn() instead.
> + struct fuse_iqueue *fiq = &fc->iq;
> + struct fuse_req *req;
> + struct fuse_dev *fud;
> + struct fuse_pqueue *fpq;
> + bool expired = false;
> + int i;
> +
> + spin_lock(&fiq->lock);
Do we need spin_lock_irq instead bcos this is irq context no?
>
> +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> +{
> + if (ctx->req_timeout) {
> + if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> + fc->timeout.req_timeout = ULONG_MAX;
> + timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> + mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> + } else {
> + fc->timeout.req_timeout = 0;
> + }
> +}
> +
> int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> {
> struct fuse_dev *fud = NULL;
> @@ -1753,6 +1773,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> fc->destroy = ctx->destroy;
> fc->no_control = ctx->no_control;
> fc->no_force_umount = ctx->no_force_umount;
> + fuse_init_fc_timeout(fc, ctx);
The max_request_timeout is latched in fc->timeout.req_timeout during fuse_fill_super_common() so any further modification are not taking into effect
Say:
sudo bash -c 'echo 100 > /proc/sys/fs/fuse/max_request_timeout'
sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt
kill -STOP `pidof /usr/lib/openssh/sftp-server`
ls ./mnt/
^C
# Trying to change back the timeout doesn't have any effect
sudo bash -c 'echo 10 > /proc/sys/fs/fuse/max_request_timeout'
Thanks
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
` (5 preceding siblings ...)
2024-12-06 16:30 ` Etienne Martineau
@ 2024-12-06 19:05 ` Joanne Koong
2024-12-06 19:56 ` Etienne
2024-12-06 20:12 ` Jeff Layton
7 siblings, 1 reply; 39+ messages in thread
From: Joanne Koong @ 2024-12-06 19:05 UTC (permalink / raw)
To: Etienne Martineau
Cc: miklos, linux-fsdevel, bernd.schubert, bschubert, jefflexu, josef,
kernel-team, laoar.shao
On Fri, Dec 6, 2024 at 8:31 AM Etienne Martineau <etmartin4313@gmail.com> wrote:
>
> From: Joanne Koong <joannelkoong@gmail.com>
>
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is deadlocked. Currently, there's no
> > good way to detect if a server is stuck and needs to be killed manually.
>
> I tried this V9 patchset and realized that upon max_request_timeout the connection will be dropped irrespectively if the task is in D state or not. I guess this is expected behavior.
Yes, the connection will be dropped regardless of what state the task is in.
> To me the concerning aspect is when tasks are going in D state because of the consequence when running with hung_task_timeout_secs and hung_task_panic=1.
Could you elaborate on this a bit more? When running with
hung_task_timeout_secs and hung_task_panic=1, how does it cause the
task to go into D state?
>
> >
> > This commit adds an option for enforcing a timeout (in minutes) for
> > requests where if the timeout elapses without the server responding to
> > the request, the connection will be automatically aborted.
> >
> > Please note that these timeouts are not 100% precise. The request may
> > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> > timeout due to how it's internally implemented.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> > ---
> > fs/fuse/dev.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
> > fs/fuse/fuse_i.h | 21 +++++++++++++
> > fs/fuse/inode.c | 21 +++++++++++++
> > 3 files changed, 122 insertions(+)
> >
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 29fc61a072ba..536aa4525e8f 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -45,6 +45,82 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
> > return READ_ONCE(file->private_data);
> > }
> >
> > +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> > +{
> > + return jiffies > req->create_time + fc->timeout.req_timeout;
> > +}
> > +
> > +/*
> > + * Check if any requests aren't being completed by the specified request
> > + * timeout. To do so, we:
> > + * - check the fiq pending list
> > + * - check the bg queue
> > + * - check the fpq io and processing lists
> > + *
> > + * To make this fast, we only check against the head request on each list since
> > + * these are generally queued in order of creation time (eg newer requests get
> > + * queued to the tail). We might miss a few edge cases (eg requests transitioning
> > + * between lists, re-sent requests at the head of the pending list having a
> > + * later creation time than other requests on that list, etc.) but that is fine
> > + * since if the request never gets fulfilled, it will eventually be caught.
> > + */
> > +void fuse_check_timeout(struct timer_list *timer)
> > +{
> > + struct fuse_conn *fc = container_of(timer, struct fuse_conn, timeout.timer);
> Here this timer may get queued and if echo 1 > /sys/fs/fuse/connections/'nn'/abort is done at more or less the same time over the same connection I'm wondering what will happen?
> At least I think we may need timer_delete_sync() instead of timer_delete() in fuse_abort_conn() and potentially call it from the top of fuse_abort_conn() instead.
I don't think this is an issue because there's still a reference on
the "struct fuse_conn" when fuse_abort_conn() is called. The fuse_conn
struct is freed in fuse_conn_put() when the last refcount is dropped,
and in fuse_conn_put() there's this line
if (fc->timeout.req_timeout)
timer_shutdown_sync(&fc->timeout.timer);
that guarantees the timer is not queued / callback of timer is not
running / cannot be rearmed.
>
> > + struct fuse_iqueue *fiq = &fc->iq;
> > + struct fuse_req *req;
> > + struct fuse_dev *fud;
> > + struct fuse_pqueue *fpq;
> > + bool expired = false;
> > + int i;
> > +
> > + spin_lock(&fiq->lock);
> Do we need spin_lock_irq instead bcos this is irq context no?
Yeah, I missed that spin_lock doesn't disable interrupts. Sergey noted
this as well in [1] and for the next iteration of this patchset, I'm
planning to use use a kthread per server connection, similar to what
the hung task watchdog does.
[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1bwYjHGGNsPwjsd5Kp3iL6ua_hmyN3kFZo1b8OVV9bOpw@mail.gmail.com/
>
> >
> > +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> > +{
> > + if (ctx->req_timeout) {
> > + if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> > + fc->timeout.req_timeout = ULONG_MAX;
> > + timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> > + mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> > + } else {
> > + fc->timeout.req_timeout = 0;
> > + }
> > +}
> > +
> > int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> > {
> > struct fuse_dev *fud = NULL;
> > @@ -1753,6 +1773,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> > fc->destroy = ctx->destroy;
> > fc->no_control = ctx->no_control;
> > fc->no_force_umount = ctx->no_force_umount;
> > + fuse_init_fc_timeout(fc, ctx);
>
> The max_request_timeout is latched in fc->timeout.req_timeout during fuse_fill_super_common() so any further modification are not taking into effect
> Say:
> sudo bash -c 'echo 100 > /proc/sys/fs/fuse/max_request_timeout'
> sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt
> kill -STOP `pidof /usr/lib/openssh/sftp-server`
> ls ./mnt/
> ^C
> # Trying to change back the timeout doesn't have any effect
> sudo bash -c 'echo 10 > /proc/sys/fs/fuse/max_request_timeout'
>
This is expected behavior. The timeout that's set at mount time will
be the timeout for the duration of the connection.
Thanks,
Joanne
> Thanks
>
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-06 19:05 ` Joanne Koong
@ 2024-12-06 19:56 ` Etienne
2024-12-06 21:52 ` Joanne Koong
0 siblings, 1 reply; 39+ messages in thread
From: Etienne @ 2024-12-06 19:56 UTC (permalink / raw)
To: Joanne Koong
Cc: miklos, linux-fsdevel, bernd.schubert, bschubert, jefflexu, josef,
kernel-team, laoar.shao
> > I tried this V9 patchset and realized that upon max_request_timeout the connection will be dropped irrespectively if the task is in D state or not. I guess this is expected behavior.
>
> Yes, the connection will be dropped regardless of what state the task is in.
Thanks for confirmation
>
> > To me the concerning aspect is when tasks are going in D state because of the consequence when running with hung_task_timeout_secs and hung_task_panic=1.
>
> Could you elaborate on this a bit more? When running with
> hung_task_timeout_secs and hung_task_panic=1, how does it cause the
> task to go into D state?
Sorry for the confusion. It doesn't cause tasks to go in D state.
What I meant is that I've been looking for a way to terminate tasks
stuck in D state because we have hung_task_panic=1 and this is causing
bad consequences when they trigger the hung task timer.
> > Here this timer may get queued and if echo 1 > /sys/fs/fuse/connections/'nn'/abort is done at more or less the same time over the same connection I'm wondering what will happen?
> > At least I think we may need timer_delete_sync() instead of timer_delete() in fuse_abort_conn() and potentially call it from the top of fuse_abort_conn() instead.
>
> I don't think this is an issue because there's still a reference on
> the "struct fuse_conn" when fuse_abort_conn() is called. The fuse_conn
> struct is freed in fuse_conn_put() when the last refcount is dropped,
> and in fuse_conn_put() there's this line
>
> if (fc->timeout.req_timeout)
> timer_shutdown_sync(&fc->timeout.timer);
>
> that guarantees the timer is not queued / callback of timer is not
> running / cannot be rearmed.
Got it. I missed that last part in fuse_conn_put()
Thanks
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-06 19:56 ` Etienne
@ 2024-12-06 21:52 ` Joanne Koong
0 siblings, 0 replies; 39+ messages in thread
From: Joanne Koong @ 2024-12-06 21:52 UTC (permalink / raw)
To: Etienne
Cc: miklos, linux-fsdevel, bernd.schubert, bschubert, jefflexu, josef,
kernel-team, laoar.shao
On Fri, Dec 6, 2024 at 11:56 AM Etienne <etmartin4313@gmail.com> wrote:
>
> > > I tried this V9 patchset and realized that upon max_request_timeout the connection will be dropped irrespectively if the task is in D state or not. I guess this is expected behavior.
> >
> > Yes, the connection will be dropped regardless of what state the task is in.
> Thanks for confirmation
>
> >
> > > To me the concerning aspect is when tasks are going in D state because of the consequence when running with hung_task_timeout_secs and hung_task_panic=1.
> >
> > Could you elaborate on this a bit more? When running with
> > hung_task_timeout_secs and hung_task_panic=1, how does it cause the
> > task to go into D state?
>
> Sorry for the confusion. It doesn't cause tasks to go in D state.
> What I meant is that I've been looking for a way to terminate tasks
> stuck in D state because we have hung_task_panic=1 and this is causing
> bad consequences when they trigger the hung task timer.
Gotcha, thanks for clarifying!
>
> > > Here this timer may get queued and if echo 1 > /sys/fs/fuse/connections/'nn'/abort is done at more or less the same time over the same connection I'm wondering what will happen?
> > > At least I think we may need timer_delete_sync() instead of timer_delete() in fuse_abort_conn() and potentially call it from the top of fuse_abort_conn() instead.
> >
> > I don't think this is an issue because there's still a reference on
> > the "struct fuse_conn" when fuse_abort_conn() is called. The fuse_conn
> > struct is freed in fuse_conn_put() when the last refcount is dropped,
> > and in fuse_conn_put() there's this line
> >
> > if (fc->timeout.req_timeout)
> > timer_shutdown_sync(&fc->timeout.timer);
> >
> > that guarantees the timer is not queued / callback of timer is not
> > running / cannot be rearmed.
>
> Got it. I missed that last part in fuse_conn_put()
> Thanks
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-11-14 19:13 ` [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests Joanne Koong
` (6 preceding siblings ...)
2024-12-06 19:05 ` Joanne Koong
@ 2024-12-06 20:12 ` Jeff Layton
2024-12-06 21:54 ` Joanne Koong
7 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2024-12-06 20:12 UTC (permalink / raw)
To: Joanne Koong, miklos, linux-fsdevel
Cc: josef, bernd.schubert, jefflexu, laoar.shao, kernel-team,
Bernd Schubert
On Thu, 2024-11-14 at 11:13 -0800, Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is deadlocked. Currently, there's no
> good way to detect if a server is stuck and needs to be killed manually.
>
> This commit adds an option for enforcing a timeout (in minutes) for
> requests where if the timeout elapses without the server responding to
> the request, the connection will be automatically aborted.
>
> Please note that these timeouts are not 100% precise. The request may
> take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> timeout due to how it's internally implemented.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> ---
> fs/fuse/dev.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
> fs/fuse/fuse_i.h | 21 +++++++++++++
> fs/fuse/inode.c | 21 +++++++++++++
> 3 files changed, 122 insertions(+)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 29fc61a072ba..536aa4525e8f 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -45,6 +45,82 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
> return READ_ONCE(file->private_data);
> }
>
> +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req)
> +{
> + return jiffies > req->create_time + fc->timeout.req_timeout;
> +}
> +
> +/*
> + * Check if any requests aren't being completed by the specified request
> + * timeout. To do so, we:
> + * - check the fiq pending list
> + * - check the bg queue
> + * - check the fpq io and processing lists
> + *
> + * To make this fast, we only check against the head request on each list since
> + * these are generally queued in order of creation time (eg newer requests get
> + * queued to the tail). We might miss a few edge cases (eg requests transitioning
> + * between lists, re-sent requests at the head of the pending list having a
> + * later creation time than other requests on that list, etc.) but that is fine
> + * since if the request never gets fulfilled, it will eventually be caught.
> + */
> +void fuse_check_timeout(struct timer_list *timer)
> +{
> + struct fuse_conn *fc = container_of(timer, struct fuse_conn, timeout.timer);
> + struct fuse_iqueue *fiq = &fc->iq;
> + struct fuse_req *req;
> + struct fuse_dev *fud;
> + struct fuse_pqueue *fpq;
> + bool expired = false;
> + int i;
> +
> + spin_lock(&fiq->lock);
> + req = list_first_entry_or_null(&fiq->pending, struct fuse_req, list);
> + if (req)
> + expired = request_expired(fc, req);
> + spin_unlock(&fiq->lock);
> + if (expired)
> + goto abort_conn;
> +
> + spin_lock(&fc->bg_lock);
> + req = list_first_entry_or_null(&fc->bg_queue, struct fuse_req, list);
> + if (req)
> + expired = request_expired(fc, req);
> + spin_unlock(&fc->bg_lock);
> + if (expired)
> + goto abort_conn;
> +
> + spin_lock(&fc->lock);
> + if (!fc->connected) {
> + spin_unlock(&fc->lock);
> + return;
> + }
> + list_for_each_entry(fud, &fc->devices, entry) {
> + fpq = &fud->pq;
> + spin_lock(&fpq->lock);
> + req = list_first_entry_or_null(&fpq->io, struct fuse_req, list);
> + if (req && request_expired(fc, req))
> + goto fpq_abort;
> +
> + for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
> + req = list_first_entry_or_null(&fpq->processing[i], struct fuse_req, list);
> + if (req && request_expired(fc, req))
> + goto fpq_abort;
> + }
> + spin_unlock(&fpq->lock);
> + }
> + spin_unlock(&fc->lock);
> +
> + mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> + return;
> +
> +fpq_abort:
> + spin_unlock(&fpq->lock);
> + spin_unlock(&fc->lock);
> +abort_conn:
> + fuse_abort_conn(fc);
> +}
> +
> static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
> {
> INIT_LIST_HEAD(&req->list);
> @@ -53,6 +129,7 @@ static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
> refcount_set(&req->count, 1);
> __set_bit(FR_PENDING, &req->flags);
> req->fm = fm;
> + req->create_time = jiffies;
> }
>
> static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> @@ -2308,6 +2385,9 @@ void fuse_abort_conn(struct fuse_conn *fc)
> spin_unlock(&fc->lock);
>
> end_requests(&to_end);
> +
> + if (fc->timeout.req_timeout)
> + timer_delete(&fc->timeout.timer);
> } else {
> spin_unlock(&fc->lock);
> }
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index d35c37ccf9b5..9092201c4e0b 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -438,6 +438,9 @@ struct fuse_req {
>
> /** fuse_mount this request belongs to */
> struct fuse_mount *fm;
> +
> + /** When (in jiffies) the request was created */
> + unsigned long create_time;
> };
>
> struct fuse_iqueue;
> @@ -528,6 +531,16 @@ struct fuse_pqueue {
> struct list_head io;
> };
>
> +/* Frequency (in seconds) of request timeout checks, if opted into */
> +#define FUSE_TIMEOUT_TIMER_FREQ 60 * HZ
> +
> +struct fuse_timeout {
> + struct timer_list timer;
> +
> + /* Request timeout (in jiffies). 0 = no timeout */
> + unsigned long req_timeout;
> +};
> +
> /**
> * Fuse device instance
> */
> @@ -574,6 +587,8 @@ struct fuse_fs_context {
> enum fuse_dax_mode dax_mode;
> unsigned int max_read;
> unsigned int blksize;
> + /* Request timeout (in minutes). 0 = no timeout (infinite wait) */
> + unsigned int req_timeout;
> const char *subtype;
>
> /* DAX device, may be NULL */
> @@ -920,6 +935,9 @@ struct fuse_conn {
> /** IDR for backing files ids */
> struct idr backing_files_map;
> #endif
> +
> + /** Only used if the connection enforces request timeouts */
> + struct fuse_timeout timeout;
> };
>
> /*
> @@ -1181,6 +1199,9 @@ void fuse_request_end(struct fuse_req *req);
> void fuse_abort_conn(struct fuse_conn *fc);
> void fuse_wait_aborted(struct fuse_conn *fc);
>
> +/* Check if any requests timed out */
> +void fuse_check_timeout(struct timer_list *timer);
> +
> /**
> * Invalidate inode attributes
> */
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index f1779ff3f8d1..ee006f09cd04 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -735,6 +735,7 @@ enum {
> OPT_ALLOW_OTHER,
> OPT_MAX_READ,
> OPT_BLKSIZE,
> + OPT_REQUEST_TIMEOUT,
> OPT_ERR
> };
>
> @@ -749,6 +750,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
> fsparam_u32 ("max_read", OPT_MAX_READ),
> fsparam_u32 ("blksize", OPT_BLKSIZE),
> fsparam_string ("subtype", OPT_SUBTYPE),
> + fsparam_u16 ("request_timeout", OPT_REQUEST_TIMEOUT),
> {}
> };
>
> @@ -844,6 +846,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
> ctx->blksize = result.uint_32;
> break;
>
> + case OPT_REQUEST_TIMEOUT:
> + ctx->req_timeout = result.uint_16;
> + break;
> +
> default:
> return -EINVAL;
> }
> @@ -973,6 +979,8 @@ void fuse_conn_put(struct fuse_conn *fc)
>
> if (IS_ENABLED(CONFIG_FUSE_DAX))
> fuse_dax_conn_free(fc);
> + if (fc->timeout.req_timeout)
> + timer_shutdown_sync(&fc->timeout.timer);
> if (fiq->ops->release)
> fiq->ops->release(fiq);
> put_pid_ns(fc->pid_ns);
> @@ -1691,6 +1699,18 @@ int fuse_init_fs_context_submount(struct fs_context *fsc)
> }
> EXPORT_SYMBOL_GPL(fuse_init_fs_context_submount);
>
> +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> +{
> + if (ctx->req_timeout) {
> + if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> + fc->timeout.req_timeout = ULONG_MAX;
> + timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> + mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> + } else {
> + fc->timeout.req_timeout = 0;
> + }
> +}
> +
Does fuse_check_timeout need to run in IRQ context? It doesn't seem
like it does. Have you considered setting up a recurring delayed
workqueue job instead? That would run in process context, which might
make the locking in that function less hairy.
> int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> {
> struct fuse_dev *fud = NULL;
> @@ -1753,6 +1773,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> fc->destroy = ctx->destroy;
> fc->no_control = ctx->no_control;
> fc->no_force_umount = ctx->no_force_umount;
> + fuse_init_fc_timeout(fc, ctx);
>
> err = -ENOMEM;
> root = fuse_get_root_inode(sb, ctx->rootmode);
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests
2024-12-06 20:12 ` Jeff Layton
@ 2024-12-06 21:54 ` Joanne Koong
0 siblings, 0 replies; 39+ messages in thread
From: Joanne Koong @ 2024-12-06 21:54 UTC (permalink / raw)
To: Jeff Layton
Cc: miklos, linux-fsdevel, josef, bernd.schubert, jefflexu,
laoar.shao, kernel-team, Bernd Schubert
On Fri, Dec 6, 2024 at 12:12 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2024-11-14 at 11:13 -0800, Joanne Koong wrote:
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is deadlocked. Currently, there's no
> > good way to detect if a server is stuck and needs to be killed manually.
> >
> > This commit adds an option for enforcing a timeout (in minutes) for
> > requests where if the timeout elapses without the server responding to
> > the request, the connection will be automatically aborted.
> >
> > Please note that these timeouts are not 100% precise. The request may
> > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> > timeout due to how it's internally implemented.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Reviewed-by: Bernd Schubert <bschubert@ddn.com>
> > ---
> > fs/fuse/dev.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++
> > fs/fuse/fuse_i.h | 21 +++++++++++++
> > fs/fuse/inode.c | 21 +++++++++++++
> > 3 files changed, 122 insertions(+)
> >
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 29fc61a072ba..536aa4525e8f 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > return -EINVAL;
> > }
> > @@ -973,6 +979,8 @@ void fuse_conn_put(struct fuse_conn *fc)
> >
> > if (IS_ENABLED(CONFIG_FUSE_DAX))
> > fuse_dax_conn_free(fc);
> > + if (fc->timeout.req_timeout)
> > + timer_shutdown_sync(&fc->timeout.timer);
> > if (fiq->ops->release)
> > fiq->ops->release(fiq);
> > put_pid_ns(fc->pid_ns);
> > @@ -1691,6 +1699,18 @@ int fuse_init_fs_context_submount(struct fs_context *fsc)
> > }
> > EXPORT_SYMBOL_GPL(fuse_init_fs_context_submount);
> >
> > +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx)
> > +{
> > + if (ctx->req_timeout) {
> > + if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout))
> > + fc->timeout.req_timeout = ULONG_MAX;
> > + timer_setup(&fc->timeout.timer, fuse_check_timeout, 0);
> > + mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ);
> > + } else {
> > + fc->timeout.req_timeout = 0;
> > + }
> > +}
> > +
>
>
> Does fuse_check_timeout need to run in IRQ context? It doesn't seem
> like it does. Have you considered setting up a recurring delayed
> workqueue job instead? That would run in process context, which might
> make the locking in that function less hairy.
>
Great idea, I'll use a recurring delayed workqueue job instead of a
kthread, since it's more lightweight.
Thanks,
Joanne
>
> > int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> > {
> > struct fuse_dev *fud = NULL;
> > @@ -1753,6 +1773,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> > fc->destroy = ctx->destroy;
> > fc->no_control = ctx->no_control;
> > fc->no_force_umount = ctx->no_force_umount;
> > + fuse_init_fc_timeout(fc, ctx);
> >
> > err = -ENOMEM;
> > root = fuse_get_root_inode(sb, ctx->rootmode);
>
> --
> Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 39+ messages in thread