linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
@ 2023-08-14  6:03 Jürg Billeter
  2023-08-14 11:02 ` Miklos Szeredi
  0 siblings, 1 reply; 14+ messages in thread
From: Jürg Billeter @ 2023-08-14  6:03 UTC (permalink / raw)
  To: Miklos Szeredi, Eric W. Biederman
  Cc: Tycho Andersen, linux-fsdevel, linux-kernel, regressions

Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
someone wants the return code") `fput()` is called asynchronously if a
file is closed as part of a process exiting, i.e., if there was no
explicit `close()` before exit.

If the file was open for writing, also `put_write_access()` is called
asynchronously as part of the async `fput()`.

If that newly written file is an executable, attempting to `execve()`
the new file can fail with `ETXTBSY` if it's called after the writer
process exited but before the async `fput()` has run.

I've confirmed that this issue is absent in v6.2 and reverting
5a8bee63b1 on top of v6.4.10 fixes the regression.

#regzbot introduced: 5a8bee63b1

Jürg

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-14  6:03 [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush Jürg Billeter
@ 2023-08-14 11:02 ` Miklos Szeredi
  2023-08-14 12:07   ` Bernd Schubert
  2023-08-14 14:00   ` Tycho Andersen
  0 siblings, 2 replies; 14+ messages in thread
From: Miklos Szeredi @ 2023-08-14 11:02 UTC (permalink / raw)
  To: Jürg Billeter
  Cc: Eric W. Biederman, Tycho Andersen, linux-fsdevel, linux-kernel,
	regressions

On Mon, 14 Aug 2023 at 08:03, Jürg Billeter <j@bitron.ch> wrote:
>
> Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
> someone wants the return code") `fput()` is called asynchronously if a
> file is closed as part of a process exiting, i.e., if there was no
> explicit `close()` before exit.
>
> If the file was open for writing, also `put_write_access()` is called
> asynchronously as part of the async `fput()`.
>
> If that newly written file is an executable, attempting to `execve()`
> the new file can fail with `ETXTBSY` if it's called after the writer
> process exited but before the async `fput()` has run.

Thanks for the report.

At this point, I think it would be best to revert the original patch,
since only v6.4 has it.

The original fix was already a workaround, and I don't see a clear
path forward in this direction.  We need to see if there's better
direction.

Ideas?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-14 11:02 ` Miklos Szeredi
@ 2023-08-14 12:07   ` Bernd Schubert
  2023-08-14 12:28     ` Miklos Szeredi
  2023-08-14 14:00   ` Tycho Andersen
  1 sibling, 1 reply; 14+ messages in thread
From: Bernd Schubert @ 2023-08-14 12:07 UTC (permalink / raw)
  To: Miklos Szeredi, Jürg Billeter
  Cc: Eric W. Biederman, Tycho Andersen, linux-fsdevel, linux-kernel,
	regressions



On 8/14/23 13:02, Miklos Szeredi wrote:
> On Mon, 14 Aug 2023 at 08:03, Jürg Billeter <j@bitron.ch> wrote:
>>
>> Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
>> someone wants the return code") `fput()` is called asynchronously if a
>> file is closed as part of a process exiting, i.e., if there was no
>> explicit `close()` before exit.
>>
>> If the file was open for writing, also `put_write_access()` is called
>> asynchronously as part of the async `fput()`.
>>
>> If that newly written file is an executable, attempting to `execve()`
>> the new file can fail with `ETXTBSY` if it's called after the writer
>> process exited but before the async `fput()` has run.
> 
> Thanks for the report.
> 
> At this point, I think it would be best to revert the original patch,
> since only v6.4 has it.
> 
> The original fix was already a workaround, and I don't see a clear
> path forward in this direction.  We need to see if there's better
> direction.
> 
> Ideas?

Is there a good reason to flush O_RDONLY?


fuse: Avoid flush for O_RDONLY

From: Bernd Schubert <bschubert@ddn.com>

A file opened in read-only moded does not have data to be
flushed, so no need to send flush at all.

This also mitigates -EBUSY for executables, which is due to
async flush with commit 5a8bee63b1.

Fixes: 5a8bee63b1 (unless executable opened in rw)
Signed-off-by: Bernd Schubert <bschubert@ddn.com>


index 89d97f6188e0..e058a6af6751 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -545,7 +545,8 @@ static int fuse_flush(struct file *file, fl_owner_t id)
         if (fuse_is_bad(inode))
                 return -EIO;
  
-       if (ff->open_flags & FOPEN_NOFLUSH && !fm->fc->writeback_cache)
+       if ((ff->open_flags & FOPEN_NOFLUSH && !fm->fc->writeback_cache) ||
+           ((file->f_flags & O_ACCMODE) == O_RDONLY))
                 return 0;
  
         fa = kzalloc(sizeof(*fa), GFP_KERNEL);




^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-14 12:07   ` Bernd Schubert
@ 2023-08-14 12:28     ` Miklos Szeredi
  2023-08-14 12:38       ` Jürg Billeter
  0 siblings, 1 reply; 14+ messages in thread
From: Miklos Szeredi @ 2023-08-14 12:28 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Jürg Billeter, Eric W. Biederman, Tycho Andersen,
	linux-fsdevel, linux-kernel, regressions

On Mon, 14 Aug 2023 at 14:07, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 8/14/23 13:02, Miklos Szeredi wrote:
> > On Mon, 14 Aug 2023 at 08:03, Jürg Billeter <j@bitron.ch> wrote:
> >>
> >> Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
> >> someone wants the return code") `fput()` is called asynchronously if a
> >> file is closed as part of a process exiting, i.e., if there was no
> >> explicit `close()` before exit.
> >>
> >> If the file was open for writing, also `put_write_access()` is called
> >> asynchronously as part of the async `fput()`.
> >>
> >> If that newly written file is an executable, attempting to `execve()`
> >> the new file can fail with `ETXTBSY` if it's called after the writer
> >> process exited but before the async `fput()` has run.
> >
> > Thanks for the report.
> >
> > At this point, I think it would be best to revert the original patch,
> > since only v6.4 has it.
> >
> > The original fix was already a workaround, and I don't see a clear
> > path forward in this direction.  We need to see if there's better
> > direction.
> >
> > Ideas?
>
> Is there a good reason to flush O_RDONLY?

->flush() is somewhat of a misnomer, it's the callback for explicit
close(2) and also for implicit close by exit(2).

The reason it's called flush is that nfs/fuse use it to ensure
close-to-open cache consistency, which means flushing dirty data on
close.

On fuse it also used to unlock remote posix locks, and we really know
what other "creative" uses were found for this request.  So while we
could turn off sending FLUSH for read-only case (and use SETLK/F_UNLCK
for the remote lock case) but first let's see a use case.  Sending
FLUSH can already be disabled by returning ENOSYS.

>
> fuse: Avoid flush for O_RDONLY
>
> From: Bernd Schubert <bschubert@ddn.com>
>
> A file opened in read-only moded does not have data to be
> flushed, so no need to send flush at all.
>
> This also mitigates -EBUSY for executables, which is due to
> async flush with commit 5a8bee63b1.

Does it?  If I read the bug report correctly, it's the write case that
causes EBUSY.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-14 12:28     ` Miklos Szeredi
@ 2023-08-14 12:38       ` Jürg Billeter
  2023-08-14 13:44         ` Bernd Schubert
  0 siblings, 1 reply; 14+ messages in thread
From: Jürg Billeter @ 2023-08-14 12:38 UTC (permalink / raw)
  To: Miklos Szeredi, Bernd Schubert
  Cc: Eric W. Biederman, Tycho Andersen, linux-fsdevel, linux-kernel,
	regressions

On Mon, 2023-08-14 at 14:28 +0200, Miklos Szeredi wrote:
> On Mon, 14 Aug 2023 at 14:07, Bernd Schubert 
> > fuse: Avoid flush for O_RDONLY
> > 
> > From: Bernd Schubert <bschubert@ddn.com>
> > 
> > A file opened in read-only moded does not have data to be
> > flushed, so no need to send flush at all.
> > 
> > This also mitigates -EBUSY for executables, which is due to
> > async flush with commit 5a8bee63b1.
> 
> Does it?  If I read the bug report correctly, it's the write case that
> causes EBUSY.

Indeed, I see this when trying to execute a file after a process wrote
to (created) that file. As far as I can tell, `ETXTBSY` can't happen on
exec without the file being opened for writing and thus, I don't see
this patch mitigating this bug.

Jürg

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-14 12:38       ` Jürg Billeter
@ 2023-08-14 13:44         ` Bernd Schubert
  0 siblings, 0 replies; 14+ messages in thread
From: Bernd Schubert @ 2023-08-14 13:44 UTC (permalink / raw)
  To: Jürg Billeter, Miklos Szeredi
  Cc: Eric W. Biederman, Tycho Andersen, linux-fsdevel, linux-kernel,
	regressions



On 8/14/23 14:38, Jürg Billeter wrote:
> On Mon, 2023-08-14 at 14:28 +0200, Miklos Szeredi wrote:
>> On Mon, 14 Aug 2023 at 14:07, Bernd Schubert
>>> fuse: Avoid flush for O_RDONLY
>>>
>>> From: Bernd Schubert <bschubert@ddn.com>
>>>
>>> A file opened in read-only moded does not have data to be
>>> flushed, so no need to send flush at all.
>>>
>>> This also mitigates -EBUSY for executables, which is due to
>>> async flush with commit 5a8bee63b1.
>>
>> Does it?  If I read the bug report correctly, it's the write case that
>> causes EBUSY.
> 
> Indeed, I see this when trying to execute a file after a process wrote
> to (created) that file. As far as I can tell, `ETXTBSY` can't happen on
> exec without the file being opened for writing and thus, I don't see
> this patch mitigating this bug.

Sorry, my fault, I should have read your message/report more carefully.


Bernd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-14 11:02 ` Miklos Szeredi
  2023-08-14 12:07   ` Bernd Schubert
@ 2023-08-14 14:00   ` Tycho Andersen
  2023-08-14 14:35     ` Miklos Szeredi
  2023-08-14 21:34     ` Bernd Schubert
  1 sibling, 2 replies; 14+ messages in thread
From: Tycho Andersen @ 2023-08-14 14:00 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jürg Billeter, Eric W. Biederman, linux-fsdevel,
	linux-kernel, regressions

On Mon, Aug 14, 2023 at 01:02:35PM +0200, Miklos Szeredi wrote:
> On Mon, 14 Aug 2023 at 08:03, Jürg Billeter <j@bitron.ch> wrote:
> >
> > Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
> > someone wants the return code") `fput()` is called asynchronously if a
> > file is closed as part of a process exiting, i.e., if there was no
> > explicit `close()` before exit.
> >
> > If the file was open for writing, also `put_write_access()` is called
> > asynchronously as part of the async `fput()`.
> >
> > If that newly written file is an executable, attempting to `execve()`
> > the new file can fail with `ETXTBSY` if it's called after the writer
> > process exited but before the async `fput()` has run.
> 
> Thanks for the report.
> 
> At this point, I think it would be best to revert the original patch,
> since only v6.4 has it.

I agree.

> The original fix was already a workaround, and I don't see a clear
> path forward in this direction.  We need to see if there's better
> direction.
> 
> Ideas?

It seems like we really do need to wait here. I guess that means we
need some kind of exit-proof wait?

Tycho

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-14 14:00   ` Tycho Andersen
@ 2023-08-14 14:35     ` Miklos Szeredi
  2023-08-14 22:36       ` Tycho Andersen
  2023-08-14 21:34     ` Bernd Schubert
  1 sibling, 1 reply; 14+ messages in thread
From: Miklos Szeredi @ 2023-08-14 14:35 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Jürg Billeter, Eric W. Biederman, linux-fsdevel,
	linux-kernel, regressions

On Mon, 14 Aug 2023 at 16:00, Tycho Andersen <tycho@tycho.pizza> wrote:

> It seems like we really do need to wait here. I guess that means we
> need some kind of exit-proof wait?

Could you please recap the original problem?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-14 14:00   ` Tycho Andersen
  2023-08-14 14:35     ` Miklos Szeredi
@ 2023-08-14 21:34     ` Bernd Schubert
  1 sibling, 0 replies; 14+ messages in thread
From: Bernd Schubert @ 2023-08-14 21:34 UTC (permalink / raw)
  To: Tycho Andersen, Miklos Szeredi
  Cc: Jürg Billeter, Eric W. Biederman, linux-fsdevel,
	linux-kernel, regressions



On 8/14/23 16:00, Tycho Andersen wrote:
> On Mon, Aug 14, 2023 at 01:02:35PM +0200, Miklos Szeredi wrote:
>> On Mon, 14 Aug 2023 at 08:03, Jürg Billeter <j@bitron.ch> wrote:
>>>
>>> Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
>>> someone wants the return code") `fput()` is called asynchronously if a
>>> file is closed as part of a process exiting, i.e., if there was no
>>> explicit `close()` before exit.
>>>
>>> If the file was open for writing, also `put_write_access()` is called
>>> asynchronously as part of the async `fput()`.
>>>
>>> If that newly written file is an executable, attempting to `execve()`
>>> the new file can fail with `ETXTBSY` if it's called after the writer
>>> process exited but before the async `fput()` has run.
>>
>> Thanks for the report.
>>
>> At this point, I think it would be best to revert the original patch,
>> since only v6.4 has it.
> 
> I agree.
> 
>> The original fix was already a workaround, and I don't see a clear
>> path forward in this direction.  We need to see if there's better
>> direction.
>>
>> Ideas?
> 
> It seems like we really do need to wait here. I guess that means we
> need some kind of exit-proof wait?


I'm not sure how hackish it is, if fuse_flush gets converted to 
queue_work() and with a new work-queue in struct fuse_inode. That 
work_queue could be flushed through a new inode operation from 
do_open_execat.


Bernd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-14 14:35     ` Miklos Szeredi
@ 2023-08-14 22:36       ` Tycho Andersen
  2023-08-21 14:24         ` Miklos Szeredi
  0 siblings, 1 reply; 14+ messages in thread
From: Tycho Andersen @ 2023-08-14 22:36 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jürg Billeter, Eric W. Biederman, linux-fsdevel,
	linux-kernel, regressions

On Mon, Aug 14, 2023 at 04:35:56PM +0200, Miklos Szeredi wrote:
> On Mon, 14 Aug 2023 at 16:00, Tycho Andersen <tycho@tycho.pizza> wrote:
> 
> > It seems like we really do need to wait here. I guess that means we
> > need some kind of exit-proof wait?
> 
> Could you please recap the original problem?

Sure, the symptom is a deadlock, something like:

# cat /proc/1528591/stack
[<0>] do_wait+0x156/0x2f0
[<0>] kernel_wait4+0x8d/0x140
[<0>] zap_pid_ns_processes+0x104/0x180
[<0>] do_exit+0xa41/0xb80
[<0>] do_group_exit+0x3a/0xa0
[<0>] __x64_sys_exit_group+0x14/0x20
[<0>] do_syscall_64+0x37/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

which is stuck waiting for:

# cat /proc/1544574/stack
[<0>] request_wait_answer+0x12f/0x210
[<0>] fuse_simple_request+0x109/0x2c0
[<0>] fuse_flush+0x16f/0x1b0
[<0>] filp_close+0x27/0x70
[<0>] put_files_struct+0x6b/0xc0
[<0>] do_exit+0x360/0xb80
[<0>] do_group_exit+0x3a/0xa0
[<0>] get_signal+0x140/0x870
[<0>] arch_do_signal_or_restart+0xae/0x7c0
[<0>] exit_to_user_mode_prepare+0x10f/0x1c0
[<0>] syscall_exit_to_user_mode+0x26/0x40
[<0>] do_syscall_64+0x46/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

I have a reproducer here:
https://github.com/tych0/kernel-utils/blob/master/fuse2/Makefile#L7

The problem is that the second thread has called do_exit() ->
exit_signals(), but then tries to request_wait_answer() which uses the
core wait primitives that no longer get woken up from signals due to
the code in exit_signals(). So when we try to exit the pid ns, the
whole cleanup hangs.

It seems we really do need to wait in do_exit(), otherwise we get
the behavior described in this regression...

Tycho

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-14 22:36       ` Tycho Andersen
@ 2023-08-21 14:24         ` Miklos Szeredi
  2023-08-21 15:02           ` Tycho Andersen
  0 siblings, 1 reply; 14+ messages in thread
From: Miklos Szeredi @ 2023-08-21 14:24 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Jürg Billeter, Eric W. Biederman, linux-fsdevel,
	linux-kernel, regressions

On Tue, 15 Aug 2023 at 00:36, Tycho Andersen <tycho@tycho.pizza> wrote:
>
> On Mon, Aug 14, 2023 at 04:35:56PM +0200, Miklos Szeredi wrote:
> > On Mon, 14 Aug 2023 at 16:00, Tycho Andersen <tycho@tycho.pizza> wrote:
> >
> > > It seems like we really do need to wait here. I guess that means we
> > > need some kind of exit-proof wait?
> >
> > Could you please recap the original problem?
>
> Sure, the symptom is a deadlock, something like:
>
> # cat /proc/1528591/stack
> [<0>] do_wait+0x156/0x2f0
> [<0>] kernel_wait4+0x8d/0x140
> [<0>] zap_pid_ns_processes+0x104/0x180
> [<0>] do_exit+0xa41/0xb80
> [<0>] do_group_exit+0x3a/0xa0
> [<0>] __x64_sys_exit_group+0x14/0x20
> [<0>] do_syscall_64+0x37/0xb0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> which is stuck waiting for:
>
> # cat /proc/1544574/stack
> [<0>] request_wait_answer+0x12f/0x210
> [<0>] fuse_simple_request+0x109/0x2c0
> [<0>] fuse_flush+0x16f/0x1b0
> [<0>] filp_close+0x27/0x70
> [<0>] put_files_struct+0x6b/0xc0
> [<0>] do_exit+0x360/0xb80
> [<0>] do_group_exit+0x3a/0xa0
> [<0>] get_signal+0x140/0x870
> [<0>] arch_do_signal_or_restart+0xae/0x7c0
> [<0>] exit_to_user_mode_prepare+0x10f/0x1c0
> [<0>] syscall_exit_to_user_mode+0x26/0x40
> [<0>] do_syscall_64+0x46/0xb0
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> I have a reproducer here:
> https://github.com/tych0/kernel-utils/blob/master/fuse2/Makefile#L7

The issue seems to be that the server process is recursing into the
filesystem it is serving (nested_fsync()).  It's quite easy to
deadlock fuse this way, and I'm not sure why this would be needed for
any server implementation.   Can you explain?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-21 14:24         ` Miklos Szeredi
@ 2023-08-21 15:02           ` Tycho Andersen
  2023-08-21 15:31             ` Miklos Szeredi
  0 siblings, 1 reply; 14+ messages in thread
From: Tycho Andersen @ 2023-08-21 15:02 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jürg Billeter, Eric W. Biederman, linux-fsdevel,
	linux-kernel, regressions

On Mon, Aug 21, 2023 at 04:24:00PM +0200, Miklos Szeredi wrote:
> On Tue, 15 Aug 2023 at 00:36, Tycho Andersen <tycho@tycho.pizza> wrote:
> >
> > On Mon, Aug 14, 2023 at 04:35:56PM +0200, Miklos Szeredi wrote:
> > > On Mon, 14 Aug 2023 at 16:00, Tycho Andersen <tycho@tycho.pizza> wrote:
> > >
> > > > It seems like we really do need to wait here. I guess that means we
> > > > need some kind of exit-proof wait?
> > >
> > > Could you please recap the original problem?
> >
> > Sure, the symptom is a deadlock, something like:
> >
> > # cat /proc/1528591/stack
> > [<0>] do_wait+0x156/0x2f0
> > [<0>] kernel_wait4+0x8d/0x140
> > [<0>] zap_pid_ns_processes+0x104/0x180
> > [<0>] do_exit+0xa41/0xb80
> > [<0>] do_group_exit+0x3a/0xa0
> > [<0>] __x64_sys_exit_group+0x14/0x20
> > [<0>] do_syscall_64+0x37/0xb0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > which is stuck waiting for:
> >
> > # cat /proc/1544574/stack
> > [<0>] request_wait_answer+0x12f/0x210
> > [<0>] fuse_simple_request+0x109/0x2c0
> > [<0>] fuse_flush+0x16f/0x1b0
> > [<0>] filp_close+0x27/0x70
> > [<0>] put_files_struct+0x6b/0xc0
> > [<0>] do_exit+0x360/0xb80
> > [<0>] do_group_exit+0x3a/0xa0
> > [<0>] get_signal+0x140/0x870
> > [<0>] arch_do_signal_or_restart+0xae/0x7c0
> > [<0>] exit_to_user_mode_prepare+0x10f/0x1c0
> > [<0>] syscall_exit_to_user_mode+0x26/0x40
> > [<0>] do_syscall_64+0x46/0xb0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > I have a reproducer here:
> > https://github.com/tych0/kernel-utils/blob/master/fuse2/Makefile#L7
> 
> The issue seems to be that the server process is recursing into the
> filesystem it is serving (nested_fsync()).  It's quite easy to
> deadlock fuse this way, and I'm not sure why this would be needed for
> any server implementation.   Can you explain?

I think the idea is that they're saving snapshots of their own threads
to the fs for debugging purposes.

Whether this is a sane thing to do or not, it doesn't seem like it
should deadlock pid ns destruction.

Tycho

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-21 15:02           ` Tycho Andersen
@ 2023-08-21 15:31             ` Miklos Szeredi
  2023-08-29 17:42               ` Tycho Andersen
  0 siblings, 1 reply; 14+ messages in thread
From: Miklos Szeredi @ 2023-08-21 15:31 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Jürg Billeter, Eric W. Biederman, linux-fsdevel,
	linux-kernel, regressions

On Mon, 21 Aug 2023 at 17:02, Tycho Andersen <tycho@tycho.pizza> wrote:
>
> On Mon, Aug 21, 2023 at 04:24:00PM +0200, Miklos Szeredi wrote:
> > On Tue, 15 Aug 2023 at 00:36, Tycho Andersen <tycho@tycho.pizza> wrote:
> > >
> > > On Mon, Aug 14, 2023 at 04:35:56PM +0200, Miklos Szeredi wrote:
> > > > On Mon, 14 Aug 2023 at 16:00, Tycho Andersen <tycho@tycho.pizza> wrote:
> > > >
> > > > > It seems like we really do need to wait here. I guess that means we
> > > > > need some kind of exit-proof wait?
> > > >
> > > > Could you please recap the original problem?
> > >
> > > Sure, the symptom is a deadlock, something like:
> > >
> > > # cat /proc/1528591/stack
> > > [<0>] do_wait+0x156/0x2f0
> > > [<0>] kernel_wait4+0x8d/0x140
> > > [<0>] zap_pid_ns_processes+0x104/0x180
> > > [<0>] do_exit+0xa41/0xb80
> > > [<0>] do_group_exit+0x3a/0xa0
> > > [<0>] __x64_sys_exit_group+0x14/0x20
> > > [<0>] do_syscall_64+0x37/0xb0
> > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > which is stuck waiting for:
> > >
> > > # cat /proc/1544574/stack
> > > [<0>] request_wait_answer+0x12f/0x210
> > > [<0>] fuse_simple_request+0x109/0x2c0
> > > [<0>] fuse_flush+0x16f/0x1b0
> > > [<0>] filp_close+0x27/0x70
> > > [<0>] put_files_struct+0x6b/0xc0
> > > [<0>] do_exit+0x360/0xb80
> > > [<0>] do_group_exit+0x3a/0xa0
> > > [<0>] get_signal+0x140/0x870
> > > [<0>] arch_do_signal_or_restart+0xae/0x7c0
> > > [<0>] exit_to_user_mode_prepare+0x10f/0x1c0
> > > [<0>] syscall_exit_to_user_mode+0x26/0x40
> > > [<0>] do_syscall_64+0x46/0xb0
> > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > I have a reproducer here:
> > > https://github.com/tych0/kernel-utils/blob/master/fuse2/Makefile#L7
> >
> > The issue seems to be that the server process is recursing into the
> > filesystem it is serving (nested_fsync()).  It's quite easy to
> > deadlock fuse this way, and I'm not sure why this would be needed for
> > any server implementation.   Can you explain?
>
> I think the idea is that they're saving snapshots of their own threads
> to the fs for debugging purposes.

This seems a fairly special situation.   Have they (whoever they may
be) thought about fixing this in their server?

> Whether this is a sane thing to do or not, it doesn't seem like it
> should deadlock pid ns destruction.

True.   So the suggested solution is to allow wait_event_killable() to
return if a terminal signal is pending in the exiting state and only
in that case turn the flush into a background request?  That would
still allow for regressions like the one reported, but that would be
much less likely to happen in real life.  Okay, I said this for the
original solution as well, so this may turn out to be wrong as well.

Anyway, I'd prefer if this was fixed in the server code, as it looks
fairly special and adding complexity to the kernel for this case might
not be justifiable.   But I'm also open to suggestions on fixing this
in the kernel in a not too complex manner.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush
  2023-08-21 15:31             ` Miklos Szeredi
@ 2023-08-29 17:42               ` Tycho Andersen
  0 siblings, 0 replies; 14+ messages in thread
From: Tycho Andersen @ 2023-08-29 17:42 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jürg Billeter, Eric W. Biederman, linux-fsdevel,
	linux-kernel, regressions

On Mon, Aug 21, 2023 at 05:31:48PM +0200, Miklos Szeredi wrote:

(Apologies for the delay, I have been away without cell signal for
some time.)

> > I think the idea is that they're saving snapshots of their own threads
> > to the fs for debugging purposes.
> 
> This seems a fairly special situation.   Have they (whoever they may
> be) thought about fixing this in their server?

Sorry, "we" here is some internal team that works for my employer
Netflix. We can't use imap clients on our corporate e-mails, whee.

> > Whether this is a sane thing to do or not, it doesn't seem like it
> > should deadlock pid ns destruction.
> 
> True.   So the suggested solution is to allow wait_event_killable() to
> return if a terminal signal is pending in the exiting state and only
> in that case turn the flush into a background request?  That would
> still allow for regressions like the one reported, but that would be
> much less likely to happen in real life.  Okay, I said this for the
> original solution as well, so this may turn out to be wrong as well.

I wonder if there's room here for a completion that doesn't use the
wait primitives. Something like an atomic + queuing in task_work()
would both fix this bug and not exhibit this regression, IIUC.

> Anyway, I'd prefer if this was fixed in the server code, as it looks
> fairly special and adding complexity to the kernel for this case might
> not be justifiable.   But I'm also open to suggestions on fixing this
> in the kernel in a not too complex manner.

I don't think this is specific to the server-accessing-its-own-file
case. My reproducer uses that because I didn't quite understand the
bug fully at the time. I believe that *any* task that is killed with
an inflight fuse request will exhibit this. We have seen this fairly
rarely on another fuse fs we use throughout the fleet:
https://github.com/lxc/lxcfs and it doesn't really do anything
strange, and is mounted from the host's pid ns.

Tycho

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-08-29 17:42 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-14  6:03 [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush Jürg Billeter
2023-08-14 11:02 ` Miklos Szeredi
2023-08-14 12:07   ` Bernd Schubert
2023-08-14 12:28     ` Miklos Szeredi
2023-08-14 12:38       ` Jürg Billeter
2023-08-14 13:44         ` Bernd Schubert
2023-08-14 14:00   ` Tycho Andersen
2023-08-14 14:35     ` Miklos Szeredi
2023-08-14 22:36       ` Tycho Andersen
2023-08-21 14:24         ` Miklos Szeredi
2023-08-21 15:02           ` Tycho Andersen
2023-08-21 15:31             ` Miklos Szeredi
2023-08-29 17:42               ` Tycho Andersen
2023-08-14 21:34     ` Bernd Schubert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).