From: Vivek Goyal <vgoyal@redhat.com>
To: Tycho Andersen <tycho@tycho.pizza>
Cc: Eric Biederman <ebiederm@xmission.com>,
	Christian Brauner <brauner@kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>,
	fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: strange interaction between fuse + pidns
Date: Fri, 24 Jun 2022 13:36:59 -0400
Message-ID: <YrX2O4Yv8elsQkF9@redhat.com>
In-Reply-To: <YrT6Hdqp36HLK9PJ@netflix>

On Thu, Jun 23, 2022 at 05:41:17PM -0600, Tycho Andersen wrote:
> On Thu, Jun 23, 2022 at 05:55:20PM -0400, Vivek Goyal wrote:
> > So in this case a single process is the client as well as the server.
> > IOW, one thread is the fuse server servicing fuse requests and another
> > thread is the fuse client accessing the fuse filesystem?
> 
> Yes. Probably an abuse of the API and something people Should Not Do,
> but as you say the kernel still shouldn't lock up like this.
> 
> > > since the thread has a copy of
> > > the fd table with an fd pointing to the same fuse device, the reference
> > > count isn't decremented to zero in fuse_dev_release(), and the task hangs
> > > forever.
> > 
> > So why did the fuse server thread stop responding to fuse messages? Why
> > did it not complete the flush?
> 
> In this particular case I think it's because the application crashed
> for unrelated reasons and tried to exit the pidns, hitting this
> problem.
> 
> > BTW, the unkillable wait happens only if fc->no_interrupt = 1. And this
> > seems to be set only if some previous interrupt request returned -ENOSYS
> > from the server.
> > 
> > fuse_dev_do_write() {
> >                 else if (oh.error == -ENOSYS)
> >                         fc->no_interrupt = 1;
> > }
> > 
> > So a simple workaround might be for the server to implement support for
> > interrupting requests.
> 
> Yes, but that is the libfuse default IIUC.

Looking at the libfuse code: I understand the low-level API interface, and
for that it looks like the generic code itself takes care of this (without
needing support from the filesystem).

libfuse/lib/fuse_lowlevel.c

do_interrupt().
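
To make the -ENOSYS point quoted above concrete, here is a rough userspace
toy (not kernel code) of how I read that flag: it acts like a one-way latch.
struct model_conn and model_interrupt_reply() are made-up names for
illustration; only the -ENOSYS check mirrors the fuse_dev_do_write() snippet,
and the "never cleared" part is my assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one field of struct fuse_conn used here. */
struct model_conn {
	bool no_interrupt;
};

/*
 * Models the server's reply to an interrupt request: -ENOSYS sets
 * no_interrupt, and nothing in this model ever clears it again.
 */
static void model_interrupt_reply(struct model_conn *fc, int error)
{
	assert(fc != NULL);
	if (error == -ENOSYS)
		fc->no_interrupt = true;
}
```

So in this model a single -ENOSYS reply is enough to change the behavior of
every later wait on that connection.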

> 
> > Having said that, this does sound like a problem and probably should
> > be fixed at the kernel level.
> > 
> > > 
> > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > > index 0e537e580dc1..c604dfcaec26 100644
> > > --- a/fs/fuse/dev.c
> > > +++ b/fs/fuse/dev.c
> > > @@ -297,7 +297,6 @@ void fuse_request_end(struct fuse_req *req)
> > >  		spin_unlock(&fiq->lock);
> > >  	}
> > >  	WARN_ON(test_bit(FR_PENDING, &req->flags));
> > > -	WARN_ON(test_bit(FR_SENT, &req->flags));
> > >  	if (test_bit(FR_BACKGROUND, &req->flags)) {
> > >  		spin_lock(&fc->bg_lock);
> > >  		clear_bit(FR_BACKGROUND, &req->flags);
> > > @@ -381,30 +380,33 @@ static void request_wait_answer(struct fuse_req *req)
> > >  			queue_interrupt(req);
> > >  	}
> > >  
> > > -	if (!test_bit(FR_FORCE, &req->flags)) {
> > > -		/* Only fatal signals may interrupt this */
> > > -		err = wait_event_killable(req->waitq,
> > > -					test_bit(FR_FINISHED, &req->flags));
> > > -		if (!err)
> > > -			return;
> > > +	/* Only fatal signals may interrupt this */
> > > +	err = wait_event_killable(req->waitq,
> > > +				test_bit(FR_FINISHED, &req->flags));
> > 
> > Trying to do a fatal-signal killable wait sounds reasonable. But I am
> > not sure about the history.
> > 
> > - Why can't FORCE requests do a killable wait?
> > - Why does flush need to have the FORCE flag set?
> 
> args->force implies a few other things besides this killable wait in
> fuse_simple_request(), most notably:
> 
> req = fuse_request_alloc(fm, GFP_KERNEL | __GFP_NOFAIL);
> 
> and
> 
> __set_bit(FR_WAITING, &req->flags);

The FR_WAITING handling is common to both types of request. We also set it
in fuse_get_req(), which is called for non-force requests.

So there seem to be only two key differences:

- We allocate the request with __GFP_NOFAIL for force, so that the memory
  allocation cannot fail.

- The special-casing of the non-killable wait.
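
To spell out the second difference, here is a small userspace sketch of what
happens when a fatal signal arrives before the reply. It covers only the part
of request_wait_answer() shown in the quoted diff (it ignores anything before
that hunk), and struct model_req, enum kill_outcome and model_fatal_signal()
are hypothetical names, not kernel symbols. "patched" means the behavior of
the proposed diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the request flags that matter here. */
struct model_req {
	bool force;   /* FR_FORCE */
	bool pending; /* FR_PENDING: queued, not yet read by the server */
	bool sent;    /* FR_SENT: handed to userspace */
};

/* What the waiter does once a fatal signal arrives before the reply. */
enum kill_outcome {
	KILL_IGNORED_KEEP_WAITING, /* plain wait_event(): signal does not end the wait */
	KILL_DEQUEUED_EINTR,       /* removed from the queue, error = -EINTR */
	KILL_INTERRUPT_QUEUED,     /* patched code only: queue_interrupt() */
};

static enum kill_outcome model_fatal_signal(const struct model_req *req,
					    bool patched)
{
	/* Unpatched: force requests never enter the killable wait. */
	if (!patched && req->force)
		return KILL_IGNORED_KEEP_WAITING;

	/* Request is not yet in userspace, bail out. */
	if (req->pending)
		return KILL_DEQUEUED_EINTR;

	/* Unpatched: already in userspace, wait it out. */
	if (!patched)
		return KILL_IGNORED_KEEP_WAITING;

	/* Patched: mirrors WARN_ON(!test_bit(FR_SENT, ...)) in the diff. */
	assert(req->sent);
	return KILL_INTERRUPT_QUEUED;
}
```

With { .force = true, .sent = true }, which is how I would model a flush that
the server has already read, the unpatched path keeps waiting and the patched
path queues an interrupt instead.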

Miklos will probably have more thoughts on this.

Thanks
Vivek

> 
> seems like it probably can be invoked from some non-user/atomic
> context somehow?
> 
> > > +	if (!err)
> > > +		return;
> > >  
> > > -		spin_lock(&fiq->lock);
> > > -		/* Request is not yet in userspace, bail out */
> > > -		if (test_bit(FR_PENDING, &req->flags)) {
> > > -			list_del(&req->list);
> > > -			spin_unlock(&fiq->lock);
> > > -			__fuse_put_request(req);
> > > -			req->out.h.error = -EINTR;
> > > -			return;
> > > -		}
> > > +	spin_lock(&fiq->lock);
> > > +	/* Request is not yet in userspace, bail out */
> > > +	if (test_bit(FR_PENDING, &req->flags)) {
> > > +		list_del(&req->list);
> > >  		spin_unlock(&fiq->lock);
> > > +		__fuse_put_request(req);
> > > +		req->out.h.error = -EINTR;
> > > +		return;
> > >  	}
> > > +	spin_unlock(&fiq->lock);
> > >  
> > >  	/*
> > > -	 * Either request is already in userspace, or it was forced.
> > > -	 * Wait it out.
> > > +	 * Womp womp. We sent a request to userspace and now we're getting
> > > +	 * killed.
> > >  	 */
> > > -	wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> > > +	set_bit(FR_INTERRUPTED, &req->flags);
> > > +	/* matches barrier in fuse_dev_do_read() */
> > > +	smp_mb__after_atomic();
> > > +	/* request *must* be FR_SENT here, because we ignored FR_PENDING before */
> > > +	WARN_ON(!test_bit(FR_SENT, &req->flags));
> > > +	queue_interrupt(req);
> > >  }
> > >  
> > >  static void __fuse_request_send(struct fuse_req *req)
> > > 
> > > available as a full patch here:
> > > https://github.com/tych0/linux/commit/81b9ff4c8c1af24f6544945da808dbf69a1293f7
> > > 
> > > but now things are even weirder. Tasks are stuck at the killable wait, but with
> > > a SIGKILL pending for the thread group.
> > 
> > That's strange. No idea what's going on.
> 
> Thanks for taking a look. This is where it falls apart for me. In
> principle the patch seems simple, but this sleeping behavior is beyond
> my understanding.
> 
> Tycho
> 

