[RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine
@ 2016-08-03 16:54 Stanislav Kinsburskiy
  2016-08-03 17:36 ` Jeff Layton
  2016-08-03 17:36 ` Cyrill Gorcunov
  0 siblings, 2 replies; 7+ messages in thread
From: Stanislav Kinsburskiy @ 2016-08-03 16:54 UTC (permalink / raw)
  To: bfields, jlayton, trond.myklebust, anna.schumaker
  Cc: linux-nfs, netdev, linux-kernel, gorcunov, davem, devel

Otherwise freezer cgroup state might never become "FROZEN".

Here is a deadlock scheme for 2 processes in one freezer cgroup, which is
freezing:

CPU 0                                   CPU 1
--------                                --------
do_last
inode_lock(dir->d_inode)
vfs_create
nfs_create
...
__rpc_execute
rpc_wait_bit_killable
__refrigerator
                                        do_last
                                        inode_lock(dir->d_inode)

So, the problem is that one process takes directory inode mutex, executes
creation request and goes to refrigerator.
Another one waits till directory lock is released, remains "thawed" and thus
freezer cgroup state never becomes "FROZEN".

Notes:
1) Interesting, that this is not a pure deadlock: one can thaw cgroup and then
freeze it again.
2) The issue was introduced by commit d310310cbff18ec385c6ab4d58f33b100192a96a.
3) This patch is not aimed to fix the issue, but to show the problem root.
Look like this problem moght be applicable to other hunks from the commit,
mentioned above.

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
---
 net/sunrpc/sched.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 9ae5885..ec7ccc1 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -253,7 +253,6 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);

 static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine
  2016-08-03 16:54 [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine Stanislav Kinsburskiy
@ 2016-08-03 17:36 ` Jeff Layton
  2016-08-03 18:14   ` Chuck Lever
  2016-08-04 10:55   ` Stanislav Kinsburskiy
  2016-08-03 17:36 ` Cyrill Gorcunov
  1 sibling, 2 replies; 7+ messages in thread
From: Jeff Layton @ 2016-08-03 17:36 UTC (permalink / raw)
  To: Stanislav Kinsburskiy, bfields, trond.myklebust, anna.schumaker
  Cc: linux-nfs, netdev, linux-kernel, gorcunov, davem, devel

On Wed, 2016-08-03 at 20:54 +0400, Stanislav Kinsburskiy wrote:
> Otherwise freezer cgroup state might never become "FROZEN".
> 
> Here is a deadlock scheme for 2 processes in one freezer cgroup,
> which is
> freezing:
> 
> CPU 0                                   CPU 1
> --------                                --------
> do_last
> inode_lock(dir->d_inode)
> vfs_create
> nfs_create
> ...
> __rpc_execute
> rpc_wait_bit_killable
> __refrigerator
>                                         do_last
>                                         inode_lock(dir->d_inode)
> 
> So, the problem is that one process takes directory inode mutex,
> executes
> creation request and goes to refrigerator.
> Another one waits till directory lock is released, remains "thawed"
> and thus
> freezer cgroup state never becomes "FROZEN".
> 
> Notes:
> 1) Interesting, that this is not a pure deadlock: one can thaw cgroup
> and then
> freeze it again.
> 2) The issue was introduced by commit
> d310310cbff18ec385c6ab4d58f33b100192a96a.
> 3) This patch is not aimed to fix the issue, but to show the problem
> root.
> Look like this problem moght be applicable to other hunks from the
> commit,
> mentioned above.
> 
> 
> Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
> ---
>  net/sunrpc/sched.c |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 9ae5885..ec7ccc1 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -253,7 +253,6 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
>  
>  static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
>  {
> -	freezable_schedule_unsafe();
>  	if (signal_pending_state(mode, current))
>  		return -ERESTARTSYS;
>  	return 0;
> 

Ummm...so what actually does the schedule() with this patch?

There was a bit of discussion on this recently -- see the thread with
this subject line in linux-nfs:

    Re: Hang due to nfs letting tasks freeze with locked inodes

Basically it comes down to this:

All of the proposals so far to fix this problem just switch out the
freezable_schedule_unsafe (and similar) calls for those that don't
allow the process to freeze.

The problem there is that we originally added that stuff in response to
bug reports about machines failing to suspend. What often happens is
that the network interfaces come down, and then the freezer runs over
all of the processes, which never return because they're blocked
waiting on the server to reply.

...shrug...

Maybe we should just go ahead and do it (and to CIFS as well). Just be
prepared for the inevitable complaints about laptops failing to suspend
once you do.

Part of the fix, I think is to add a return code (similar to
ERESTARTSYS) that gets interpreted near the kernel-userland boundary
as: "allow the process to be frozen, and then retry the call once it's
resumed".

With that, filesystems could return the error code when they want to
redrive the entire syscall from that level. That won't work for non-
idempotent requests though. We'd need to do something more elaborate
there.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine
  2016-08-03 17:36 ` Jeff Layton
@ 2016-08-03 18:14   ` Chuck Lever
  2016-08-04 10:55   ` Stanislav Kinsburskiy
  1 sibling, 0 replies; 7+ messages in thread
From: Chuck Lever @ 2016-08-03 18:14 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Stanislav Kinsburskiy, J. Bruce Fields, Trond Myklebust,
	Anna Schumaker, Linux NFS Mailing List, netdev, linux-kernel,
	gorcunov, davem, devel


> On Aug 3, 2016, at 1:36 PM, Jeff Layton <jlayton@redhat.com> wrote:
> 
> On Wed, 2016-08-03 at 20:54 +0400, Stanislav Kinsburskiy wrote:
>> Otherwise freezer cgroup state might never become "FROZEN".
>> 
>> Here is a deadlock scheme for 2 processes in one freezer cgroup,
>> which is
>> freezing:
>> 
>> CPU 0                                   CPU 1
>> --------                                --------
>> do_last
>> inode_lock(dir->d_inode)
>> vfs_create
>> nfs_create
>> ...
>> __rpc_execute
>> rpc_wait_bit_killable
>> __refrigerator
>>                                         do_last
>>                                         inode_lock(dir->d_inode)
>> 
>> So, the problem is that one process takes directory inode mutex,
>> executes
>> creation request and goes to refrigerator.
>> Another one waits till directory lock is released, remains "thawed"
>> and thus
>> freezer cgroup state never becomes "FROZEN".
>> 
>> Notes:
>> 1) Interesting, that this is not a pure deadlock: one can thaw cgroup
>> and then
>> freeze it again.
>> 2) The issue was introduced by commit
>> d310310cbff18ec385c6ab4d58f33b100192a96a.
>> 3) This patch is not aimed to fix the issue, but to show the problem
>> root.
>> Look like this problem moght be applicable to other hunks from the
>> commit,
>> mentioned above.
>> 
>> 
>> Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
>> ---
>>  net/sunrpc/sched.c |    1 -
>>  1 file changed, 1 deletion(-)
>> 
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index 9ae5885..ec7ccc1 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -253,7 +253,6 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
>>  
>>  static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
>>  {
>> -	freezable_schedule_unsafe();
>>  	if (signal_pending_state(mode, current))
>>  		return -ERESTARTSYS;
>>  	return 0;
>> 
> 
> Ummm...so what actually does the schedule() with this patch?
> 
> There was a bit of discussion on this recently -- see the thread with
> this subject line in linux-nfs:
> 
>     Re: Hang due to nfs letting tasks freeze with locked inodes
> 
> Basically it comes down to this:
> 
> All of the proposals so far to fix this problem just switch out the
> freezable_schedule_unsafe (and similar) calls for those that don't
> allow the process to freeze.
> 
> The problem there is that we originally added that stuff in response to
> bug reports about machines failing to suspend. What often happens is
> that the network interfaces come down, and then the freezer runs over
> all of the processes, which never return because they're blocked
> waiting on the server to reply.
> 
> ...shrug...
> 
> Maybe we should just go ahead and do it (and to CIFS as well). Just be
> prepared for the inevitable complaints about laptops failing to suspend
> once you do.
> 
> Part of the fix, I think is to add a return code (similar to
> ERESTARTSYS) that gets interpreted near the kernel-userland boundary
> as: "allow the process to be frozen, and then retry the call once it's
> resumed".
> 
> With that, filesystems could return the error code when they want to
> redrive the entire syscall from that level. That won't work for non-
> idempotent requests though. We'd need to do something more elaborate
> there.

There is a similar problem with NFS/RDMA.

An IB device driver can be unloaded at any time. The driver performs
an upcall to all consumers to request that they release all RDMA
resources associated with the device.

For RPC-over-RDMA, we could kill all running RPCs at this point. Or,
the RPCs could be suspended in place. The latter is desirable if the
device were re-inserted, or there were an alternate path to the NFS
server: then there would be no workload interruption.

Signals would have to be allowed so that ^C and soft timeouts still
work as expected.


--
Chuck Lever

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine
  2016-08-03 17:36 ` Jeff Layton
  2016-08-03 18:14   ` Chuck Lever
@ 2016-08-04 10:55   ` Stanislav Kinsburskiy
       [not found]     ` <cb097e9e-760b-5512-ff4a-276b31bfc83f-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
  1 sibling, 1 reply; 7+ messages in thread
From: Stanislav Kinsburskiy @ 2016-08-04 10:55 UTC (permalink / raw)
  To: Jeff Layton, bfields, trond.myklebust, anna.schumaker
  Cc: linux-nfs, netdev, linux-kernel, gorcunov, davem, devel



03.08.2016 19:36, Jeff Layton пишет:
> On Wed, 2016-08-03 at 20:54 +0400, Stanislav Kinsburskiy wrote:
>> Otherwise freezer cgroup state might never become "FROZEN".
>>
>> Here is a deadlock scheme for 2 processes in one freezer cgroup,
>> which is
>> freezing:
>>
>> CPU 0                                   CPU 1
>> --------                                --------
>> do_last
>> inode_lock(dir->d_inode)
>> vfs_create
>> nfs_create
>> ...
>> __rpc_execute
>> rpc_wait_bit_killable
>> __refrigerator
>>                                          do_last
>>                                          inode_lock(dir->d_inode)
>>
>> So, the problem is that one process takes directory inode mutex,
>> executes
>> creation request and goes to refrigerator.
>> Another one waits till directory lock is released, remains "thawed"
>> and thus
>> freezer cgroup state never becomes "FROZEN".
>>
>> Notes:
>> 1) Interesting, that this is not a pure deadlock: one can thaw cgroup
>> and then
>> freeze it again.
>> 2) The issue was introduced by commit
>> d310310cbff18ec385c6ab4d58f33b100192a96a.
>> 3) This patch is not aimed to fix the issue, but to show the problem
>> root.
>> Look like this problem moght be applicable to other hunks from the
>> commit,
>> mentioned above.
>>
>>
>> Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
>> ---
>>   net/sunrpc/sched.c |    1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index 9ae5885..ec7ccc1 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -253,7 +253,6 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
>>   
>>   static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
>>   {
>> -	freezable_schedule_unsafe();
>>   	if (signal_pending_state(mode, current))
>>   		return -ERESTARTSYS;
>>   	return 0;
>>
> Ummm...so what actually does the schedule() with this patch?

Schedule() replaces freezable_schedule_unsafe() of course, sorry for this.

> There was a bit of discussion on this recently -- see the thread with
> this subject line in linux-nfs:
>
>      Re: Hang due to nfs letting tasks freeze with locked inodes

Thanks, had a look.

> Basically it comes down to this:
>
> All of the proposals so far to fix this problem just switch out the
> freezable_schedule_unsafe (and similar) calls for those that don't
> allow the process to freeze.
>
> The problem there is that we originally added that stuff in response to
> bug reports about machines failing to suspend. What often happens is
> that the network interfaces come down, and then the freezer runs over
> all of the processes, which never return because they're blocked
> waiting on the server to reply.

I probably don't understand something, but this sounds somewhat wrong to 
me: freezing processes _after_ network is down.


>
> ...shrug...
>
> Maybe we should just go ahead and do it (and to CIFS as well). Just be
> prepared for the inevitable complaints about laptops failing to suspend
> once you do.

The worst part in all of this, from my POW, is that current behavior
makes NFS non-freezable in a generic case, even in case of freezing a
container, which has it's own net ns and NFS mount.
So, I would say, that returning of previous logic would make the
world better.

> Part of the fix, I think is to add a return code (similar to
> ERESTARTSYS) that gets interpreted near the kernel-userland boundary
> as: "allow the process to be frozen, and then retry the call once it's
> resumed".
>
> With that, filesystems could return the error code when they want to
> redrive the entire syscall from that level. That won't work for non-
> idempotent requests though. We'd need to do something more elaborate
> there.
>

Might be, that breaking rpc request is something that should be avoided 
at all.
With all these locks being held, almost all (any?) of the requests to 
remote server
should be considered as an atomic operation from freezer point of view.
The process always can be frozen on signal handling.

IOW, I might worth considering a scenario, when NFS is not freezable at all,
and any problems with suspend on laptops/whatever have to solved in 
suspend code.

^ permalink raw reply	[flat|nested] 7+ messages in thread

[parent not found: <cb097e9e-760b-5512-ff4a-276b31bfc83f-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>]

* Re: [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine
       [not found]     ` <cb097e9e-760b-5512-ff4a-276b31bfc83f-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
@ 2016-08-04 13:16       ` Jeff Layton
  2016-08-04 13:57         ` Stanislav Kinsburskiy
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2016-08-04 13:16 UTC (permalink / raw)
  To: skinsbursky-5HdwGun5lf+gSpxsJD1C4w,
	bfields-uC3wQj2KruNg9hUCZPvPmw,
	trond.myklebust-7I+n7zu2hftEKMMhf/gKZA,
	anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	gorcunov-5HdwGun5lf+gSpxsJD1C4w, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	devel-GEFAQzZX7r8dnm+yROfE0A

On Thu, 2016-08-04 at 12:55 +0200, Stanislav Kinsburskiy wrote:
> 
> 03.08.2016 19:36, Jeff Layton пишет:
> > 
> > On Wed, 2016-08-03 at 20:54 +0400, Stanislav Kinsburskiy wrote:
> > > 
> > > Otherwise freezer cgroup state might never become "FROZEN".
> > > 
> > > Here is a deadlock scheme for 2 processes in one freezer cgroup,
> > > which is
> > > freezing:
> > > 
> > > CPU 0                                   CPU 1
> > > --------                                --------
> > > do_last
> > > inode_lock(dir->d_inode)
> > > vfs_create
> > > nfs_create
> > > ...
> > > __rpc_execute
> > > rpc_wait_bit_killable
> > > __refrigerator
> > >                                          do_last
> > >                                          inode_lock(dir->d_inode)
> > > 
> > > So, the problem is that one process takes directory inode mutex,
> > > executes
> > > creation request and goes to refrigerator.
> > > Another one waits till directory lock is released, remains "thawed"
> > > and thus
> > > freezer cgroup state never becomes "FROZEN".
> > > 
> > > Notes:
> > > 1) Interesting, that this is not a pure deadlock: one can thaw cgroup
> > > and then
> > > freeze it again.
> > > 2) The issue was introduced by commit
> > > d310310cbff18ec385c6ab4d58f33b100192a96a.
> > > 3) This patch is not aimed to fix the issue, but to show the problem
> > > root.
> > > Look like this problem moght be applicable to other hunks from the
> > > commit,
> > > mentioned above.
> > > 
> > > 
> > > > > > Signed-off-by: Stanislav Kinsburskiy <skinsbursky-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
> > > ---
> > >   net/sunrpc/sched.c |    1 -
> > >   1 file changed, 1 deletion(-)
> > > 
> > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > index 9ae5885..ec7ccc1 100644
> > > --- a/net/sunrpc/sched.c
> > > +++ b/net/sunrpc/sched.c
> > > @@ -253,7 +253,6 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> > >   
> > >   static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
> > >   {
> > > > > > -	freezable_schedule_unsafe();
> > > > > >   	if (signal_pending_state(mode, current))
> > > > > >   		return -ERESTARTSYS;
> > > > > >   	return 0;
> > > 
> > Ummm...so what actually does the schedule() with this patch?
> 
> Schedule() replaces freezable_schedule_unsafe() of course, sorry for this.
> 
> > 
> > There was a bit of discussion on this recently -- see the thread with
> > this subject line in linux-nfs:
> > 
> >      Re: Hang due to nfs letting tasks freeze with locked inodes
> 
> Thanks, had a look.
> 
> > 
> > Basically it comes down to this:
> > 
> > All of the proposals so far to fix this problem just switch out the
> > freezable_schedule_unsafe (and similar) calls for those that don't
> > allow the process to freeze.
> > 
> > The problem there is that we originally added that stuff in response to
> > bug reports about machines failing to suspend. What often happens is
> > that the network interfaces come down, and then the freezer runs over
> > all of the processes, which never return because they're blocked
> > waiting on the server to reply.
> 
> I probably don't understand something, but this sounds somewhat wrong to 
> me: freezing processes _after_ network is down.
> 
> 
> > 
> > 
> > ...shrug...
> > 
> > Maybe we should just go ahead and do it (and to CIFS as well). Just be
> > prepared for the inevitable complaints about laptops failing to suspend
> > once you do.
> 
> The worst part in all of this, from my POW, is that current behavior
> makes NFS non-freezable in a generic case, even in case of freezing a
> container, which has it's own net ns and NFS mount.
> So, I would say, that returning of previous logic would make the
> world better.
> 
> > 
> > Part of the fix, I think is to add a return code (similar to
> > ERESTARTSYS) that gets interpreted near the kernel-userland boundary
> > as: "allow the process to be frozen, and then retry the call once it's
> > resumed".
> > 
> > With that, filesystems could return the error code when they want to
> > redrive the entire syscall from that level. That won't work for non-
> > idempotent requests though. We'd need to do something more elaborate
> > there.
> > 
> 
> Might be, that breaking rpc request is something that should be avoided 
> at all.
> With all these locks being held, almost all (any?) of the requests to 
> remote server
> should be considered as an atomic operation from freezer point of view.
> The process always can be frozen on signal handling.
> 
> IOW, I might worth considering a scenario, when NFS is not freezable at all,
> and any problems with suspend on laptops/whatever have to solved in 
> suspend code.
> 
> 

Fair enough. At this point, I don't care much one way or another. Maybe
if we make this change and laptops start failing to suspend, we'll be
able to use that as leverage pursue other avenues to make the
suspend/resume subsystem work with NFS.

That said, the patch you have really isn't sufficient. There are places
where the NFS client can sleep while waiting for things other than RPC
calls.

I think what you'd want is something like the patch below. This also
has a similar delta for cifs.ko, which will have similar problems.

You can pick these from here as well:

     https://git.samba.org/?p=jlayton/linux.git;a=shortlog;h=refs/heads/netfs-freeze

This is untested, of course, and we can probably eliminate some of the
_freezable_ macros with this change as well.

Trond, Anna...what are your thoughts here?

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index fcfd48d263f6..ac1cf968a07d 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -252,7 +252,7 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
 
 static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
+	schedule();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index b87efd0c92d6..7707df9d5ca7 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -22,7 +22,6 @@
 #include <linux/stat.h>
 #include <linux/slab.h>
 #include <linux/pagemap.h>
-#include <linux/freezer.h>
 #include <asm/div64.h>
 #include "cifsfs.h"
 #include "cifspdu.h"
@@ -1871,7 +1870,7 @@ cifs_invalidate_mapping(struct inode *inode)
 static int
 cifs_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
+	schedule();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 206a597b2293..5ec700e9d7ed 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -26,7 +26,6 @@
 #include <linux/wait.h>
 #include <linux/net.h>
 #include <linux/delay.h>
-#include <linux/freezer.h>
 #include <linux/tcp.h>
 #include <linux/highmem.h>
 #include <asm/uaccess.h>
@@ -438,7 +437,7 @@ wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
 {
 	int error;
 
-	error = wait_event_freezekillable_unsafe(server->response_q,
+	error = wait_event_killable(server->response_q,
 				    midQ->mid_state != MID_REQUEST_SUBMITTED);
 	if (error < 0)
 		return -ERESTARTSYS;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index bf4ec5ecc97e..b9f31854db7a 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -37,7 +37,6 @@
 #include <linux/nfs_xdr.h>
 #include <linux/slab.h>
 #include <linux/compat.h>
-#include <linux/freezer.h>
 
 #include <asm/uaccess.h>
 
@@ -73,7 +72,7 @@ nfs_fattr_to_ino_t(struct nfs_fattr *fattr)
 
 static int nfs_wait_killable(int mode)
 {
-	freezable_schedule_unsafe();
+	schedule();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index cb28cceefebe..f938d245f256 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -17,7 +17,6 @@
 #include <linux/nfs_page.h>
 #include <linux/lockd/bind.h>
 #include <linux/nfs_mount.h>
-#include <linux/freezer.h>
 #include <linux/xattr.h>
 
 #include "iostat.h"
@@ -35,7 +34,7 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
 		res = rpc_call_sync(clnt, msg, flags);
 		if (res != -EJUKEBOX)
 			break;
-		freezable_schedule_timeout_killable_unsafe(NFS_JUKEBOX_RETRY_TIME);
+		schedule_timeout_killable(NFS_JUKEBOX_RETRY_TIME);
 		res = -ERESTARTSYS;
 	} while (!fatal_signal_pending(current));
 	return res;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index da5c9e58e907..304b080c519b 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -54,7 +54,6 @@
 #include <linux/module.h>
 #include <linux/xattr.h>
 #include <linux/utsname.h>
-#include <linux/freezer.h>
 
 #include "nfs4_fs.h"
 #include "delegation.h"
@@ -348,8 +347,7 @@ static int nfs4_delay(struct rpc_clnt *clnt, long *timeout)
 
 	might_sleep();
 
-	freezable_schedule_timeout_killable_unsafe(
-		nfs4_update_delay(timeout));
+	schedule_timeout_killable(nfs4_update_delay(timeout));
 	if (fatal_signal_pending(current))
 		res = -ERESTARTSYS;
 	return res;
@@ -5484,7 +5482,7 @@ int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4
 static unsigned long
 nfs4_set_lock_task_retry(unsigned long timeout)
 {
-	freezable_schedule_timeout_killable_unsafe(timeout);
+	schedule_timeout_killable(timeout);
 	timeout <<= 1;
 	if (timeout > NFS4_LOCK_MAXTIMEOUT)
 		return NFS4_LOCK_MAXTIMEOUT;
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 9ae588511aaf..73946acc01d2 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -18,7 +18,6 @@
 #include <linux/smp.h>
 #include <linux/spinlock.h>
 #include <linux/mutex.h>
-#include <linux/freezer.h>
 
 #include <linux/sunrpc/clnt.h>
 
@@ -253,7 +252,7 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
 
 static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
+	schedule();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;


-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine
  2016-08-04 13:16       ` Jeff Layton
@ 2016-08-04 13:57         ` Stanislav Kinsburskiy
  0 siblings, 0 replies; 7+ messages in thread
From: Stanislav Kinsburskiy @ 2016-08-04 13:57 UTC (permalink / raw)
  To: Jeff Layton, bfields, trond.myklebust, anna.schumaker
  Cc: linux-nfs, netdev, linux-kernel, gorcunov, davem, devel



04.08.2016 15:16, Jeff Layton пишет:
> On Thu, 2016-08-04 at 12:55 +0200, Stanislav Kinsburskiy wrote:
>> 03.08.2016 19:36, Jeff Layton пишет:
>>> On Wed, 2016-08-03 at 20:54 +0400, Stanislav Kinsburskiy wrote:
>>>> Otherwise freezer cgroup state might never become "FROZEN".
>>>>
>>>> Here is a deadlock scheme for 2 processes in one freezer cgroup,
>>>> which is
>>>> freezing:
>>>>
>>>> CPU 0                                   CPU 1
>>>> --------                                --------
>>>> do_last
>>>> inode_lock(dir->d_inode)
>>>> vfs_create
>>>> nfs_create
>>>> ...
>>>> __rpc_execute
>>>> rpc_wait_bit_killable
>>>> __refrigerator
>>>>                                           do_last
>>>>                                           inode_lock(dir->d_inode)
>>>>
>>>> So, the problem is that one process takes directory inode mutex,
>>>> executes
>>>> creation request and goes to refrigerator.
>>>> Another one waits till directory lock is released, remains "thawed"
>>>> and thus
>>>> freezer cgroup state never becomes "FROZEN".
>>>>
>>>> Notes:
>>>> 1) Interesting, that this is not a pure deadlock: one can thaw cgroup
>>>> and then
>>>> freeze it again.
>>>> 2) The issue was introduced by commit
>>>> d310310cbff18ec385c6ab4d58f33b100192a96a.
>>>> 3) This patch is not aimed to fix the issue, but to show the problem
>>>> root.
>>>> Look like this problem moght be applicable to other hunks from the
>>>> commit,
>>>> mentioned above.
>>>>
>>>>
>>>>>>> Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
>>>> ---
>>>>    net/sunrpc/sched.c |    1 -
>>>>    1 file changed, 1 deletion(-)
>>>>
>>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>>> index 9ae5885..ec7ccc1 100644
>>>> --- a/net/sunrpc/sched.c
>>>> +++ b/net/sunrpc/sched.c
>>>> @@ -253,7 +253,6 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
>>>>    
>>>>    static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
>>>>    {
>>>>>>> -	freezable_schedule_unsafe();
>>>>>>>    	if (signal_pending_state(mode, current))
>>>>>>>    		return -ERESTARTSYS;
>>>>>>>    	return 0;
>>> Ummm...so what actually does the schedule() with this patch?
>> Schedule() replaces freezable_schedule_unsafe() of course, sorry for this.
>>
>>> There was a bit of discussion on this recently -- see the thread with
>>> this subject line in linux-nfs:
>>>
>>>       Re: Hang due to nfs letting tasks freeze with locked inodes
>> Thanks, had a look.
>>
>>> Basically it comes down to this:
>>>
>>> All of the proposals so far to fix this problem just switch out the
>>> freezable_schedule_unsafe (and similar) calls for those that don't
>>> allow the process to freeze.
>>>
>>> The problem there is that we originally added that stuff in response to
>>> bug reports about machines failing to suspend. What often happens is
>>> that the network interfaces come down, and then the freezer runs over
>>> all of the processes, which never return because they're blocked
>>> waiting on the server to reply.
>> I probably don't understand something, but this sounds somewhat wrong to
>> me: freezing processes _after_ network is down.
>>
>>
>>>
>>> ...shrug...
>>>
>>> Maybe we should just go ahead and do it (and to CIFS as well). Just be
>>> prepared for the inevitable complaints about laptops failing to suspend
>>> once you do.
>> The worst part in all of this, from my POW, is that current behavior
>> makes NFS non-freezable in a generic case, even in case of freezing a
>> container, which has it's own net ns and NFS mount.
>> So, I would say, that returning of previous logic would make the
>> world better.
>>
>>> Part of the fix, I think is to add a return code (similar to
>>> ERESTARTSYS) that gets interpreted near the kernel-userland boundary
>>> as: "allow the process to be frozen, and then retry the call once it's
>>> resumed".
>>>
>>> With that, filesystems could return the error code when they want to
>>> redrive the entire syscall from that level. That won't work for non-
>>> idempotent requests though. We'd need to do something more elaborate
>>> there.
>>>
>> Might be, that breaking rpc request is something that should be avoided
>> at all.
>> With all these locks being held, almost all (any?) of the requests to
>> remote server
>> should be considered as an atomic operation from freezer point of view.
>> The process always can be frozen on signal handling.
>>
>> IOW, I might worth considering a scenario, when NFS is not freezable at all,
>> and any problems with suspend on laptops/whatever have to solved in
>> suspend code.
>>
>>
> Fair enough. At this point, I don't care much one way or another. Maybe
> if we make this change and laptops start failing to suspend, we'll be
> able to use that as leverage pursue other avenues to make the
> suspend/resume subsystem work with NFS.
>
> That said, the patch you have really isn't sufficient. There are places
> where the NFS client can sleep while waiting for things other than RPC
> calls.

Sure. As I said, this patch wasn't aimed to fix the issue but rather 
start the discussion.
Thanks for your patch.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine
  2016-08-03 16:54 [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine Stanislav Kinsburskiy
  2016-08-03 17:36 ` Jeff Layton
@ 2016-08-03 17:36 ` Cyrill Gorcunov
  1 sibling, 0 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2016-08-03 17:36 UTC (permalink / raw)
  To: Stanislav Kinsburskiy
  Cc: bfields, jlayton, trond.myklebust, anna.schumaker, linux-nfs,
	netdev, linux-kernel, davem, devel

On Wed, Aug 03, 2016 at 08:54:50PM +0400, Stanislav Kinsburskiy wrote:
> Otherwise freezer cgroup state might never become "FROZEN".
> 
> Here is a deadlock scheme for 2 processes in one freezer cgroup, which is
> freezing:
> 
> CPU 0                                   CPU 1
> --------                                --------
> do_last
> inode_lock(dir->d_inode)
> vfs_create
> nfs_create
> ...
> __rpc_execute
> rpc_wait_bit_killable
> __refrigerator
>                                         do_last
>                                         inode_lock(dir->d_inode)
> 
> So, the problem is that one process takes directory inode mutex, executes
> creation request and goes to refrigerator.
> Another one waits till directory lock is released, remains "thawed" and thus
> freezer cgroup state never becomes "FROZEN".
> 
> Notes:
> 1) Interesting, that this is not a pure deadlock: one can thaw cgroup and then
> freeze it again.
> 2) The issue was introduced by commit d310310cbff18ec385c6ab4d58f33b100192a96a.
> 3) This patch is not aimed to fix the issue, but to show the problem root.
> Look like this problem moght be applicable to other hunks from the commit,
> mentioned above.
> 
> Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>

I think it's worth adding backtrace as well
---

=== pid: 708987 === (file_read)

[<ffffffff810de47b>] __refrigerator+0x5b/0x190
[<ffffffffa01e6d36>] rpc_wait_bit_killable+0x66/0x80 [sunrpc]
[<ffffffffa01e7bf4>] __rpc_execute+0x154/0x420 [sunrpc]
[<ffffffffa01e972e>] rpc_execute+0x5e/0xa0 [sunrpc]
[<ffffffffa01df220>] rpc_run_task+0x70/0x90 [sunrpc]
[<ffffffffa01df290>] rpc_call_sync+0x50/0xc0 [sunrpc]
[<ffffffffa035c5cb>] nfs3_rpc_wrapper.constprop.10+0x6b/0xb0 [nfsv3]
[<ffffffffa035c6cf>] nfs3_proc_setattr+0xbf/0x140 [nfsv3]
[<ffffffffa035db03>] nfs3_proc_create+0x1a3/0x220 [nfsv3]
[<ffffffffa036c023>] nfs_create+0x83/0x150 [nfs]
[<ffffffff8120639c>] vfs_create+0x8c/0x110
[<ffffffff8120963d>] do_last+0xc0d/0x11d0
[<ffffffff81209cc2>] path_openat+0xc2/0x460
[<ffffffff8120b45b>] do_filp_open+0x4b/0xb0
[<ffffffff811f8c63>] do_sys_open+0xf3/0x1f0
[<ffffffff811f8d7e>] SyS_open+0x1e/0x20
[<ffffffff81643449>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

=== pid: 708988 === (file_read)

[<ffffffff81208cb3>] do_last+0x283/0x11d0
[<ffffffff81209cc2>] path_openat+0xc2/0x460
[<ffffffff8120b45b>] do_filp_open+0x4b/0xb0
[<ffffffff811f8c63>] do_sys_open+0xf3/0x1f0
[<ffffffff811f8d7e>] SyS_open+0x1e/0x20
[<ffffffff81643449>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-08-04 13:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-03 16:54 [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine Stanislav Kinsburskiy
2016-08-03 17:36 ` Jeff Layton
2016-08-03 18:14   ` Chuck Lever
2016-08-04 10:55   ` Stanislav Kinsburskiy
     [not found]     ` <cb097e9e-760b-5512-ff4a-276b31bfc83f-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
2016-08-04 13:16       ` Jeff Layton
2016-08-04 13:57         ` Stanislav Kinsburskiy
2016-08-03 17:36 ` Cyrill Gorcunov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).