* Hang due to nfs letting tasks freeze with locked inodes

From: Seth Forshee @ 2016-07-06 17:46 UTC
To: Trond Myklebust, Anna Schumaker
Cc: linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen

We're seeing a hang when freezing a container with an nfs bind mount while
running iozone. Two iozone processes were hung with this stack trace.

 [<ffffffff81821b15>] schedule+0x35/0x80
 [<ffffffff81821dbe>] schedule_preempt_disabled+0xe/0x10
 [<ffffffff818239f9>] __mutex_lock_slowpath+0xb9/0x130
 [<ffffffff81823a8f>] mutex_lock+0x1f/0x30
 [<ffffffff8121d00b>] do_unlinkat+0x12b/0x2d0
 [<ffffffff8121dc16>] SyS_unlink+0x16/0x20
 [<ffffffff81825bf2>] entry_SYSCALL_64_fastpath+0x16/0x71

This seems to be due to another iozone thread frozen during unlink with
this stack trace:

 [<ffffffff810e9cfa>] __refrigerator+0x7a/0x140
 [<ffffffffc08e80b8>] nfs4_handle_exception+0x118/0x130 [nfsv4]
 [<ffffffffc08e9efd>] nfs4_proc_remove+0x7d/0xf0 [nfsv4]
 [<ffffffffc088a329>] nfs_unlink+0x149/0x350 [nfs]
 [<ffffffff81219bd1>] vfs_unlink+0xf1/0x1a0
 [<ffffffff8121d159>] do_unlinkat+0x279/0x2d0
 [<ffffffff8121dc16>] SyS_unlink+0x16/0x20
 [<ffffffff81825bf2>] entry_SYSCALL_64_fastpath+0x16/0x71

Since nfs is allowing the thread to be frozen with the inode locked it's
preventing other threads trying to lock the same inode from freezing. It
seems like a bad idea for nfs to be doing this.

Can nfs do something different here to prevent this? Maybe use a
non-freezable sleep and let the operation complete, or else abort the
operation and return ERESTARTSYS?

Thanks,
Seth
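A minimal sketch of the deadlock pattern being described -- illustrative
only, not actual NFS code. The mutex below stands in for the directory
i_mutex taken in do_unlinkat(), and the freezable helper is the same one
used on the nfs4_delay() retry path:

#include <linux/mutex.h>
#include <linux/freezer.h>
#include <linux/jiffies.h>
#include <linux/sched.h>

static DEFINE_MUTEX(dir_lock);	/* stand-in for the directory i_mutex */

/* Task A: enters a freezable sleep while still holding the lock. */
static void task_a(void)
{
	mutex_lock(&dir_lock);
	/*
	 * The freezer counts this task as frozen even though it still
	 * holds dir_lock -- this mirrors the __refrigerator() call in
	 * the second stack trace above.
	 */
	freezable_schedule_timeout_killable_unsafe(HZ);
	mutex_unlock(&dir_lock);
}

/*
 * Task B: blocks in TASK_UNINTERRUPTIBLE on the same lock, so it can
 * never reach try_to_freeze(); freezing the cgroup (or the system)
 * then waits on it indefinitely -- this mirrors the first stack trace.
 */
static void task_b(void)
{
	mutex_lock(&dir_lock);
	mutex_unlock(&dir_lock);
}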
* Re: Hang due to nfs letting tasks freeze with locked inodes

From: Jeff Layton @ 2016-07-06 22:07 UTC
To: Seth Forshee, Trond Myklebust, Anna Schumaker
Cc: linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen

On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote:
> We're seeing a hang when freezing a container with an nfs bind mount while
> running iozone. Two iozone processes were hung with this stack trace.
>
> [] schedule+0x35/0x80
> [] schedule_preempt_disabled+0xe/0x10
> [] __mutex_lock_slowpath+0xb9/0x130
> [] mutex_lock+0x1f/0x30
> [] do_unlinkat+0x12b/0x2d0
> [] SyS_unlink+0x16/0x20
> [] entry_SYSCALL_64_fastpath+0x16/0x71
>
> This seems to be due to another iozone thread frozen during unlink with
> this stack trace:
>
> [] __refrigerator+0x7a/0x140
> [] nfs4_handle_exception+0x118/0x130 [nfsv4]
> [] nfs4_proc_remove+0x7d/0xf0 [nfsv4]
> [] nfs_unlink+0x149/0x350 [nfs]
> [] vfs_unlink+0xf1/0x1a0
> [] do_unlinkat+0x279/0x2d0
> [] SyS_unlink+0x16/0x20
> [] entry_SYSCALL_64_fastpath+0x16/0x71
>
> Since nfs is allowing the thread to be frozen with the inode locked it's
> preventing other threads trying to lock the same inode from freezing. It
> seems like a bad idea for nfs to be doing this.
>

Yeah, known problem. Not a simple one to fix though.

> Can nfs do something different here to prevent this? Maybe use a
> non-freezable sleep and let the operation complete, or else abort the
> operation and return ERESTARTSYS?

The problem with letting the op complete is that often by the time you
get to the point of trying to freeze processes, the network interfaces
are already shut down. So the operation you're waiting on might never
complete. Stuff like suspend operations on your laptop fail, leading to
fun bug reports like: "Oh, my laptop burned to crisp inside my bag
because the suspend never completed."

You could (in principle) return something like -ERESTARTSYS iff the
call has not yet been transmitted. If it has already been transmitted,
then you might end up sending the call a second time (but not as an RPC
retransmission of course). If that call was non-idempotent then you end
up with all of _those_ sorts of problems.

Also, -ERESTARTSYS is not quite right as it doesn't always cause the
call to be restarted. It depends on the syscall. I think this would
probably need some other sort of syscall-restart machinery plumbed in.

-- 
Jeff Layton <jlayton@redhat.com>
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-06 22:07 ` Jeff Layton @ 2016-07-07 3:55 ` Seth Forshee 2016-07-07 10:29 ` Jeff Layton 2016-07-07 23:53 ` Dave Chinner 2016-07-08 12:22 ` Michal Hocko 2 siblings, 1 reply; 19+ messages in thread From: Seth Forshee @ 2016-07-07 3:55 UTC (permalink / raw) To: Jeff Layton Cc: Trond Myklebust, Anna Schumaker, linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen On Wed, Jul 06, 2016 at 06:07:18PM -0400, Jeff Layton wrote: > On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote: > > We're seeing a hang when freezing a container with an nfs bind mount while > > running iozone. Two iozone processes were hung with this stack trace. > > > > [] schedule+0x35/0x80 > > [] schedule_preempt_disabled+0xe/0x10 > > [] __mutex_lock_slowpath+0xb9/0x130 > > [] mutex_lock+0x1f/0x30 > > [] do_unlinkat+0x12b/0x2d0 > > [] SyS_unlink+0x16/0x20 > > [] entry_SYSCALL_64_fastpath+0x16/0x71 > > > > This seems to be due to another iozone thread frozen during unlink with > > this stack trace: > > > > [] __refrigerator+0x7a/0x140 > > [] nfs4_handle_exception+0x118/0x130 [nfsv4] > > [] nfs4_proc_remove+0x7d/0xf0 [nfsv4] > > [] nfs_unlink+0x149/0x350 [nfs] > > [] vfs_unlink+0xf1/0x1a0 > > [] do_unlinkat+0x279/0x2d0 > > [] SyS_unlink+0x16/0x20 > > [] entry_SYSCALL_64_fastpath+0x16/0x71 > > > > Since nfs is allowing the thread to be frozen with the inode locked it's > > preventing other threads trying to lock the same inode from freezing. It > > seems like a bad idea for nfs to be doing this. > > > > Yeah, known problem. Not a simple one to fix though. > > > Can nfs do something different here to prevent this? Maybe use a > > non-freezable sleep and let the operation complete, or else abort the > > operation and return ERESTARTSYS? > > The problem with letting the op complete is that often by the time you > get to the point of trying to freeze processes, the network interfaces > are already shut down. So the operation you're waiting on might never > complete. Stuff like suspend operations on your laptop fail, leading to > fun bug reports like: "Oh, my laptop burned to crisp inside my bag > because the suspend never completed." > > You could (in principle) return something like -ERESTARTSYS iff the > call has not yet been transmitted. If it has already been transmitted, > then you might end up sending the call a second time (but not as an RPC > retransmission of course). If that call was non-idempotent then you end > up with all of _those_ sorts of problems. > > Also, -ERESTARTSYS is not quite right as it doesn't always cause the > call to be restarted. It depends on the syscall. I think this would > probably need some other sort of syscall-restart machinery plumbed in. I don't really know much at all about how NFS works, so I hope you don't mind indulging me in some questions. What happens then if you suspend waiting for an op to complete and then resume an hour later? Will it actually succeed or end up returning some sort of "timed out" error? If it's going to be an error (or even likely to be one) could the op just be aborted immediately with an error code? It just seems like there must be something better than potentially deadlocking the kernel. Thanks, Seth ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-07 3:55 ` Seth Forshee @ 2016-07-07 10:29 ` Jeff Layton 0 siblings, 0 replies; 19+ messages in thread From: Jeff Layton @ 2016-07-07 10:29 UTC (permalink / raw) To: Seth Forshee Cc: Trond Myklebust, Anna Schumaker, linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen On Wed, 2016-07-06 at 22:55 -0500, Seth Forshee wrote: > On Wed, Jul 06, 2016 at 06:07:18PM -0400, Jeff Layton wrote: > > > > On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote: > > > > > > We're seeing a hang when freezing a container with an nfs bind mount while > > > running iozone. Two iozone processes were hung with this stack trace. > > > > > > [] schedule+0x35/0x80 > > > [] schedule_preempt_disabled+0xe/0x10 > > > [] __mutex_lock_slowpath+0xb9/0x130 > > > [] mutex_lock+0x1f/0x30 > > > [] do_unlinkat+0x12b/0x2d0 > > > [] SyS_unlink+0x16/0x20 > > > [] entry_SYSCALL_64_fastpath+0x16/0x71 > > > > > > This seems to be due to another iozone thread frozen during unlink with > > > this stack trace: > > > > > > [] __refrigerator+0x7a/0x140 > > > [] nfs4_handle_exception+0x118/0x130 [nfsv4] > > > [] nfs4_proc_remove+0x7d/0xf0 [nfsv4] > > > [] nfs_unlink+0x149/0x350 [nfs] > > > [] vfs_unlink+0xf1/0x1a0 > > > [] do_unlinkat+0x279/0x2d0 > > > [] SyS_unlink+0x16/0x20 > > > [] entry_SYSCALL_64_fastpath+0x16/0x71 > > > > > > Since nfs is allowing the thread to be frozen with the inode locked it's > > > preventing other threads trying to lock the same inode from freezing. It > > > seems like a bad idea for nfs to be doing this. > > > > > Yeah, known problem. Not a simple one to fix though. > > > > > > > > Can nfs do something different here to prevent this? Maybe use a > > > non-freezable sleep and let the operation complete, or else abort the > > > operation and return ERESTARTSYS? > > The problem with letting the op complete is that often by the time you > > get to the point of trying to freeze processes, the network interfaces > > are already shut down. So the operation you're waiting on might never > > complete. Stuff like suspend operations on your laptop fail, leading to > > fun bug reports like: "Oh, my laptop burned to crisp inside my bag > > because the suspend never completed." > > > > You could (in principle) return something like -ERESTARTSYS iff the > > call has not yet been transmitted. If it has already been transmitted, > > then you might end up sending the call a second time (but not as an RPC > > retransmission of course). If that call was non-idempotent then you end > > up with all of _those_ sorts of problems. > > > > Also, -ERESTARTSYS is not quite right as it doesn't always cause the > > call to be restarted. It depends on the syscall. I think this would > > probably need some other sort of syscall-restart machinery plumbed in. > I don't really know much at all about how NFS works, so I hope you don't > mind indulging me in some questions. > > What happens then if you suspend waiting for an op to complete and then > resume an hour later? Will it actually succeed or end up returning some > sort of "timed out" error? > Well, the RPC would likely time out. The RPC engine would then likely end up retransmitting it. What happens at that point depends on a lot of different factors -- what sort of call it was and how the server behaves, whether it's NFSv3 or v4, etc... If it was an idempotent call or the server still has the reply in its duplicate reply cache, then everything "just works". 
If it's non- idempotent or relies on some now-expired state, then you might get an error because the same call ended up getting retransmitted or the state that it relies on is now gone. > If it's going to be an error (or even likely to be one) could the op > just be aborted immediately with an error code? It just seems like there > must be something better than potentially deadlocking the kernel. > Not without breaking "hard" retry semantics. We had discussed at one point adding a 3rd alternative to hard vs. soft mount options (squishy?) that would do more or less what you suggest: allow syscalls to return an error when the task is being frozen. You'd only really want to do that though if you've already transmitted the call, waited for a while (several seconds) and didn't get a reply. If the call hasn't been transmitted yet, then you'd either want to restart the call from scratch after unfreezing (a'la something like ERESTARTSYS). -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 19+ messages in thread
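A rough sketch of the decision logic behind such a "squishy" mode --
purely hypothetical, since no such mount option exists; the structure,
threshold and error codes below are invented for illustration only:

#include <linux/errno.h>
#include <linux/types.h>

#define SQUISHY_GIVEUP_SECS	10	/* invented threshold */

struct squishy_call {			/* invented, for illustration */
	bool		transmitted;	/* has the RPC hit the wire yet? */
	unsigned int	waited_secs;	/* time already spent waiting for a reply */
};

/* Called when the task is asked to freeze while waiting on an RPC. */
static int squishy_on_freeze(const struct squishy_call *call)
{
	if (!call->transmitted)
		return -ERESTARTSYS;	/* nothing on the wire: restart after thaw */

	if (call->waited_secs >= SQUISHY_GIVEUP_SECS)
		return -EIO;		/* give up so the freezer can make progress */

	return 0;			/* reply may still arrive: keep waiting */
}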
* Re: Hang due to nfs letting tasks freeze with locked inodes

From: Dave Chinner @ 2016-07-07 23:53 UTC
To: Jeff Layton
Cc: Seth Forshee, Trond Myklebust, Anna Schumaker, linux-fsdevel,
    linux-nfs, linux-kernel, Tycho Andersen

On Wed, Jul 06, 2016 at 06:07:18PM -0400, Jeff Layton wrote:
> On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote:
> > We're seeing a hang when freezing a container with an nfs bind mount while
> > running iozone. Two iozone processes were hung with this stack trace.
> >
> > [] schedule+0x35/0x80
> > [] schedule_preempt_disabled+0xe/0x10
> > [] __mutex_lock_slowpath+0xb9/0x130
> > [] mutex_lock+0x1f/0x30
> > [] do_unlinkat+0x12b/0x2d0
> > [] SyS_unlink+0x16/0x20
> > [] entry_SYSCALL_64_fastpath+0x16/0x71
> >
> > This seems to be due to another iozone thread frozen during unlink with
> > this stack trace:
> >
> > [] __refrigerator+0x7a/0x140
> > [] nfs4_handle_exception+0x118/0x130 [nfsv4]
> > [] nfs4_proc_remove+0x7d/0xf0 [nfsv4]
> > [] nfs_unlink+0x149/0x350 [nfs]
> > [] vfs_unlink+0xf1/0x1a0
> > [] do_unlinkat+0x279/0x2d0
> > [] SyS_unlink+0x16/0x20
> > [] entry_SYSCALL_64_fastpath+0x16/0x71
> >
> > Since nfs is allowing the thread to be frozen with the inode locked it's
> > preventing other threads trying to lock the same inode from freezing. It
> > seems like a bad idea for nfs to be doing this.
> >
>
> Yeah, known problem. Not a simple one to fix though.

Actually, it is simple to fix.

<insert broken record about suspend should be using freeze_super(),
not sys_sync(), to suspend filesystem operations>

i.e. the VFS blocks new operations from starting, and then then the
NFS client simply needs to implement ->freeze_fs to drain all it's
active operations before returning. Problem solved.

> > Can nfs do something different here to prevent this? Maybe use a
> > non-freezable sleep and let the operation complete, or else abort the
> > operation and return ERESTARTSYS?
>
> The problem with letting the op complete is that often by the time you
> get to the point of trying to freeze processes, the network interfaces
> are already shut down. So the operation you're waiting on might never
> complete. Stuff like suspend operations on your laptop fail, leading to
> fun bug reports like: "Oh, my laptop burned to crisp inside my bag
> because the suspend never completed."

Yup, precisely the sort of problems we've had over the past 10 years
with XFS because we do lots of stuff aynchronously in the background
(just like NFS) and hence sys_sync() isn't sufficient to quiesce a
filesystem's operations.

But I'm used to being ignored on this topic (for almost 10 years,
now!). Indeed, it's been made clear in the past that I know
absolutely nothing about what is needed to be done to safely
suspend filesystem operations... :/

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
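A sketch of what that might look like on the NFS side -- hypothetical:
the NFS client does not implement ->freeze_fs today, and the three
block/drain/allow helpers below are invented names rather than existing
client functions:

#include <linux/fs.h>
#include <linux/nfs_fs.h>

/* Hypothetical helpers -- these do not exist in the NFS client. */
int nfs_block_new_rpcs(struct nfs_server *server);
int nfs_drain_inflight_rpcs(struct nfs_server *server);
void nfs_allow_new_rpcs(struct nfs_server *server);

/*
 * With suspend going through freeze_super(), the VFS stops new
 * operations first, so ->freeze_fs only has to wait for RPCs that are
 * already in flight before the freeze is allowed to proceed.
 */
static int nfs_freeze_fs(struct super_block *sb)
{
	struct nfs_server *server = NFS_SB(sb);

	nfs_block_new_rpcs(server);
	return nfs_drain_inflight_rpcs(server);
}

static int nfs_unfreeze_fs(struct super_block *sb)
{
	nfs_allow_new_rpcs(NFS_SB(sb));
	return 0;
}

/* These would be wired up as .freeze_fs/.unfreeze_fs in nfs_sops. */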
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-07 23:53 ` Dave Chinner @ 2016-07-08 11:33 ` Jeff Layton 2016-07-08 12:48 ` Seth Forshee 1 sibling, 0 replies; 19+ messages in thread From: Jeff Layton @ 2016-07-08 11:33 UTC (permalink / raw) To: Dave Chinner Cc: Seth Forshee, Trond Myklebust, Anna Schumaker, linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen On Fri, 2016-07-08 at 09:53 +1000, Dave Chinner wrote: > On Wed, Jul 06, 2016 at 06:07:18PM -0400, Jeff Layton wrote: > > On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote: > > > We're seeing a hang when freezing a container with an nfs bind > > > mount while > > > running iozone. Two iozone processes were hung with this stack > > > trace. > > > > > > [] schedule+0x35/0x80 > > > [] schedule_preempt_disabled+0xe/0x10 > > > [] __mutex_lock_slowpath+0xb9/0x130 > > > [] mutex_lock+0x1f/0x30 > > > [] do_unlinkat+0x12b/0x2d0 > > > [] SyS_unlink+0x16/0x20 > > > [] entry_SYSCALL_64_fastpath+0x16/0x71 > > > > > > This seems to be due to another iozone thread frozen during > > > unlink with > > > this stack trace: > > > > > > [] __refrigerator+0x7a/0x140 > > > [] nfs4_handle_exception+0x118/0x130 [nfsv4] > > > [] nfs4_proc_remove+0x7d/0xf0 [nfsv4] > > > [] nfs_unlink+0x149/0x350 [nfs] > > > [] vfs_unlink+0xf1/0x1a0 > > > [] do_unlinkat+0x279/0x2d0 > > > [] SyS_unlink+0x16/0x20 > > > [] entry_SYSCALL_64_fastpath+0x16/0x71 > > > > > > Since nfs is allowing the thread to be frozen with the inode > > > locked it's > > > preventing other threads trying to lock the same inode from > > > freezing. It > > > seems like a bad idea for nfs to be doing this. > > > > > > > Yeah, known problem. Not a simple one to fix though. > > Actually, it is simple to fix. > > <insert broken record about suspend should be using freeze_super(), > not sys_sync(), to suspend filesystem operations> > > i.e. the VFS blocks new operations from starting, and then then the > NFS client simply needs to implement ->freeze_fs to drain all it's > active operations before returning. Problem solved. > Not a bad idea. In the case of NFS though, I'm not sure we'd actually do anything different than what we're doing though. Part of the problem is that by the time FWIW, we already have CONFIG_SUSPEND_SKIP_SYNC. It might be worth experimenting with a CONFIG_SUSPEND_FREEZE_FS that does what you suggest? > > > Can nfs do something different here to prevent this? Maybe use a > > > non-freezable sleep and let the operation complete, or else abort > > > the > > > operation and return ERESTARTSYS? > > > > The problem with letting the op complete is that often by the time > > you > > get to the point of trying to freeze processes, the network > > interfaces > > are already shut down. So the operation you're waiting on might > > never > > complete. Stuff like suspend operations on your laptop fail, > > leading to > > fun bug reports like: "Oh, my laptop burned to crisp inside my bag > > because the suspend never completed." > > Yup, precisely the sort of problems we've had over the past 10 years > with XFS because we do lots of stuff aynchronously in the background > (just like NFS) and hence sys_sync() isn't sufficient to quiesce a > filesystem's operations. > Yeah, adding a freeze_fs operation for NFS (and using that during suspend) sounds reasonable at first blush. I can probably trawl the archives to better understand, but what are the arguments against doing that? Is it just that freeze_fs is relatively new and the suspend/resume subsystems haven't caught up? 
> But I'm used to being ignored on this topic (for almost 10 years, > now!). Indeed, it's been made clear in the past that I know > absolutely nothing about what is needed to be done to safely > suspend filesystem operations... :/ > > Cheers, > > Dave. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-07 23:53 ` Dave Chinner 2016-07-08 11:33 ` Jeff Layton @ 2016-07-08 12:48 ` Seth Forshee 2016-07-08 12:55 ` Trond Myklebust 1 sibling, 1 reply; 19+ messages in thread From: Seth Forshee @ 2016-07-08 12:48 UTC (permalink / raw) To: Dave Chinner Cc: Jeff Layton, Trond Myklebust, Anna Schumaker, linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen On Fri, Jul 08, 2016 at 09:53:30AM +1000, Dave Chinner wrote: > On Wed, Jul 06, 2016 at 06:07:18PM -0400, Jeff Layton wrote: > > On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote: > > > We're seeing a hang when freezing a container with an nfs bind mount while > > > running iozone. Two iozone processes were hung with this stack trace. > > > > > > [] schedule+0x35/0x80 > > > [] schedule_preempt_disabled+0xe/0x10 > > > [] __mutex_lock_slowpath+0xb9/0x130 > > > [] mutex_lock+0x1f/0x30 > > > [] do_unlinkat+0x12b/0x2d0 > > > [] SyS_unlink+0x16/0x20 > > > [] entry_SYSCALL_64_fastpath+0x16/0x71 > > > > > > This seems to be due to another iozone thread frozen during unlink with > > > this stack trace: > > > > > > [] __refrigerator+0x7a/0x140 > > > [] nfs4_handle_exception+0x118/0x130 [nfsv4] > > > [] nfs4_proc_remove+0x7d/0xf0 [nfsv4] > > > [] nfs_unlink+0x149/0x350 [nfs] > > > [] vfs_unlink+0xf1/0x1a0 > > > [] do_unlinkat+0x279/0x2d0 > > > [] SyS_unlink+0x16/0x20 > > > [] entry_SYSCALL_64_fastpath+0x16/0x71 > > > > > > Since nfs is allowing the thread to be frozen with the inode locked it's > > > preventing other threads trying to lock the same inode from freezing. It > > > seems like a bad idea for nfs to be doing this. > > > > > > > Yeah, known problem. Not a simple one to fix though. > > Actually, it is simple to fix. > > <insert broken record about suspend should be using freeze_super(), > not sys_sync(), to suspend filesystem operations> > > i.e. the VFS blocks new operations from starting, and then then the > NFS client simply needs to implement ->freeze_fs to drain all it's > active operations before returning. Problem solved. No, this won't solve my problem. We're not doing a full suspend, rather using a freezer cgroup to freeze a subset of processes. We don't want to want to fully freeze the filesystem. Thanks, Seth ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-08 12:48 ` Seth Forshee @ 2016-07-08 12:55 ` Trond Myklebust 2016-07-08 13:05 ` Trond Myklebust 0 siblings, 1 reply; 19+ messages in thread From: Trond Myklebust @ 2016-07-08 12:55 UTC (permalink / raw) To: Seth Forshee Cc: Chinner Dave, Jeff Layton, Schumaker Anna, linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org, Tycho Andersen > On Jul 8, 2016, at 08:48, Seth Forshee <seth.forshee@canonical.com> wrote: > > On Fri, Jul 08, 2016 at 09:53:30AM +1000, Dave Chinner wrote: >> On Wed, Jul 06, 2016 at 06:07:18PM -0400, Jeff Layton wrote: >>> On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote: >>>> We're seeing a hang when freezing a container with an nfs bind mount while >>>> running iozone. Two iozone processes were hung with this stack trace. >>>> >>>> [] schedule+0x35/0x80 >>>> [] schedule_preempt_disabled+0xe/0x10 >>>> [] __mutex_lock_slowpath+0xb9/0x130 >>>> [] mutex_lock+0x1f/0x30 >>>> [] do_unlinkat+0x12b/0x2d0 >>>> [] SyS_unlink+0x16/0x20 >>>> [] entry_SYSCALL_64_fastpath+0x16/0x71 >>>> >>>> This seems to be due to another iozone thread frozen during unlink with >>>> this stack trace: >>>> >>>> [] __refrigerator+0x7a/0x140 >>>> [] nfs4_handle_exception+0x118/0x130 [nfsv4] >>>> [] nfs4_proc_remove+0x7d/0xf0 [nfsv4] >>>> [] nfs_unlink+0x149/0x350 [nfs] >>>> [] vfs_unlink+0xf1/0x1a0 >>>> [] do_unlinkat+0x279/0x2d0 >>>> [] SyS_unlink+0x16/0x20 >>>> [] entry_SYSCALL_64_fastpath+0x16/0x71 >>>> >>>> Since nfs is allowing the thread to be frozen with the inode locked it's >>>> preventing other threads trying to lock the same inode from freezing. It >>>> seems like a bad idea for nfs to be doing this. >>>> >>> >>> Yeah, known problem. Not a simple one to fix though. >> >> Actually, it is simple to fix. >> >> <insert broken record about suspend should be using freeze_super(), >> not sys_sync(), to suspend filesystem operations> >> >> i.e. the VFS blocks new operations from starting, and then then the >> NFS client simply needs to implement ->freeze_fs to drain all it's >> active operations before returning. Problem solved. > > No, this won't solve my problem. We're not doing a full suspend, rather > using a freezer cgroup to freeze a subset of processes. We don't want to > want to fully freeze the filesystem. …and therein lies the rub. The whole cgroup freezer stuff assumes that you can safely deactivate a bunch of processes that may or may not hold state in the filesystem. That’s definitely not OK when you hold locks etc that can affect processes that lies outside the cgroup (and/or outside the NFS client itself). Cheers Trond ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-08 12:55 ` Trond Myklebust @ 2016-07-08 13:05 ` Trond Myklebust 2016-07-11 1:20 ` Dave Chinner 0 siblings, 1 reply; 19+ messages in thread From: Trond Myklebust @ 2016-07-08 13:05 UTC (permalink / raw) To: Seth Forshee Cc: Chinner Dave, Jeff Layton, Schumaker Anna, linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org, Tycho Andersen > On Jul 8, 2016, at 08:55, Trond Myklebust <trondmy@primarydata.com> wrote: > > >> On Jul 8, 2016, at 08:48, Seth Forshee <seth.forshee@canonical.com> wrote: >> >> On Fri, Jul 08, 2016 at 09:53:30AM +1000, Dave Chinner wrote: >>> On Wed, Jul 06, 2016 at 06:07:18PM -0400, Jeff Layton wrote: >>>> On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote: >>>>> We're seeing a hang when freezing a container with an nfs bind mount while >>>>> running iozone. Two iozone processes were hung with this stack trace. >>>>> >>>>> [] schedule+0x35/0x80 >>>>> [] schedule_preempt_disabled+0xe/0x10 >>>>> [] __mutex_lock_slowpath+0xb9/0x130 >>>>> [] mutex_lock+0x1f/0x30 >>>>> [] do_unlinkat+0x12b/0x2d0 >>>>> [] SyS_unlink+0x16/0x20 >>>>> [] entry_SYSCALL_64_fastpath+0x16/0x71 >>>>> >>>>> This seems to be due to another iozone thread frozen during unlink with >>>>> this stack trace: >>>>> >>>>> [] __refrigerator+0x7a/0x140 >>>>> [] nfs4_handle_exception+0x118/0x130 [nfsv4] >>>>> [] nfs4_proc_remove+0x7d/0xf0 [nfsv4] >>>>> [] nfs_unlink+0x149/0x350 [nfs] >>>>> [] vfs_unlink+0xf1/0x1a0 >>>>> [] do_unlinkat+0x279/0x2d0 >>>>> [] SyS_unlink+0x16/0x20 >>>>> [] entry_SYSCALL_64_fastpath+0x16/0x71 >>>>> >>>>> Since nfs is allowing the thread to be frozen with the inode locked it's >>>>> preventing other threads trying to lock the same inode from freezing. It >>>>> seems like a bad idea for nfs to be doing this. >>>>> >>>> >>>> Yeah, known problem. Not a simple one to fix though. >>> >>> Actually, it is simple to fix. >>> >>> <insert broken record about suspend should be using freeze_super(), >>> not sys_sync(), to suspend filesystem operations> >>> >>> i.e. the VFS blocks new operations from starting, and then then the >>> NFS client simply needs to implement ->freeze_fs to drain all it's >>> active operations before returning. Problem solved. >> >> No, this won't solve my problem. We're not doing a full suspend, rather >> using a freezer cgroup to freeze a subset of processes. We don't want to >> want to fully freeze the filesystem. > > …and therein lies the rub. The whole cgroup freezer stuff assumes that you can safely deactivate a bunch of processes that may or may not hold state in the filesystem. That’s definitely not OK when you hold locks etc that can affect processes that lies outside the cgroup (and/or outside the NFS client itself). > In case it wasn’t clear, I’m not just talking about VFS mutexes here. I’m also talking about all the other stuff, a lot of which the kernel has no control over, including POSIX file locking, share locks, leases/delegations, etc. Trond ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-08 13:05 ` Trond Myklebust @ 2016-07-11 1:20 ` Dave Chinner 0 siblings, 0 replies; 19+ messages in thread From: Dave Chinner @ 2016-07-11 1:20 UTC (permalink / raw) To: Trond Myklebust Cc: Seth Forshee, Jeff Layton, Schumaker Anna, linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org, Tycho Andersen On Fri, Jul 08, 2016 at 01:05:40PM +0000, Trond Myklebust wrote: > > On Jul 8, 2016, at 08:55, Trond Myklebust > > <trondmy@primarydata.com> wrote: > >> On Jul 8, 2016, at 08:48, Seth Forshee > >> <seth.forshee@canonical.com> wrote: On Fri, Jul 08, 2016 at > >> 09:53:30AM +1000, Dave Chinner wrote: > >>> On Wed, Jul 06, 2016 at 06:07:18PM -0400, Jeff Layton wrote: > >>>> On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote: > >>>>> We're seeing a hang when freezing a container with an nfs > >>>>> bind mount while running iozone. Two iozone processes were > >>>>> hung with this stack trace. > >>>>> > >>>>> [] schedule+0x35/0x80 [] schedule_preempt_disabled+0xe/0x10 > >>>>> [] __mutex_lock_slowpath+0xb9/0x130 [] mutex_lock+0x1f/0x30 > >>>>> [] do_unlinkat+0x12b/0x2d0 [] SyS_unlink+0x16/0x20 [] > >>>>> entry_SYSCALL_64_fastpath+0x16/0x71 > >>>>> > >>>>> This seems to be due to another iozone thread frozen during > >>>>> unlink with this stack trace: > >>>>> > >>>>> [] __refrigerator+0x7a/0x140 [] > >>>>> nfs4_handle_exception+0x118/0x130 [nfsv4] [] > >>>>> nfs4_proc_remove+0x7d/0xf0 [nfsv4] [] nfs_unlink+0x149/0x350 > >>>>> [nfs] [] vfs_unlink+0xf1/0x1a0 [] do_unlinkat+0x279/0x2d0 [] > >>>>> SyS_unlink+0x16/0x20 [] entry_SYSCALL_64_fastpath+0x16/0x71 > >>>>> > >>>>> Since nfs is allowing the thread to be frozen with the inode > >>>>> locked it's preventing other threads trying to lock the same > >>>>> inode from freezing. It seems like a bad idea for nfs to be > >>>>> doing this. > >>>>> > >>>> > >>>> Yeah, known problem. Not a simple one to fix though. > >>> > >>> Actually, it is simple to fix. > >>> > >>> <insert broken record about suspend should be using > >>> freeze_super(), not sys_sync(), to suspend filesystem > >>> operations> > >>> > >>> i.e. the VFS blocks new operations from starting, and then > >>> then the NFS client simply needs to implement ->freeze_fs to > >>> drain all it's active operations before returning. Problem > >>> solved. > >> > >> No, this won't solve my problem. We're not doing a full > >> suspend, rather using a freezer cgroup to freeze a subset of > >> processes. We don't want to want to fully freeze the > >> filesystem. > > > > …and therein lies the rub. The whole cgroup freezer stuff > > assumes that you can safely deactivate a bunch of processes that > > may or may not hold state in the filesystem. That’s > > definitely not OK when you hold locks etc that can affect > > processes that lies outside the cgroup (and/or outside the NFS > > client itself). Not just locks, but even just reference counts are bad. e.g. just being suspended with an active write reference to the superblock will cause the next filesystem freeze to hang waiting for that reference to drain. In essence, that's a filesystem-wide DOS vector for anyone using snapshots.... > In case it wasn’t clear, I’m not just talking about VFS > mutexes here. I’m also talking about all the other stuff, a > lot of which the kernel has no control over, including POSIX file > locking, share locks, leases/delegations, etc. Yeah, freezer base process-granularity suspend just seems like a bad idea to me... 
Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Hang due to nfs letting tasks freeze with locked inodes

From: Michal Hocko @ 2016-07-08 12:22 UTC
To: Jeff Layton
Cc: Seth Forshee, Trond Myklebust, Anna Schumaker, linux-fsdevel,
    linux-nfs, linux-kernel, Tycho Andersen

On Wed 06-07-16 18:07:18, Jeff Layton wrote:
> On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote:
> > We're seeing a hang when freezing a container with an nfs bind mount while
> > running iozone. Two iozone processes were hung with this stack trace.
> >
> > [] schedule+0x35/0x80
> > [] schedule_preempt_disabled+0xe/0x10
> > [] __mutex_lock_slowpath+0xb9/0x130
> > [] mutex_lock+0x1f/0x30
> > [] do_unlinkat+0x12b/0x2d0
> > [] SyS_unlink+0x16/0x20
> > [] entry_SYSCALL_64_fastpath+0x16/0x71
> >
> > This seems to be due to another iozone thread frozen during unlink with
> > this stack trace:
> >
> > [] __refrigerator+0x7a/0x140
> > [] nfs4_handle_exception+0x118/0x130 [nfsv4]
> > [] nfs4_proc_remove+0x7d/0xf0 [nfsv4]
> > [] nfs_unlink+0x149/0x350 [nfs]
> > [] vfs_unlink+0xf1/0x1a0
> > [] do_unlinkat+0x279/0x2d0
> > [] SyS_unlink+0x16/0x20
> > [] entry_SYSCALL_64_fastpath+0x16/0x71
> >
> > Since nfs is allowing the thread to be frozen with the inode locked it's
> > preventing other threads trying to lock the same inode from freezing. It
> > seems like a bad idea for nfs to be doing this.
> >
>
> Yeah, known problem. Not a simple one to fix though.

Apart from alternative Dave was mentioning in other email, what is the
point to use freezable wait from this path in the first place?

nfs4_handle_exception does nfs4_wait_clnt_recover from the same path and
that does wait_on_bit_action with TASK_KILLABLE so we are waiting in two
different modes from the same path AFAICS. There do not seem to be other
callers of nfs4_delay outside of nfs4_handle_exception. Sounds like
something is not quite right here to me. If the nfs4_delay did regular
wait then the freezing would fail as well but at least it would be clear
who is the culrprit rather than having an indirect dependency.
-- 
Michal Hocko
SUSE Labs
* Re: Hang due to nfs letting tasks freeze with locked inodes

From: Seth Forshee @ 2016-07-08 12:47 UTC
To: Michal Hocko
Cc: Jeff Layton, Trond Myklebust, Anna Schumaker, linux-fsdevel,
    linux-nfs, linux-kernel, Tycho Andersen

On Fri, Jul 08, 2016 at 02:22:24PM +0200, Michal Hocko wrote:
> On Wed 06-07-16 18:07:18, Jeff Layton wrote:
> > On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote:
> > > We're seeing a hang when freezing a container with an nfs bind mount while
> > > running iozone. Two iozone processes were hung with this stack trace.
> > >
> > > [] schedule+0x35/0x80
> > > [] schedule_preempt_disabled+0xe/0x10
> > > [] __mutex_lock_slowpath+0xb9/0x130
> > > [] mutex_lock+0x1f/0x30
> > > [] do_unlinkat+0x12b/0x2d0
> > > [] SyS_unlink+0x16/0x20
> > > [] entry_SYSCALL_64_fastpath+0x16/0x71
> > >
> > > This seems to be due to another iozone thread frozen during unlink with
> > > this stack trace:
> > >
> > > [] __refrigerator+0x7a/0x140
> > > [] nfs4_handle_exception+0x118/0x130 [nfsv4]
> > > [] nfs4_proc_remove+0x7d/0xf0 [nfsv4]
> > > [] nfs_unlink+0x149/0x350 [nfs]
> > > [] vfs_unlink+0xf1/0x1a0
> > > [] do_unlinkat+0x279/0x2d0
> > > [] SyS_unlink+0x16/0x20
> > > [] entry_SYSCALL_64_fastpath+0x16/0x71
> > >
> > > Since nfs is allowing the thread to be frozen with the inode locked it's
> > > preventing other threads trying to lock the same inode from freezing. It
> > > seems like a bad idea for nfs to be doing this.
> > >
> >
> > Yeah, known problem. Not a simple one to fix though.
>
> Apart from alternative Dave was mentioning in other email, what is the
> point to use freezable wait from this path in the first place?
>
> nfs4_handle_exception does nfs4_wait_clnt_recover from the same path and
> that does wait_on_bit_action with TASK_KILLABLE so we are waiting in two
> different modes from the same path AFAICS. There do not seem to be other
> callers of nfs4_delay outside of nfs4_handle_exception. Sounds like
> something is not quite right here to me. If the nfs4_delay did regular
> wait then the freezing would fail as well but at least it would be clear
> who is the culrprit rather than having an indirect dependency.

It turns out there are more paths than this one doing a freezable wait,
and they're all also killable. This leads me to a slightly different
question than yours, why nfs can give up waiting in the case of a signal
but not when the task is frozen.

I know the changes below aren't "correct," but I've been experimenting
with them anyway to see what would happen. So far things seem to be fine,
and the deadlock is gone. That should give you an idea of all the places
I found using a freezable wait.

Seth

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index f714b98..62dbe59 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -77,8 +77,8 @@ nfs_fattr_to_ino_t(struct nfs_fattr *fattr)
  */
 int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
-	if (signal_pending_state(mode, current))
+	schedule();
+	if (signal_pending_state(mode, current) || freezing(current))
 		return -ERESTARTSYS;
 	return 0;
 }
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index cb28cce..2315183 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -35,9 +35,9 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
 		res = rpc_call_sync(clnt, msg, flags);
 		if (res != -EJUKEBOX)
 			break;
-		freezable_schedule_timeout_killable_unsafe(NFS_JUKEBOX_RETRY_TIME);
+		schedule_timeout_killable(NFS_JUKEBOX_RETRY_TIME);
 		res = -ERESTARTSYS;
-	} while (!fatal_signal_pending(current));
+	} while (!fatal_signal_pending(current) && !freezing(current));
 	return res;
 }
 
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 98a4415..0dad2fb 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -334,9 +334,8 @@ static int nfs4_delay(struct rpc_clnt *clnt, long *timeout)
 
 	might_sleep();
 
-	freezable_schedule_timeout_killable_unsafe(
-		nfs4_update_delay(timeout));
-	if (fatal_signal_pending(current))
+	schedule_timeout_killable(nfs4_update_delay(timeout));
+	if (fatal_signal_pending(current) || freezing(current))
 		res = -ERESTARTSYS;
 	return res;
 }
@@ -5447,7 +5446,7 @@ int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4
 static unsigned long
 nfs4_set_lock_task_retry(unsigned long timeout)
 {
-	freezable_schedule_timeout_killable_unsafe(timeout);
+	schedule_timeout_killable(timeout);
 	timeout <<= 1;
 	if (timeout > NFS4_LOCK_MAXTIMEOUT)
 		return NFS4_LOCK_MAXTIMEOUT;
@@ -6148,7 +6147,7 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
 			break;
 		timeout = nfs4_set_lock_task_retry(timeout);
 		status = -ERESTARTSYS;
-		if (signalled())
+		if (signalled() || freezing(current))
 			break;
 	} while(status < 0);
 	return status;
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 73ad57a..0218dc2 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -252,8 +252,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
 
 static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
-	if (signal_pending_state(mode, current))
+	schedule();
+	if (signal_pending_state(mode, current) || freezing(current))
 		return -ERESTARTSYS;
 	return 0;
 }
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-08 12:22 ` Michal Hocko 2016-07-08 12:47 ` Seth Forshee @ 2016-07-08 12:51 ` Jeff Layton 2016-07-08 14:23 ` Michal Hocko 1 sibling, 1 reply; 19+ messages in thread From: Jeff Layton @ 2016-07-08 12:51 UTC (permalink / raw) To: Michal Hocko Cc: Seth Forshee, Trond Myklebust, Anna Schumaker, linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen On Fri, 2016-07-08 at 14:22 +0200, Michal Hocko wrote: > On Wed 06-07-16 18:07:18, Jeff Layton wrote: > > > > On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote: > > > > > > We're seeing a hang when freezing a container with an nfs bind mount while > > > running iozone. Two iozone processes were hung with this stack trace. > > > > > > [] schedule+0x35/0x80 > > > [] schedule_preempt_disabled+0xe/0x10 > > > [] __mutex_lock_slowpath+0xb9/0x130 > > > [] mutex_lock+0x1f/0x30 > > > [] do_unlinkat+0x12b/0x2d0 > > > [] SyS_unlink+0x16/0x20 > > > [] entry_SYSCALL_64_fastpath+0x16/0x71 > > > > > > This seems to be due to another iozone thread frozen during unlink with > > > this stack trace: > > > > > > [] __refrigerator+0x7a/0x140 > > > [] nfs4_handle_exception+0x118/0x130 [nfsv4] > > > [] nfs4_proc_remove+0x7d/0xf0 [nfsv4] > > > [] nfs_unlink+0x149/0x350 [nfs] > > > [] vfs_unlink+0xf1/0x1a0 > > > [] do_unlinkat+0x279/0x2d0 > > > [] SyS_unlink+0x16/0x20 > > > [] entry_SYSCALL_64_fastpath+0x16/0x71 > > > > > > Since nfs is allowing the thread to be frozen with the inode locked it's > > > preventing other threads trying to lock the same inode from freezing. It > > > seems like a bad idea for nfs to be doing this. > > > > > Yeah, known problem. Not a simple one to fix though. > Apart from alternative Dave was mentioning in other email, what is the > point to use freezable wait from this path in the first place? > > nfs4_handle_exception does nfs4_wait_clnt_recover from the same path and > that does wait_on_bit_action with TASK_KILLABLE so we are waiting in two > different modes from the same path AFAICS. There do not seem to be other > callers of nfs4_delay outside of nfs4_handle_exception. Sounds like > something is not quite right here to me. If the nfs4_delay did regular > wait then the freezing would fail as well but at least it would be clear > who is the culrprit rather than having an indirect dependency. The codepaths involved there are a lot more complex than that unfortunately. nfs4_delay is the function that we use to handle the case where the server returns NFS4ERR_DELAY. Basically telling us that it's too busy right now or has some transient error and the client should retry after a small, sliding delay. That codepath could probably be made more freezer-safe. The typical case however, is that we've sent a call and just haven't gotten a reply. That's the trickier one to handle. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 19+ messages in thread
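For context, the sliding delay being described looks roughly like the
sketch below (the real logic lives in nfs4_delay()/nfs4_update_delay()
in fs/nfs/nfs4proc.c; the start and cap values here are illustrative
rather than the exact NFS constants):

#include <linux/jiffies.h>

/* Illustrative bounds, not the actual NFS4_POLL_RETRY_{MIN,MAX} values. */
#define POLL_RETRY_MIN	(HZ / 10)
#define POLL_RETRY_MAX	(15 * HZ)

/* Return the delay to sleep for now and double it for the next retry. */
static long sliding_delay_update(long *timeout)
{
	long ret;

	if (*timeout <= 0)
		*timeout = POLL_RETRY_MIN;	/* first retry: short wait */
	if (*timeout > POLL_RETRY_MAX)
		*timeout = POLL_RETRY_MAX;	/* cap the per-retry sleep (~15s) */
	ret = *timeout;
	*timeout <<= 1;				/* back off exponentially */
	return ret;
}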
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-08 12:51 ` Jeff Layton @ 2016-07-08 14:23 ` Michal Hocko 2016-07-08 14:27 ` Jeff Layton 0 siblings, 1 reply; 19+ messages in thread From: Michal Hocko @ 2016-07-08 14:23 UTC (permalink / raw) To: Jeff Layton Cc: Seth Forshee, Trond Myklebust, Anna Schumaker, linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen On Fri 08-07-16 08:51:54, Jeff Layton wrote: > On Fri, 2016-07-08 at 14:22 +0200, Michal Hocko wrote: [...] > > Apart from alternative Dave was mentioning in other email, what is the > > point to use freezable wait from this path in the first place? > > > > nfs4_handle_exception does nfs4_wait_clnt_recover from the same path and > > that does wait_on_bit_action with TASK_KILLABLE so we are waiting in two > > different modes from the same path AFAICS. There do not seem to be other > > callers of nfs4_delay outside of nfs4_handle_exception. Sounds like > > something is not quite right here to me. If the nfs4_delay did regular > > wait then the freezing would fail as well but at least it would be clear > > who is the culrprit rather than having an indirect dependency. > > The codepaths involved there are a lot more complex than that > unfortunately. > > nfs4_delay is the function that we use to handle the case where the > server returns NFS4ERR_DELAY. Basically telling us that it's too busy > right now or has some transient error and the client should retry after > a small, sliding delay. > > That codepath could probably be made more freezer-safe. The typical > case however, is that we've sent a call and just haven't gotten a > reply. That's the trickier one to handle. Why using a regular non-freezable wait would be a problem? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-08 14:23 ` Michal Hocko @ 2016-07-08 14:27 ` Jeff Layton 2016-07-11 7:23 ` Michal Hocko 0 siblings, 1 reply; 19+ messages in thread From: Jeff Layton @ 2016-07-08 14:27 UTC (permalink / raw) To: Michal Hocko Cc: Seth Forshee, Trond Myklebust, Anna Schumaker, linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen On Fri, 2016-07-08 at 16:23 +0200, Michal Hocko wrote: > On Fri 08-07-16 08:51:54, Jeff Layton wrote: > > > > On Fri, 2016-07-08 at 14:22 +0200, Michal Hocko wrote: > [...] > > > > > > > > Apart from alternative Dave was mentioning in other email, what > > > is the > > > point to use freezable wait from this path in the first place? > > > > > > nfs4_handle_exception does nfs4_wait_clnt_recover from the same > > > path and > > > that does wait_on_bit_action with TASK_KILLABLE so we are waiting > > > in two > > > different modes from the same path AFAICS. There do not seem to > > > be other > > > callers of nfs4_delay outside of nfs4_handle_exception. Sounds > > > like > > > something is not quite right here to me. If the nfs4_delay did > > > regular > > > wait then the freezing would fail as well but at least it would > > > be clear > > > who is the culrprit rather than having an indirect dependency. > > The codepaths involved there are a lot more complex than that > > unfortunately. > > > > nfs4_delay is the function that we use to handle the case where the > > server returns NFS4ERR_DELAY. Basically telling us that it's too > > busy > > right now or has some transient error and the client should retry > > after > > a small, sliding delay. > > > > That codepath could probably be made more freezer-safe. The typical > > case however, is that we've sent a call and just haven't gotten a > > reply. That's the trickier one to handle. > Why using a regular non-freezable wait would be a problem? It has been a while since I looked at that code, but IIRC, that could block the freezer for up to 15s, which is a significant portion of the 20s that you get before the freezer gives up. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-08 14:27 ` Jeff Layton @ 2016-07-11 7:23 ` Michal Hocko 2016-07-11 11:03 ` Jeff Layton 0 siblings, 1 reply; 19+ messages in thread From: Michal Hocko @ 2016-07-11 7:23 UTC (permalink / raw) To: Jeff Layton Cc: Seth Forshee, Trond Myklebust, Anna Schumaker, linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen On Fri 08-07-16 10:27:38, Jeff Layton wrote: > On Fri, 2016-07-08 at 16:23 +0200, Michal Hocko wrote: > > On Fri 08-07-16 08:51:54, Jeff Layton wrote: > > > > > > On Fri, 2016-07-08 at 14:22 +0200, Michal Hocko wrote: > > [...] > > > > > > > > > > > Apart from alternative Dave was mentioning in other email, what > > > > is the > > > > point to use freezable wait from this path in the first place? > > > > > > > > nfs4_handle_exception does nfs4_wait_clnt_recover from the same > > > > path and > > > > that does wait_on_bit_action with TASK_KILLABLE so we are waiting > > > > in two > > > > different modes from the same path AFAICS. There do not seem to > > > > be other > > > > callers of nfs4_delay outside of nfs4_handle_exception. Sounds > > > > like > > > > something is not quite right here to me. If the nfs4_delay did > > > > regular > > > > wait then the freezing would fail as well but at least it would > > > > be clear > > > > who is the culrprit rather than having an indirect dependency. > > > The codepaths involved there are a lot more complex than that > > > unfortunately. > > > > > > nfs4_delay is the function that we use to handle the case where the > > > server returns NFS4ERR_DELAY. Basically telling us that it's too > > > busy > > > right now or has some transient error and the client should retry > > > after > > > a small, sliding delay. > > > > > > That codepath could probably be made more freezer-safe. The typical > > > case however, is that we've sent a call and just haven't gotten a > > > reply. That's the trickier one to handle. > > Why using a regular non-freezable wait would be a problem? > > It has been a while since I looked at that code, but IIRC, that could > block the freezer for up to 15s, which is a significant portion of the > 20s that you get before the freezer gives up. But how does that differ from the situation when the freezer has to give up on the timeout because another task fails due to lock dependency. As Trond and Dave have written in other emails. It is really danngerous to freeze a task while it is holding locks and other resources. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-11 7:23 ` Michal Hocko @ 2016-07-11 11:03 ` Jeff Layton 2016-07-11 11:43 ` Michal Hocko 2016-07-11 12:50 ` Seth Forshee 0 siblings, 2 replies; 19+ messages in thread From: Jeff Layton @ 2016-07-11 11:03 UTC (permalink / raw) To: Michal Hocko Cc: Seth Forshee, Trond Myklebust, Anna Schumaker, linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen On Mon, 2016-07-11 at 09:23 +0200, Michal Hocko wrote: > On Fri 08-07-16 10:27:38, Jeff Layton wrote: > > On Fri, 2016-07-08 at 16:23 +0200, Michal Hocko wrote: > > > On Fri 08-07-16 08:51:54, Jeff Layton wrote: > > > > > > > > On Fri, 2016-07-08 at 14:22 +0200, Michal Hocko wrote: > > > [...] > > > > > > > > > > > > > > Apart from alternative Dave was mentioning in other email, what > > > > > is the > > > > > point to use freezable wait from this path in the first place? > > > > > > > > > > nfs4_handle_exception does nfs4_wait_clnt_recover from the same > > > > > path and > > > > > that does wait_on_bit_action with TASK_KILLABLE so we are waiting > > > > > in two > > > > > different modes from the same path AFAICS. There do not seem to > > > > > be other > > > > > callers of nfs4_delay outside of nfs4_handle_exception. Sounds > > > > > like > > > > > something is not quite right here to me. If the nfs4_delay did > > > > > regular > > > > > wait then the freezing would fail as well but at least it would > > > > > be clear > > > > > who is the culrprit rather than having an indirect dependency. > > > > The codepaths involved there are a lot more complex than that > > > > unfortunately. > > > > > > > > nfs4_delay is the function that we use to handle the case where the > > > > server returns NFS4ERR_DELAY. Basically telling us that it's too > > > > busy > > > > right now or has some transient error and the client should retry > > > > after > > > > a small, sliding delay. > > > > > > > > That codepath could probably be made more freezer-safe. The typical > > > > case however, is that we've sent a call and just haven't gotten a > > > > reply. That's the trickier one to handle. > > > Why using a regular non-freezable wait would be a problem? > > > > It has been a while since I looked at that code, but IIRC, that could > > block the freezer for up to 15s, which is a significant portion of the > > 20s that you get before the freezer gives up. > > But how does that differ from the situation when the freezer has to give > up on the timeout because another task fails due to lock dependency. > > As Trond and Dave have written in other emails. It is really danngerous > to freeze a task while it is holding locks and other resources. It's not really dangerous if you're freezing every task on the host. Sure, you're freezing with locks held, but everything else is freezing too, so nothing will be contending for those locks. I'm not at all opposed to changing how all of that works. My only stipulation is that we not break the ability to reliably suspend a host that is actively using an NFS mount. If you can come up with a way to do that that also works for freezing cgroups, then I'm all for it. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-11 11:03 ` Jeff Layton @ 2016-07-11 11:43 ` Michal Hocko 2016-07-11 12:50 ` Seth Forshee 1 sibling, 0 replies; 19+ messages in thread From: Michal Hocko @ 2016-07-11 11:43 UTC (permalink / raw) To: Jeff Layton Cc: Seth Forshee, Trond Myklebust, Anna Schumaker, linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen On Mon 11-07-16 07:03:31, Jeff Layton wrote: > On Mon, 2016-07-11 at 09:23 +0200, Michal Hocko wrote: > > On Fri 08-07-16 10:27:38, Jeff Layton wrote: > > > On Fri, 2016-07-08 at 16:23 +0200, Michal Hocko wrote: > > > > On Fri 08-07-16 08:51:54, Jeff Layton wrote: > > > > > > > > > > On Fri, 2016-07-08 at 14:22 +0200, Michal Hocko wrote: > > > > [...] > > > > > > > > > > > > > > > > > Apart from alternative Dave was mentioning in other email, what > > > > > > is the > > > > > > point to use freezable wait from this path in the first place? > > > > > > > > > > > > nfs4_handle_exception does nfs4_wait_clnt_recover from the same > > > > > > path and > > > > > > that does wait_on_bit_action with TASK_KILLABLE so we are waiting > > > > > > in two > > > > > > different modes from the same path AFAICS. There do not seem to > > > > > > be other > > > > > > callers of nfs4_delay outside of nfs4_handle_exception. Sounds > > > > > > like > > > > > > something is not quite right here to me. If the nfs4_delay did > > > > > > regular > > > > > > wait then the freezing would fail as well but at least it would > > > > > > be clear > > > > > > who is the culrprit rather than having an indirect dependency. > > > > > The codepaths involved there are a lot more complex than that > > > > > unfortunately. > > > > > > > > > > nfs4_delay is the function that we use to handle the case where the > > > > > server returns NFS4ERR_DELAY. Basically telling us that it's too > > > > > busy > > > > > right now or has some transient error and the client should retry > > > > > after > > > > > a small, sliding delay. > > > > > > > > > > That codepath could probably be made more freezer-safe. The typical > > > > > case however, is that we've sent a call and just haven't gotten a > > > > > reply. That's the trickier one to handle. > > > > Why using a regular non-freezable wait would be a problem? > > > > > > It has been a while since I looked at that code, but IIRC, that could > > > block the freezer for up to 15s, which is a significant portion of the > > > 20s that you get before the freezer gives up. > > > > But how does that differ from the situation when the freezer has to give > > up on the timeout because another task fails due to lock dependency. > > > > As Trond and Dave have written in other emails. It is really danngerous > > to freeze a task while it is holding locks and other resources. > > It's not really dangerous if you're freezing every task on the host. > Sure, you're freezing with locks held, but everything else is freezing > too, so nothing will be contending for those locks. But the very same path is used also for cgroup freezer so you can end up freezing a task while it holds locks which might block tasks from unrelated cgroups, right? > I'm not at all opposed to changing how all of that works. My only > stipulation is that we not break the ability to reliably suspend a host > that is actively using an NFS mount. If you can come up with a way to > do that that also works for freezing cgroups, then I'm all for it. 
My knowledge of NFS is too limited to help you out here, but I guess it
would be a good start to stop using the unsafe freezer APIs, or to use
them only when you are sure you cannot block any resources.

-- 
Michal Hocko
SUSE Labs
* Re: Hang due to nfs letting tasks freeze with locked inodes 2016-07-11 11:03 ` Jeff Layton 2016-07-11 11:43 ` Michal Hocko @ 2016-07-11 12:50 ` Seth Forshee 1 sibling, 0 replies; 19+ messages in thread From: Seth Forshee @ 2016-07-11 12:50 UTC (permalink / raw) To: Jeff Layton Cc: Michal Hocko, Trond Myklebust, Anna Schumaker, linux-fsdevel, linux-nfs, linux-kernel, Tycho Andersen On Mon, Jul 11, 2016 at 07:03:31AM -0400, Jeff Layton wrote: > On Mon, 2016-07-11 at 09:23 +0200, Michal Hocko wrote: > > On Fri 08-07-16 10:27:38, Jeff Layton wrote: > > > On Fri, 2016-07-08 at 16:23 +0200, Michal Hocko wrote: > > > > On Fri 08-07-16 08:51:54, Jeff Layton wrote: > > > > > > > > > > On Fri, 2016-07-08 at 14:22 +0200, Michal Hocko wrote: > > > > [...] > > > > > > > > > > > > > > > > > Apart from alternative Dave was mentioning in other email, what > > > > > > is the > > > > > > point to use freezable wait from this path in the first place? > > > > > > > > > > > > nfs4_handle_exception does nfs4_wait_clnt_recover from the same > > > > > > path and > > > > > > that does wait_on_bit_action with TASK_KILLABLE so we are waiting > > > > > > in two > > > > > > different modes from the same path AFAICS. There do not seem to > > > > > > be other > > > > > > callers of nfs4_delay outside of nfs4_handle_exception. Sounds > > > > > > like > > > > > > something is not quite right here to me. If the nfs4_delay did > > > > > > regular > > > > > > wait then the freezing would fail as well but at least it would > > > > > > be clear > > > > > > who is the culrprit rather than having an indirect dependency. > > > > > The codepaths involved there are a lot more complex than that > > > > > unfortunately. > > > > > > > > > > nfs4_delay is the function that we use to handle the case where the > > > > > server returns NFS4ERR_DELAY. Basically telling us that it's too > > > > > busy > > > > > right now or has some transient error and the client should retry > > > > > after > > > > > a small, sliding delay. > > > > > > > > > > That codepath could probably be made more freezer-safe. The typical > > > > > case however, is that we've sent a call and just haven't gotten a > > > > > reply. That's the trickier one to handle. > > > > Why using a regular non-freezable wait would be a problem? > > > > > > It has been a while since I looked at that code, but IIRC, that could > > > block the freezer for up to 15s, which is a significant portion of the > > > 20s that you get before the freezer gives up. > > > > But how does that differ from the situation when the freezer has to give > > up on the timeout because another task fails due to lock dependency. > > > > As Trond and Dave have written in other emails. It is really danngerous > > to freeze a task while it is holding locks and other resources. > > It's not really dangerous if you're freezing every task on the host. > Sure, you're freezing with locks held, but everything else is freezing > too, so nothing will be contending for those locks. Unless you have tasks either already waiting on those locks or that will attaempt to lock them before calling try_to_freeze. That happens to be the case in this cgroup freezer hang too, all the tasks stuck waiting on the i_mutex are p being frozen. > I'm not at all opposed to changing how all of that works. My only > stipulation is that we not break the ability to reliably suspend a host > that is actively using an NFS mount. If you can come up with a way to > do that that also works for freezing cgroups, then I'm all for it. 
The deadlock that we've seen should apply equally to suspend and to the
cgroup freezer. The only difference is that suspend will eventually time
out and abort, whereas the cgroup freezer does not.

Seth