* LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
@ 2013-03-04 13:57 Ming Lei
From: Ming Lei @ 2013-03-04 13:57 UTC (permalink / raw)
To: Trond Myklebust, J. Bruce Fields; +Cc: Linux Kernel Mailing List, linux-nfs
Hi,

The warning below can be triggered every time mount.nfs runs on 3.9-rc1.

I'm not sure whether freezable_schedule() inside rpc_wait_bit_killable()
should be changed to schedule(), since nfs_clid_init_mutex is held in
this path.
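
For reference, the sleep in question expands roughly like this (from
net/sunrpc/sched.c and include/linux/freezer.h in this tree; trimmed,
with barriers and comments omitted, so treat it as a sketch rather than
the exact source):

    /* net/sunrpc/sched.c */
    static int rpc_wait_bit_killable(void *word)
    {
            if (fatal_signal_pending(current))
                    return -ERESTARTSYS;
            freezable_schedule();
            return 0;
    }

    /* include/linux/freezer.h */
    #define freezable_schedule()            \
    ({                                      \
            freezer_do_not_count();         \
            schedule();                     \
            freezer_count();                \
    })

    static inline void freezer_count(void)
    {
            current->flags &= ~PF_FREEZER_SKIP;
            /* this is where the "still has locks held" check fires */
            try_to_freeze();
    }

so the waits in __rpc_execute() end in a try_to_freeze() call while
nfs_clid_init_mutex is still held.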
[ 41.387939] =====================================
[ 41.392913] [ BUG: mount.nfs/643 still has locks held! ]
[ 41.398559] 3.9.0-rc1+ #1740 Not tainted
[ 41.402709] -------------------------------------
[ 41.407714] 1 lock held by mount.nfs/643:
[ 41.411956] #0: (nfs_clid_init_mutex){+.+...}, at: [<c0226d6c>] nfs4_discover_server_trunking+0x60/0x1d4
[ 41.422363]
[ 41.422363] stack backtrace:
[ 41.427032] [<c0014dd4>] (unwind_backtrace+0x0/0xe0) from [<c04a8300>] (rpc_wait_bit_killable+0x38/0xc8)
[ 41.437103] [<c04a8300>] (rpc_wait_bit_killable+0x38/0xc8) from [<c054e454>] (__wait_on_bit+0x54/0x9c)
[ 41.446990] [<c054e454>] (__wait_on_bit+0x54/0x9c) from [<c054e514>] (out_of_line_wait_on_bit+0x78/0x84)
[ 41.457061] [<c054e514>] (out_of_line_wait_on_bit+0x78/0x84) from [<c04a8f88>] (__rpc_execute+0x170/0x348)
[ 41.467407] [<c04a8f88>] (__rpc_execute+0x170/0x348) from [<c04a250c>] (rpc_run_task+0x9c/0xa4)
[ 41.476715] [<c04a250c>] (rpc_run_task+0x9c/0xa4) from [<c04a265c>] (rpc_call_sync+0x70/0xb0)
[ 41.485778] [<c04a265c>] (rpc_call_sync+0x70/0xb0) from [<c021af54>] (nfs4_proc_setclientid+0x1a0/0x1c8)
[ 41.495819] [<c021af54>] (nfs4_proc_setclientid+0x1a0/0x1c8) from [<c0224eb4>] (nfs40_discover_server_trunking+0xec/0x148)
[ 41.507507] [<c0224eb4>] (nfs40_discover_server_trunking+0xec/0x148) from [<c0226da0>] (nfs4_discover_server_trunking+0x94/0x1d4)
[ 41.519866] [<c0226da0>] (nfs4_discover_server_trunking+0x94/0x1d4) from [<c022c664>] (nfs4_init_client+0x150/0x1b0)
[ 41.531036] [<c022c664>] (nfs4_init_client+0x150/0x1b0) from [<c01fe954>] (nfs_get_client+0x2cc/0x320)
[ 41.540863] [<c01fe954>] (nfs_get_client+0x2cc/0x320) from [<c022c080>] (nfs4_set_client+0x80/0xb0)
[ 41.550476] [<c022c080>] (nfs4_set_client+0x80/0xb0) from [<c022cef8>] (nfs4_create_server+0xb0/0x21c)
[ 41.560333] [<c022cef8>] (nfs4_create_server+0xb0/0x21c) from [<c0227524>] (nfs4_remote_mount+0x28/0x54)
[ 41.570373] [<c0227524>] (nfs4_remote_mount+0x28/0x54) from [<c0113a8c>] (mount_fs+0x6c/0x160)
[ 41.579498] [<c0113a8c>] (mount_fs+0x6c/0x160) from [<c012a47c>] (vfs_kern_mount+0x4c/0xc0)
[ 41.588378] [<c012a47c>] (vfs_kern_mount+0x4c/0xc0) from [<c022734c>] (nfs_do_root_mount+0x74/0x90)
[ 41.597961] [<c022734c>] (nfs_do_root_mount+0x74/0x90) from [<c0227574>] (nfs4_try_mount+0x24/0x3c)
[ 41.607513] [<c0227574>] (nfs4_try_mount+0x24/0x3c) from [<c02070f8>] (nfs_fs_mount+0x6dc/0x7a0)
[ 41.616821] [<c02070f8>] (nfs_fs_mount+0x6dc/0x7a0) from [<c0113a8c>] (mount_fs+0x6c/0x160)
[ 41.625701] [<c0113a8c>] (mount_fs+0x6c/0x160) from [<c012a47c>] (vfs_kern_mount+0x4c/0xc0)
[ 41.634582] [<c012a47c>] (vfs_kern_mount+0x4c/0xc0) from [<c012c330>] (do_mount+0x710/0x81c)
[ 41.643524] [<c012c330>] (do_mount+0x710/0x81c) from [<c012c4c0>] (sys_mount+0x84/0xb8)
[ 41.652008] [<c012c4c0>] (sys_mount+0x84/0xb8) from [<c000d6c0>] (ret_fast_syscall+0x0/0x48)
[ 41.715911] device: '0:28': device_add
[ 41.720062] PM: Adding info for No Bus:0:28
[ 41.746887] device: '0:29': device_add
[ 41.751037] PM: Adding info for No Bus:0:29
[ 41.780700] device: '0:28': device_unregister
[ 41.785400] PM: Removing info for No Bus:0:28
[ 41.790344] device: '0:28': device_create_release
Thanks,
--
Ming Lei

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Myklebust, Trond @ 2013-03-04 14:14 UTC (permalink / raw)
To: Ming Lei, Jeff Layton
Cc: J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org
On Mon, 2013-03-04 at 21:57 +0800, Ming Lei wrote:
> Hi,
>
> The warning below can be triggered every time mount.nfs runs on 3.9-rc1.
>
> I'm not sure whether freezable_schedule() inside rpc_wait_bit_killable()
> should be changed to schedule(), since nfs_clid_init_mutex is held in
> this path.

Cc:ing Jeff, who added freezable_schedule() and applied it to
rpc_wait_bit_killable.

So this is occurring when the kernel enters the freeze state?

Why does it occur only with nfs_clid_init_mutex, and not with all the
other mutexes that we hold across RPC calls? We hold inode->i_mutex
across RPC calls all the time when doing renames, unlinks, file
creation, ...

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Jeff Layton @ 2013-03-04 14:23 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Mandeep Singh Baines
On Mon, 4 Mar 2013 14:14:23 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Mon, 2013-03-04 at 21:57 +0800, Ming Lei wrote:
> > Hi,
> >
> > The warning below can be triggered every time mount.nfs runs on 3.9-rc1.
> >
> > I'm not sure whether freezable_schedule() inside rpc_wait_bit_killable()
> > should be changed to schedule(), since nfs_clid_init_mutex is held in
> > this path.
>
> Cc:ing Jeff, who added freezable_schedule() and applied it to
> rpc_wait_bit_killable.
>
> So this is occurring when the kernel enters the freeze state?
>
> Why does it occur only with nfs_clid_init_mutex, and not with all the
> other mutexes that we hold across RPC calls? We hold inode->i_mutex
> across RPC calls all the time when doing renames, unlinks, file
> creation, ...

cc'ing Mandeep as well, since his patch caused this to start popping up...

We've also gotten some similar lockdep pops in the nfsd code recently.
I responded to an email about it here on Friday, and I'll re-post what
I said there below:

-------------------------[snip]----------------------
Ok, I see...

rpc_wait_bit_killable() calls freezable_schedule(). That calls
freezer_count(), which calls try_to_freeze(). try_to_freeze() does this
lockdep check now, as of commit 6aa9707099.

The assumption seems to be that freezing a thread while holding any
sort of lock is bad. The rationale in that patch seems a bit sketchy to
me though. We can be fairly certain that we're not going to deadlock by
holding these locks, but I guess there could be something I've missed.

Mandeep, can you elaborate on whether there's really a deadlock
scenario here? If not, then is there some way to annotate these locks
so this lockdep pop goes away?
-------------------------[snip]----------------------

--
Jeff Layton <jlayton@redhat.com>

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Mandeep Singh Baines @ 2013-03-04 19:55 UTC (permalink / raw)
To: Jeff Layton
Cc: Myklebust, Trond, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Tejun Heo, Ingo Molnar, Oleg Nesterov
+ rjw, akpm, tejun, mingo, oleg

On Mon, Mar 4, 2013 at 6:23 AM, Jeff Layton <jlayton@redhat.com> wrote:
> cc'ing Mandeep as well, since his patch caused this to start popping up...
>
> We've also gotten some similar lockdep pops in the nfsd code recently.
> I responded to an email about it here on Friday, and I'll re-post what
> I said there below:
>
> -------------------------[snip]----------------------
> Ok, I see...
>
> rpc_wait_bit_killable() calls freezable_schedule(). That calls
> freezer_count(), which calls try_to_freeze(). try_to_freeze() does this
> lockdep check now, as of commit 6aa9707099.
>
> The assumption seems to be that freezing a thread while holding any
> sort of lock is bad. The rationale in that patch seems a bit sketchy to
> me though. We can be fairly certain that we're not going to deadlock by
> holding these locks, but I guess there could be something I've missed.
>
> Mandeep, can you elaborate on whether there's really a deadlock
> scenario here? If not, then is there some way to annotate these locks
> so this lockdep pop goes away?
> -------------------------[snip]----------------------

We were seeing deadlocks on suspend that were root-caused to locks held
at freeze time. In this case, the locks were device locks. I wanted a
way to detect whether a process tried to freeze with a device_lock held.

Then I thought about the cgroup_freezer subsystem (which I recently
turned on in our system). If you were to freeze a cgroup while a process
was holding a lock, you could deadlock. It seems reasonable that this
code path could cause a deadlock that way.

The problem is that freezer_count() calls try_to_freeze(). In this
case, try_to_freeze() is not really adding any value. If we didn't
try_to_freeze() in this instance of freezer_count(), you'd avoid the
potential deadlock and not block suspend.

I think a lot of call sites are like that. They want to avoid blocking
suspend but aren't really interested in trying to freeze at the moment
they call freezer_count().

Regards,
Mandeep

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Oleg Nesterov @ 2013-03-04 20:53 UTC (permalink / raw)
To: Mandeep Singh Baines
Cc: Jeff Layton, Myklebust, Trond, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Tejun Heo, Ingo Molnar
On 03/04, Mandeep Singh Baines wrote:
>
> The problem is that freezer_count() calls try_to_freeze(). In this
> case, try_to_freeze() is not really adding any value.

Well, I tend to agree.

If a task calls __refrigerator() holding a lock which another freezable
task can wait for, this is not freezer-friendly.

freezable_schedule/freezer_do_not_count/etc not only mean "I won't be
active if freezing()", they should also mean "I won't block suspend/etc".

OTOH, I understand that it is probably not trivial to change this code
to make it freezer-friendly. But at least I disagree with "push your
problems onto others".

Oleg.

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Myklebust, Trond @ 2013-03-04 22:08 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Mandeep Singh Baines, Jeff Layton, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Tejun Heo, Ingo Molnar
On Mon, 2013-03-04 at 21:53 +0100, Oleg Nesterov wrote:
> On 03/04, Mandeep Singh Baines wrote:
> >
> > The problem is that freezer_count() calls try_to_freeze(). In this
> > case, try_to_freeze() is not really adding any value.
>
> Well, I tend to agree.
>
> If a task calls __refrigerator() holding a lock which another freezable
> task can wait for, this is not freezer-friendly.
>
> freezable_schedule/freezer_do_not_count/etc not only mean "I won't be
> active if freezing()", they should also mean "I won't block suspend/etc".

If suspend for some reason requires a re-entrant mount, then yes, I can
see how it might be a problem that we're holding a mount-related lock.
The question is: why is that necessary?

> OTOH, I understand that it is probably not trivial to change this code
> to make it freezer-friendly. But at least I disagree with "push your
> problems onto others".

That code can't be made freezer-friendly if it isn't allowed to hold
basic filesystem-related locks across RPC calls. A number of those RPC
calls do things that need to be protected by VFS- or MM-level locks,
i.e. lookups, file creation/deletion, page fault in/out, ...

IOW, the problem would need to be solved differently, possibly by adding
a new FIFREEZE-like call to allow the filesystem to quiesce itself
_before_ NetworkManager pulls the rug out from underneath it. There
would still be plenty of corner cases to keep people entertained (e.g.
the server goes down before the quiesce call is invoked), but at least
the top 90% of cases would be solved.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Jeff Layton @ 2013-03-05 13:23 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Tejun Heo, Ingo Molnar
On Mon, 4 Mar 2013 22:08:34 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> If suspend for some reason requires a re-entrant mount, then yes, I can
> see how it might be a problem that we're holding a mount-related lock.
> The question is: why is that necessary?
>
> That code can't be made freezer-friendly if it isn't allowed to hold
> basic filesystem-related locks across RPC calls. A number of those RPC
> calls do things that need to be protected by VFS- or MM-level locks,
> i.e. lookups, file creation/deletion, page fault in/out, ...
>
> IOW, the problem would need to be solved differently, possibly by adding
> a new FIFREEZE-like call to allow the filesystem to quiesce itself
> _before_ NetworkManager pulls the rug out from underneath it. There
> would still be plenty of corner cases to keep people entertained (e.g.
> the server goes down before the quiesce call is invoked), but at least
> the top 90% of cases would be solved.

Ok, I think I'm starting to get it. It doesn't necessarily need a
reentrant mount or anything like that. Consider this case (which is not
even that unlikely):

Suppose there are two tasks calling unlink() on files in the same NFS
directory. The first task takes the i_mutex on the parent directory and
goes to ask the server to remove the file. The second task calls
unlink() just afterward and blocks on the parent's i_mutex.

Now a suspend event comes in and freezes the first task while it's
waiting on the response. It still holds the parent's i_mutex. The
freezer now gets to the second task and can't freeze it, because the
sleep on that mutex isn't freezable.

So, not a deadlock per se in this case, but it does prevent the freezer
from running to completion. I don't see any way to solve it, though,
without making all mutexes freezable. Note that I don't think this is
really limited to NFS either -- a lot of other filesystems will have
similar problems: CIFS, some FUSE variants, etc.

--
Jeff Layton <jlayton@redhat.com>

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Tejun Heo @ 2013-03-05 17:46 UTC (permalink / raw)
To: Jeff Layton
Cc: Myklebust, Trond, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar
Hello, guys.

On Tue, Mar 05, 2013 at 08:23:08AM -0500, Jeff Layton wrote:
> So, not a deadlock per se in this case, but it does prevent the freezer
> from running to completion. I don't see any way to solve it, though,
> without making all mutexes freezable. Note that I don't think this is
> really limited to NFS either -- a lot of other filesystems will have
> similar problems: CIFS, some FUSE variants, etc.

So, I think this is why implementing the freezer as a separate blocking
mechanism isn't such a good idea. We're effectively introducing a
completely new waiting state into a lot of unsuspecting paths, which
generates a lot of risk and eventually extra complexity to work around
it. I think we really should update the freezer to reuse the blocking
points we already have - the ones used for signal delivery and ptracing.
That way, other code paths don't have to worry about an extra stop state
and we can confine most of the complexity to the freezer proper.

Thanks.

--
tejun

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Tejun Heo @ 2013-03-05 17:49 UTC (permalink / raw)
To: Jeff Layton
Cc: Myklebust, Trond, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar
On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote:
> So, I think this is why implementing the freezer as a separate blocking
> mechanism isn't such a good idea. We're effectively introducing a
> completely new waiting state into a lot of unsuspecting paths, which
> generates a lot of risk and eventually extra complexity to work around
> it. I think we really should update the freezer to reuse the blocking
> points we already have - the ones used for signal delivery and ptracing.
> That way, other code paths don't have to worry about an extra stop state
> and we can confine most of the complexity to the freezer proper.

Also, consolidating those wait states means that we can solve the
event-to-response latency problem for all three cases - signal, ptrace
and freezer - rather than adding a separate backing-out strategy for the
freezer.

--
tejun

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Jeff Layton @ 2013-03-05 19:03 UTC (permalink / raw)
To: Tejun Heo
Cc: Myklebust, Trond, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar
On Tue, 5 Mar 2013 09:49:54 -0800
Tejun Heo <tj@kernel.org> wrote:

> Also, consolidating those wait states means that we can solve the
> event-to-response latency problem for all three cases - signal, ptrace
> and freezer - rather than adding a separate backing-out strategy for the
> freezer.

Sounds intriguing...

I'm not sure what this really means for something like NFS, though. How
would you envision this working when we have long-running syscalls that
might sit waiting in the kernel indefinitely?

Here's my blue-sky, poorly-thought-out idea...

We could add a signal (e.g. SIGFREEZE) that allows the sleeps in the
NFS/RPC layer to be interrupted. Those would return back toward
userland with a particular type of error (sort of like ERESTARTSYS).

Before returning from the kernel, though, we could freeze the process.
When it wakes up, we could then go back down and retry the call again
(much like an ERESTARTSYS kind of thing).

The tricky part here is that we'd need to distinguish between the case
where we caught SIGFREEZE before sending an RPC vs. after. If we sent
the call before freezing, then we don't want to resend it again. It
might be a non-idempotent operation.

Sounds horrific to code up though... :)

--
Jeff Layton <jlayton@redhat.com>

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Tejun Heo @ 2013-03-05 19:09 UTC (permalink / raw)
To: Jeff Layton
Cc: Myklebust, Trond, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar
Hello, Jeff.

On Tue, Mar 05, 2013 at 02:03:12PM -0500, Jeff Layton wrote:
> Sounds intriguing...
>
> I'm not sure what this really means for something like NFS, though. How
> would you envision this working when we have long-running syscalls that
> might sit waiting in the kernel indefinitely?

I think it is the same problem as being able to handle SIGKILL in a
responsive manner. It could be tricky to implement for nfs, but at
least it doesn't have to solve the problem twice.

> Here's my blue-sky, poorly-thought-out idea...
>
> We could add a signal (e.g. SIGFREEZE) that allows the sleeps in the
> NFS/RPC layer to be interrupted. Those would return back toward
> userland with a particular type of error (sort of like ERESTARTSYS).
>
> Before returning from the kernel, though, we could freeze the process.
> When it wakes up, we could then go back down and retry the call again
> (much like an ERESTARTSYS kind of thing).
>
> The tricky part here is that we'd need to distinguish between the case
> where we caught SIGFREEZE before sending an RPC vs. after. If we sent
> the call before freezing, then we don't want to resend it again. It
> might be a non-idempotent operation.

So, yeah, you are thinking pretty much the same thing as I am.

> Sounds horrific to code up though... :)

I don't know the details of nfs, but those events could essentially be
signaling that the system is gonna lose power. I think it would be a
good idea to solve it.

Thanks.

--
tejun

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Jeff Layton @ 2013-03-05 23:39 UTC (permalink / raw)
To: Tejun Heo
Cc: Myklebust, Trond, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro
On Tue, 5 Mar 2013 11:09:23 -0800
Tejun Heo <tj@kernel.org> wrote:

> I think it is the same problem as being able to handle SIGKILL in a
> responsive manner. It could be tricky to implement for nfs, but at
> least it doesn't have to solve the problem twice.
>
> So, yeah, you are thinking pretty much the same thing as I am.
>
> I don't know the details of nfs, but those events could essentially be
> signaling that the system is gonna lose power. I think it would be a
> good idea to solve it.

It would be...

So, I briefly considered a similar approach when I was working on the
retry-on-ESTALE error patches. It occurred to me that it was somewhat
similar to the ERESTARTSYS case, so handling it at a higher level than
in the syscall handlers themselves might make sense...

Al was in the middle of his signal handling/execve rework at the time,
and I ran the idea past him. He pointedly told me that I was crazy for
even considering it. This is rather non-trivial to handle, since it
means mucking around in a bunch of arch-specific code and dealing with
all of the weirdo corner cases. Anyone up for working out how to handle
a freeze event on a process that already has a pending signal, while
it's being ptraced?

It's probably best to chat with Al (cc'ed here) before you embark on
this plan, since he was just in that code recently.

In any case, maybe there's also some code consolidation opportunity
here. I suspect at least some of this logic is in arch-specific code
when it really needn't be...

--
Jeff Layton <jlayton@redhat.com>

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Tejun Heo @ 2013-03-05 23:47 UTC (permalink / raw)
To: Jeff Layton
Cc: Myklebust, Trond, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro
Hello, Jeff.

On Tue, Mar 05, 2013 at 06:39:41PM -0500, Jeff Layton wrote:
> Al was in the middle of his signal handling/execve rework at the time,
> and I ran the idea past him. He pointedly told me that I was crazy for
> even considering it. This is rather non-trivial to handle, since it
> means mucking around in a bunch of arch-specific code and dealing with
> all of the weirdo corner cases.

Hmmm... I tried it a couple of years ago while I was working on ptrace
improvements to support checkpoint-restart in userland (PTRACE_SEIZE
and friends) and had half-working code. At the time, Oleg was against
the idea and something else came up, so I didn't push it all the way,
but IIRC it didn't need to touch any arch code. It should be able to
just share the existing trap path with ptrace.

> Anyone up for working out how to handle a freeze event on a process
> that already has a pending signal, while it's being ptraced? It's
> probably best to chat with Al (cc'ed here) before you embark on this
> plan, since he was just in that code recently.

Maybe Al sees something that I don't, but AFAICS it should be doable in
generic code proper, and I don't think it's gonna be that hard.

Oleg, are you still opposed to the idea of making the freezer share trap
points with ptrace?

Thanks.

--
tejun

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Oleg Nesterov @ 2013-03-06 18:16 UTC (permalink / raw)
To: Tejun Heo
Cc: Jeff Layton, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro
On 03/05, Tejun Heo wrote:
>
> Oleg, are you still opposed to the idea of making the freezer share trap
> points with ptrace?

My memory may fool me, but IIRC I wasn't actually opposed... I guess you
mean the previous discussion about vfork/ptrace/etc, which I have
forgotten completely. But I can recall the main problem with your idea
(for me): I simply wasn't able to understand it ;)

Likewise, I can't really understand the ideas discussed in this thread.
At least when it comes to this particular problem,
rpc_wait_bit_killable() is not interruptible... And how can SIGFREEZE
help?

If we want to interrupt the sleeps in the NFS/RPC layer, we can simply
add TASK_WAKEFREEZE (which can be used with TASK_KILLABLE) and change
freeze_task() to do signal_wake_up_state(TASK_WAKEFREEZE).

But if we can do this, then it should be possible to simply make these
sleeps TASK_INTERRUPTIBLE? Yet it seems that we can't, just because we
can't always restart, so I am totally confused.

Oleg.

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Tejun Heo @ 2013-03-06 18:53 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Jeff Layton, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro
Hello, Oleg.

On Wed, Mar 06, 2013 at 07:16:08PM +0100, Oleg Nesterov wrote:
> And how can SIGFREEZE help? If we want to interrupt the sleeps in the
> NFS/RPC layer, we can simply add TASK_WAKEFREEZE (which can be used
> with TASK_KILLABLE) and change freeze_task() to do
> signal_wake_up_state(TASK_WAKEFREEZE).

Oh yeah, we don't need another signal. We just need sigpending state
and a wakeup. I wasn't really going into details. The important point
is that for code paths outside signal/ptrace, freezing could look and
behave about the same as signal delivery.

> But if we can do this, then it should be possible to simply make these
> sleeps TASK_INTERRUPTIBLE? Yet it seems that we can't, just because we
> can't always restart, so I am totally confused.

Of course, switching to another freezing mechanism doesn't make issues
like this go away by itself, but it would at least allow handling such
issues in an easier-to-grasp way rather than dealing with this giant
pseudo-lock, and it allows code paths outside jobctl to deal with only
one issue - signal delivery - rather than having to deal with freezing
separately.

Also, while not completely identical, the frozen state would at least be
an extension of the jobctl trap, and it would be possible to allow
things like killing frozen tasks or even ptracing them in a well-defined
manner for cgroup_freezer use cases. As it currently stands, there is no
well-defined abstraction for the frozen state which we can expose in any
way, which sucks.

Thanks.

--
tejun

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Linus Torvalds @ 2013-03-06 21:00 UTC (permalink / raw)
To: Tejun Heo
Cc: Oleg Nesterov, Jeff Layton, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro
On Wed, Mar 6, 2013 at 10:53 AM, Tejun Heo <tj@kernel.org> wrote:
>
> Oh yeah, we don't need another signal. We just need sigpending state
> and a wakeup. I wasn't really going into details. The important point
> is that for code paths outside signal/ptrace, freezing could look and
> behave about the same as signal delivery.

Don't we already do that? The whole "try_to_freeze()" in
get_signal_to_deliver() is about exactly this. See
fake_signal_wake_up().

You still have kernel threads (which don't do signals) to worry about,
so it doesn't make things go away. And you still have issues with the
latency of disk waits, which is, I think, the reason for that
"freezable_schedule()" in the NFS code to begin with.

Linus
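
[For reference, the existing hook being pointed at above looks roughly
like this (kernel/freezer.c, heavily trimmed; a sketch, not the verbatim
source):

    /*
     * kernel/freezer.c (roughly): freezing a user task already pokes it
     * much like a signal would; get_signal_to_deliver() then calls
     * try_to_freeze() at its trap point when the task heads back toward
     * userspace.
     */
    static void fake_signal_wake_up(struct task_struct *p)
    {
            unsigned long flags;

            if (lock_task_sighand(p, &flags)) {
                    signal_wake_up(p, 0);
                    unlock_task_sighand(p, &flags);
            }
    }
]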

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Tejun Heo @ 2013-03-06 21:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: Oleg Nesterov, Jeff Layton, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro
Hello, Linus.

On Wed, Mar 06, 2013 at 01:00:02PM -0800, Linus Torvalds wrote:
> Don't we already do that? The whole "try_to_freeze()" in
> get_signal_to_deliver() is about exactly this. See
> fake_signal_wake_up().

Yeap, that was what I had in mind too. Maybe we'll need to modify it
slightly, but we already have most of the basic stuff.

> You still have kernel threads (which don't do signals) to worry about,
> so it doesn't make things go away. And you still have issues with the
> latency of disk waits, which is, I think, the reason for that
> "freezable_schedule()" in the NFS code to begin with.

I haven't thought about it for quite some time, so things are hazy, but
here's what I can recall now.

With the syscall paths out of the way, the surface is reduced a lot.
Another part is converting most freezable kthread users to freezable
workqueues, which provide natural resource boundaries (the duration of
work item execution). kthreads are already difficult to get the
synchronization completely right for, and a significant number of
freezable + should_stop users were subtly broken the last time I went
over the freezer users. I think we would be much better off converting
most of them over to freezable workqueues, which are easier to get right
and likely to be less expensive. Freezing happens at the work item
boundary, which in most cases could be made to coincide with the
original freezer check point.

There could be kthreads which can't be converted to workqueues for
whatever reason (there shouldn't be many at this point), but most
freezer usages in kthreads are pretty simple. It's usually a single
freezer check point, or a couple, in the main loop. While we may still
need special handling for them, I don't think they're likely to have
implications for issues like this.

We probably would want to handle restart for freezable kthreads calling
syscalls. I haven't thought about this one too much yet. Maybe freezable
kthreads doing syscalls just need to be ready for -ERESTARTSYS?

I'm not sure I follow the disk-wait latency part. Are you saying that
switching to a jobctl-trap-based freezer implementation wouldn't help
there? If so, right, it doesn't in itself. It's just changing the
infrastructure used for freezing and can't make the underlying
synchronization issues disappear, but at least it becomes the same
problem as being responsive to SIGKILL rather than a completely separate
problem.

Thanks.

--
tejun

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Linus Torvalds @ 2013-03-06 21:31 UTC (permalink / raw)
To: Tejun Heo
Cc: Oleg Nesterov, Jeff Layton, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro
On Wed, Mar 6, 2013 at 1:24 PM, Tejun Heo <tj@kernel.org> wrote:
>
> With the syscall paths out of the way, the surface is reduced a lot.

So the issue is syscalls that don't react to signals and that can
potentially wait a long time. Like NFS with a network hiccup. Which is
not at all unlikely. Think wireless network, somebody trying to get on
a network share, things not working, and closing the damn lid because
you give up.

So I do agree that we probably have *too* many of the stupid "let's
check if we can freeze" points, and I suspect that the NFS code should
get rid of the "freezable_schedule()" that is causing this warning
(because I also agree that you should *not* freeze while holding locks,
because it really can cause deadlocks), but I do suspect that network
filesystems need to have a few places where they check for freezing on
their own... Exactly because freezing isn't *quite* like a signal.

Linus

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Tejun Heo @ 2013-03-06 21:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Oleg Nesterov, Jeff Layton, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro
On Wed, Mar 06, 2013 at 01:31:10PM -0800, Linus Torvalds wrote:
> So I do agree that we probably have *too* many of the stupid "let's
> check if we can freeze" points, and I suspect that the NFS code should
> get rid of the "freezable_schedule()" that is causing this warning
> (because I also agree that you should *not* freeze while holding locks,
> because it really can cause deadlocks), but I do suspect that network
> filesystems need to have a few places where they check for freezing on
> their own... Exactly because freezing isn't *quite* like a signal.

Well, I don't really know much about nfs, so I can't really tell, but
for most other cases, dealing with freezing like a signal should work
fine from what I've seen, although I can't be sure before actually
trying. Trond, Bruce, can you guys please chime in?

Thanks.

--
tejun

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Tejun Heo @ 2013-03-06 21:40 UTC (permalink / raw)
To: Linus Torvalds
Cc: Oleg Nesterov, Jeff Layton, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro
On Wed, Mar 06, 2013 at 01:36:36PM -0800, Tejun Heo wrote:
> Well, I don't really know much about nfs, so I can't really tell, but
> for most other cases, dealing with freezing like a signal should work
> fine from what I've seen, although I can't be sure before actually
> trying. Trond, Bruce, can you guys please chime in?

So, I think the question here would be: in nfs, how many of the current
freezer check points would be difficult to convert to the
signal-handling model, after excluding the ones which are performed
while holding locks that we need to get rid of anyway?

Thanks.

--
tejun

* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
From: Jeff Layton @ 2013-03-13 15:17 UTC (permalink / raw)
To: Tejun Heo
Cc: Linus Torvalds, Oleg Nesterov, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro, linux-cifs
On Wed, 6 Mar 2013 13:40:16 -0800
Tejun Heo <tj@kernel.org> wrote:

> So, I think the question here would be: in nfs, how many of the current
> freezer check points would be difficult to convert to the
> signal-handling model, after excluding the ones which are performed
> while holding locks that we need to get rid of anyway?

I think we can do this, but it isn't trivial. Here's what I'd envision,
though there are still many details that would need to be worked out...

Basically, what we'd need is a way to go into a TASK_KILLABLE sleep but
still allow the freezer to wake these processes up. I guess that likely
means we need a new sleep state (TASK_FREEZABLE?). We'd also need a way
to return an -ERESTARTSYS-like error (-EFREEZING?) that tells the upper
syscall-handling layers to freeze the task and then restart the syscall
after waking back up. Maybe we don't need a new error at all and
-ERESTARTSYS is fine here? We also need to consider the effects on the
audit code here, but that may be OK with the overhaul that Al merged a
few months ago.

Assuming we have those, we then need to fix up the NFS and RPC layers to
use this stuff:

1/ Prior to submitting a new RPC, we'd look and see whether "current" is
being frozen. If it is, then return -EFREEZING immediately without doing
anything.

2/ We might also need to retrofit certain stages in the RPC FSM to
return -EFREEZING too, if it's a synchronous RPC and the task is being
frozen.

3/ When a task is waiting for an RPC reply on an async RPC, we'd need to
use this new sleep state. If the process wakes up because something
wants it to freeze, have it go back to sleep for a short period of time
to try and wait for the reply (up to 15s or so?). If we get the reply,
great -- return to userland and freeze the task there. If the reply
never comes in, give up on it and just return -EFREEZING and hope for
the best. We might have to make this latter behavior contingent on a new
mount option (maybe "-o slushy", like Trond recommended). The current
"hard" and "soft" semantics don't really fit this situation correctly.

Of course, this is all a lot of work, and not something we can shove
into the kernel for 3.9 at this point. In the meantime, while Mandeep's
warning is correctly pointing out a problem, I think we ought to back it
out until we can fix this properly. We're already getting a ton of
reports on the mailing lists and in the Fedora bug tracker for this
warning. Part of the problem is the verbiage -- "BUG" makes people think
"Oops", but this is really just a warning.

We should also note that this is a problem in the CIFS code too, since
it uses a similar mechanism to allow the kernel to suspend while waiting
on SMB replies.

--
Jeff Layton <jlayton@redhat.com>
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-13 15:17 ` Jeff Layton
@ 2013-03-31 0:07 ` Paul Walmsley
0 siblings, 0 replies; 61+ messages in thread
From: Paul Walmsley @ 2013-03-31 0:07 UTC (permalink / raw)
To: Jeff Layton, Linus Torvalds, Andrew Morton
Cc: Tejun Heo, Oleg Nesterov, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Ingo Molnar, Al Viro, linux-cifs, linux-omap

On Wed, 13 Mar 2013, Jeff Layton wrote:

> Of course, this is all a lot of work, and not something we can shove
> into the kernel for 3.9 at this point. In the meantime, while Mandeep's
> warning is correctly pointing out a problem, I think we ought to back
> it out until we can fix this properly.

Revert posted at:

http://marc.info/?l=linux-kernel&m=136468829626184&w=2

Andrew or Linus, care to pick this up for 3.9-rc?

- Paul

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-06 21:36 ` Tejun Heo 2013-03-06 21:40 ` Tejun Heo @ 2013-03-07 11:41 ` Jeff Layton 2013-03-07 15:25 ` Tejun Heo 2013-03-07 15:55 ` Linus Torvalds 1 sibling, 2 replies; 61+ messages in thread From: Jeff Layton @ 2013-03-07 11:41 UTC (permalink / raw) To: Tejun Heo Cc: Linus Torvalds, Oleg Nesterov, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro On Wed, 6 Mar 2013 13:36:36 -0800 Tejun Heo <tj@kernel.org> wrote: > On Wed, Mar 06, 2013 at 01:31:10PM -0800, Linus Torvalds wrote: > > So I do agree that we probably have *too* many of the stupid "let's > > check if we can freeze", and I suspect that the NFS code should get > > rid of the "freezable_schedule()" that is causing this warning > > (because I also agree that you should *not* freeze while holding > > locks, because it really can cause deadlocks), but I do suspect that > > network filesystems do need to have a few places where they check for > > freezing on their own... Exactly because freezing isn't *quite* like a > > signal. > > Well, I don't really know much about nfs so I can't really tell, but > for most other cases, dealing with freezing like a signal should work > fine from what I've seen although I can't be sure before actually > trying. Trond, Bruce, can you guys please chime in? > > Thanks. > (hopefully this isn't tl;dr) It's not quite that simple... The problem (as Trond already mentioned) is non-idempotent operations. You can't just restart certain operations from scratch once you reach a certain point. Here's an example: Suppose I call unlink("somefile"); on an NFS mount. We take all of the VFS locks, go down into the NFS layer. That marshals up the UNLINK call, sends it off to the server, and waits for the reply. While we're waiting, a freeze event comes in and we start returning from the kernel with our new -EFREEZE return code that works sort of like -ERESTARTSYS. Meanwhile, the server is processing the UNLINK call and removes the file. A little while later we wake up the machine and it goes to try and pick up where it left off. What do we do now? Suppose we pretend we never sent the call in the first place, marshal up a new RPC and send it again. This is problematic -- the server will probably send back the equivalent of ENOENT. How do we know whether the file never existed in the first place, or whether the server processed the original call and removed the file then? Do we instead try and keep track of whether the RPC has been sent and just wait for the reply on the original call? That's tricky too -- it means adding an extra codepath to check for these sorts of restarts in a bunch of different ops vectors into the filesystem. We also have to somehow keep track of this state too (I guess by hanging something off the task_struct). Note too that the above is the simple case. We're dropping the parent's i_mutex during the freeze. Suppose when we restart the call that the parent directory has changed in such a way that the original lookup we did to do the original RPC is no longer valid? I think Trond may be on the right track. We probably need some mechanism to quiesce the filesystem ahead of any sort of freezer event. That quiesce could simply wait on any in flight RPCs to come back, and not allow any new ones to go out. 
On syscalls where the RPC didn't go out, we'd just return -EFREEZE or whatever and let the upper layers restart the call after waking back up. Writeback would be tricky, but that can be handled too. The catch here is that it's quite possible that when we need to quiesce that we've lost communications with the server. We don't want to hold up the freezer at that point so the wait for replies has to be bounded in time somehow. If that times out, we probably just have to return all calls with our new -EFREEZE return and hope for the best when the machine wakes back up. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 61+ messages in thread
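To make the bounded-quiesce idea above concrete, here is a rough sketch
under invented names -- there is no nfs_freeze_ctl, in-flight counter or
pre-freeze hook like this in the kernel; only the wait_event_timeout()
machinery is real, and the 15 second bound is the figure floated
elsewhere in the thread.

#include <linux/wait.h>
#include <linux/jiffies.h>
#include <linux/atomic.h>
#include <linux/errno.h>

struct nfs_freeze_ctl {
	atomic_t inflight;		/* RPCs sent but not yet answered */
	wait_queue_head_t drain_wq;	/* woken whenever a reply arrives */
};

/* Hypothetical pre-freeze hook: drain in-flight RPCs, but never wait forever. */
static int nfs_quiesce_client(struct nfs_freeze_ctl *ctl)
{
	long left = wait_event_timeout(ctl->drain_wq,
				       atomic_read(&ctl->inflight) == 0,
				       15 * HZ);

	if (left)
		return 0;	/* drained: safe to let the freezer proceed */

	/*
	 * Server unreachable or slow: fail the stragglers (e.g. with the
	 * hypothetical -EFREEZING) and let the freezer win anyway.
	 */
	return -ETIMEDOUT;
}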
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-07 11:41 ` Jeff Layton @ 2013-03-07 15:25 ` Tejun Heo 2013-03-07 15:55 ` Linus Torvalds 1 sibling, 0 replies; 61+ messages in thread From: Tejun Heo @ 2013-03-07 15:25 UTC (permalink / raw) To: Jeff Layton Cc: Linus Torvalds, Oleg Nesterov, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro Hello, Jeff. On Thu, Mar 07, 2013 at 06:41:40AM -0500, Jeff Layton wrote: > Suppose I call unlink("somefile"); on an NFS mount. We take all of the > VFS locks, go down into the NFS layer. That marshals up the UNLINK > call, sends it off to the server, and waits for the reply. While we're > waiting, a freeze event comes in and we start returning from the > kernel with our new -EFREEZE return code that works sort of like > -ERESTARTSYS. Meanwhile, the server is processing the UNLINK call and > removes the file. A little while later we wake up the machine and it > goes to try and pick up where it left off. But you can't freeze regardless of the freezing mechanism in such cases, right? The current code which allows freezing while such operations are in progress is broken as it can lead to freezer deadlocks. They should go away no matter how we implement freezer, so the question is not whether we can move all the existing freezing points to signal mechanism but that, after removing the deadlock-prone ones, how many would be difficult to convert. I'm fully speculating but my suspicion is not too many if you remove (or update somehow) the ones which are being done with some locks held. > The catch here is that it's quite possible that when we need to quiesce > that we've lost communications with the server. We don't want to hold > up the freezer at that point so the wait for replies has to be bounded > in time somehow. If that times out, we probably just have to return all > calls with our new -EFREEZE return and hope for the best when the > machine wakes back up. Sure, a separate prep step may be helpful but assuming a user nfs-mounting stuff on a laptop, I'm not sure how reliable that can be made. People move around with laptops, wifi can be iffy and the lid can be shut at any moment. I don't think it's possible for nfs to be laptop friendly while staying completely correct. Designing such a network filesystem probably is possible with transactions and whatnot but AFAIU nfs isn't designed that way. If such use case is something nfs wants to support, I think it just should make do with some middleground - ie. just implement a mount switch which says "retry operations across network / power problems" and explain the implications. Thanks. -- tejun ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-07 11:41 ` Jeff Layton
2013-03-07 15:25 ` Tejun Heo
@ 2013-03-07 15:55 ` Linus Torvalds
2013-03-07 15:59 ` Myklebust, Trond
2013-03-07 16:00 ` Tejun Heo
1 sibling, 2 replies; 61+ messages in thread
From: Linus Torvalds @ 2013-03-07 15:55 UTC (permalink / raw)
To: Jeff Layton
Cc: Tejun Heo, Oleg Nesterov, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro

On Thu, Mar 7, 2013 at 3:41 AM, Jeff Layton <jlayton@redhat.com> wrote:
>
> I think Trond may be on the right track. We probably need some
> mechanism to quiesce the filesystem ahead of any sort of freezer
> event.

No, guys. That cannot work. It's a completely moronic idea. Let me
count the ways:

(a) it's just another form of saying "lock". But since other things
are (by definition) going on when it happens, it will just cause
deadlocks.

(b) the freeze event might not even be system-global. So *some*
processes (a cgroup) might freeze, others would not. You can't shut
off the filesystem just because some processes might freeze.

(c) it just moves the same issue somewhere else. If you have some
operation that must be done under the lock, then such an operation
must be completed before you've quiesced the filesystem, which is your
whole point of that "quiesce" event. BUT THAT'S THE EXACT SAME ISSUE
AS NOT ALLOWING THE FREEZE TO HAPPEN DURING THAT TIME.

In other words, that suggestion not only introduces new problems (a),
it's fundamentally broken anyway (b) *AND* it doesn't even solve
anything, it just moves it around.

The solution is damn simple: if you're in some kind of "atomic
region", then you cannot freeze. Seriously. SO DON'T CALL
"freezable_schedule()", FOR CHRISSAKE! You clearly aren't freezable!

Which is exactly what the new lockdep warning was all about. Don't try
to move the problem around, when it's quite clear where the problem
is. If you need to do something uninterruptible, you do not say "oh,
I'm freezable". Because freezing is by definition an interruption.
Seriously, it's that simple.

Linus

^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-07 15:55 ` Linus Torvalds @ 2013-03-07 15:59 ` Myklebust, Trond 2013-03-07 16:25 ` Linus Torvalds 2013-03-07 16:00 ` Tejun Heo 1 sibling, 1 reply; 61+ messages in thread From: Myklebust, Trond @ 2013-03-07 15:59 UTC (permalink / raw) To: Linus Torvalds Cc: Jeff Layton, Tejun Heo, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro On Thu, 2013-03-07 at 07:55 -0800, Linus Torvalds wrote: > On Thu, Mar 7, 2013 at 3:41 AM, Jeff Layton <jlayton@redhat.com> wrote: > > > > I think Trond may be on the right track. We probably need some > > mechanism to quiesce the filesystem ahead of any sort of freezer > > event. > > No, guys. That cannot work. It's a completely moronic idea. Let me > count the way: > > (a) it's just another form of saying "lock". But since other things > are (by definition) going on when it happens, it will just cause > deadlocks. > > (b) the freeze event might not even be system-global. So *some* > processes (a cgroup) might freeze, others would not. You can't shut > off the filesystem just because some processes migth freeze. That's the whole bloody problem in a nutshell. We only want to freeze the filesystem when the network goes down, and in that case we want time to clean up first. That's why it needs to be initiated by something like NetworkManager _before_ the network is shut down. > (c) it just moves the same issue somewhere else. If you have some > operation that must be done under the lock, then such an operation > must be completed before you've quiesced the filesystem, which is your > whole point of that "quiesce" event. BUT THAT'S THE EXACT SAME ISSUE > AS NOT ALLOWING THE FREEZE TO HAPPEN DURING THAT TIME. > > In other words, that suggestion not only introduces new problems (a), > it's fundamentally broken anyway (b) *AND* it doesn't even solve > anything, it just moves it around. > > The solution is damn simple: if you're in some kind of "atomic > region", then you cannot freeze. Seriously. SO DON'T CALL > "freezable_schedule()", FOR CHRISSAKE! You clearly aren't freezable! > > Which is exactly what the new lockdep warning was all about. Don't try > to move the problem around, when it's quite clear where the problem > is. If you need to do something uninterruptible, you do not say "oh, > I'm freezable". Because freezing is by definition an interruption. > Seriously, it's that simple. It _shouldn't_ be an interruption unless the filesystem can't make progress. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-07 15:59 ` Myklebust, Trond
@ 2013-03-07 16:25 ` Linus Torvalds
2013-03-07 16:45 ` Myklebust, Trond
2013-03-07 20:55 ` Rafael J. Wysocki
0 siblings, 2 replies; 61+ messages in thread
From: Linus Torvalds @ 2013-03-07 16:25 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Jeff Layton, Tejun Heo, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro

On Thu, Mar 7, 2013 at 7:59 AM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
>
> It _shouldn't_ be an interruption unless the filesystem can't make
> progress.

So how can we tell? Calling "freezable_schedule()" if you're not ready
to be frozen is not good. And nobody but the NFS code can know.

You might want to introduce some counter that counts the number of
outstanding non-interruptible events, and only call the "freezable"
version if that counter is zero.

A better alternative might be to *never* call the freezable version.
Because those freezable_*() things are really quite disgusting, and
are wrong - they don't actually freeze the process, they say "I don't
care if you freeze me while I sleep", and you might actually wake up
*while* the system is being frozen. I think the whole concept is
broken. Rafael - comments? The function really is crap, regardless of
any unrelated NFS problems.

So what NFS could do instead is actually check the "do I need to
freeze" flag, and if you need to freeze you consider it an abort - and
do *not* try to continue. Just freeze, and then act as if the machine
got rebooted as far as NFS was concerned. That should work anyway, no?

That does sound a lot more complex, though.

Linus

^ permalink raw reply [flat|nested] 61+ messages in thread
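As an illustration of the counter idea, a sketch assuming a new per-task
counter (nfs_nofreeze below) that does not exist in task_struct;
freezable_schedule() and schedule() are the real primitives.

#include <linux/freezer.h>
#include <linux/sched.h>

/* nfs_nofreeze is a hypothetical new field on task_struct. */
static inline void nfs_begin_nofreeze(void)
{
	current->nfs_nofreeze++;	/* entering a non-interruptible section */
}

static inline void nfs_end_nofreeze(void)
{
	current->nfs_nofreeze--;
}

/* Only advertise freezability when no non-interruptible work is pending. */
static inline void nfs_freezable_schedule(void)
{
	if (current->nfs_nofreeze == 0)
		freezable_schedule();
	else
		schedule();
}

The increment/decrement pairs would bracket the lock-holding sections
that triggered the warning, so those sleeps fall back to a plain
schedule().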
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-07 16:25 ` Linus Torvalds @ 2013-03-07 16:45 ` Myklebust, Trond 2013-03-07 17:03 ` Linus Torvalds 2013-03-07 20:55 ` Rafael J. Wysocki 1 sibling, 1 reply; 61+ messages in thread From: Myklebust, Trond @ 2013-03-07 16:45 UTC (permalink / raw) To: Linus Torvalds Cc: Jeff Layton, Tejun Heo, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro On Thu, 2013-03-07 at 08:25 -0800, Linus Torvalds wrote: > On Thu, Mar 7, 2013 at 7:59 AM, Myklebust, Trond > <Trond.Myklebust@netapp.com> wrote: > > > > It _shouldn't_ be an interruption unless the filesystem can't make > > progress. > > So how can we tell? Calling "freezable_schedule()" if you're not ready > to be frozen is not good. And nobody but the NFS code can know. > > You might want to introduce some counter that counts number of > outstanding non-interruptible events, and only call the "freezable" > version if that counter is zero. > > A better alternative might be to *never* call the freezable version. > Because those freezable_*() things are really quite disgusting, and > are wrong - they don't actually freeze the process, they say "I don't > care if you freeze me while I sleep", and you might actually wake up > *while* the system is being frozen. I think the whole concept is > broken. Rafaei - comments? The function really is crap, regardless of > any unrelated NFS problems. > > So what NFS could do instead is actually check the "do I need to > freeze" flag, and if you need to freeze you consider it an abort - and > do *not* try to continue. Just freeze, and then act as if the machine > got rebooted as far as NFS was concerned. That should work anyway, no? > > That does sound a lot more complex, though. The problem there is that we get into the whole 'hard' vs 'soft' mount problem. We're supposed to guarantee data integrity for 'hard' mounts, so no funny business is allowed. OTOH, 'soft' mounts time out and return EIO to the application anyway, and so shouldn't be a problem. Perhaps we could add a '-oslushy' mount option :-) that guarantees data integrity for all situations _except_ ENETDOWN/ENETUNREACH? -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-07 16:45 ` Myklebust, Trond
@ 2013-03-07 17:03 ` Linus Torvalds
2013-03-07 17:16 ` Myklebust, Trond
2013-03-08 14:01 ` Ingo Molnar
0 siblings, 2 replies; 61+ messages in thread
From: Linus Torvalds @ 2013-03-07 17:03 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Jeff Layton, Tejun Heo, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro

On Thu, Mar 7, 2013 at 8:45 AM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
>
> The problem there is that we get into the whole 'hard' vs 'soft' mount
> problem. We're supposed to guarantee data integrity for 'hard' mounts,
> so no funny business is allowed. OTOH, 'soft' mounts time out and return
> EIO to the application anyway, and so shouldn't be a problem.
>
> Perhaps we could add a '-oslushy' mount option :-) that guarantees data
> integrity for all situations _except_ ENETDOWN/ENETUNREACH?

I do think we are probably over-analyzing this. It's not like people
who want freezing to work usually use flaky NFS. There's really two
main groups:

 - the "freezer as a snapshot mechanism" that might use NFS because
they are in a server environment.

 - the "freezer for suspend/resume on a laptop"

The first one does use NFS, and cares about it, and probably would
prefer the freeze event to take longer and finish for all ongoing IO
operations. End result: just ditch the "freezable_schedule()"
entirely.

The second one is unlikely to really use NFS anyway. End result:
ditching the freezable_schedule() is probably perfectly fine, even if
it would cause suspend failures if the network is being troublesome.

So for now, why not just replace freezable_schedule() with plain
schedule() in the NFS code, and ignore it until somebody actually
complains about it, and then aim to try to do something more targeted
for that particular complaint?

Linus

^ permalink raw reply [flat|nested] 61+ messages in thread
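Concretely, the stop-gap being suggested is a one-line change to the RPC
sleep helper that appears in the original backtrace; roughly (3.9-era
prototype in net/sunrpc/sched.c, paraphrased from memory):

static int rpc_wait_bit_killable(void *word)
{
	if (fatal_signal_pending(current))
		return -ERESTARTSYS;
	schedule();		/* was: freezable_schedule() */
	return 0;
}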
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-07 17:03 ` Linus Torvalds @ 2013-03-07 17:16 ` Myklebust, Trond 2013-03-07 21:43 ` Jeff Layton 2013-03-08 14:01 ` Ingo Molnar 1 sibling, 1 reply; 61+ messages in thread From: Myklebust, Trond @ 2013-03-07 17:16 UTC (permalink / raw) To: Linus Torvalds Cc: Jeff Layton, Tejun Heo, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro On Thu, 2013-03-07 at 09:03 -0800, Linus Torvalds wrote: > On Thu, Mar 7, 2013 at 8:45 AM, Myklebust, Trond > <Trond.Myklebust@netapp.com> wrote: > > > > The problem there is that we get into the whole 'hard' vs 'soft' mount > > problem. We're supposed to guarantee data integrity for 'hard' mounts, > > so no funny business is allowed. OTOH, 'soft' mounts time out and return > > EIO to the application anyway, and so shouldn't be a problem. > > > > Perhaps we could add a '-oslushy' mount option :-) that guarantees data > > integrity for all situations _except_ ENETDOWN/ENETUNREACH? > > I do think we are probably over-analyzing this. It's not like people > who want freezing to work usually use flaky NFS. There's really two > main groups: > > - the "freezer as a snapshot mechanism" that might use NFS because > they are in a server environment. > > - the "freeezer for suspend/resume on a laptop" > > The first one does use NFS, and cares about it, and probably would > prefer the freeze event to take longer and finish for all ongoing IO > operations. End result: just ditch the "freezable_schedule()" > entirely. > > The second one is unlikely to really use NFS anyway. End result: > ditching the freezable_schedule() is probably perfectly fine, even if > it would cause suspend failures if the network is being troublesome. > > So for now, why not just replace freezable_schedule() with plain > schedule() in the NFS code, and ignore it until somebody actually > complains about it, and then aim to try to do something more targeted > for that particular complaint? We _have_ had complaints about the laptop suspension problem; that was why Jeff introduced freezable_schedule() in the first place. We've never had complaints about any problems involvinf cgroup_freeze. This is why our focus tends to be on the former, and why I'm more worried about laptop suspend regressions for any short term fixes. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-07 17:16 ` Myklebust, Trond @ 2013-03-07 21:43 ` Jeff Layton 0 siblings, 0 replies; 61+ messages in thread From: Jeff Layton @ 2013-03-07 21:43 UTC (permalink / raw) To: Myklebust, Trond Cc: Linus Torvalds, Tejun Heo, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro On Thu, 7 Mar 2013 17:16:12 +0000 "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > On Thu, 2013-03-07 at 09:03 -0800, Linus Torvalds wrote: > > On Thu, Mar 7, 2013 at 8:45 AM, Myklebust, Trond > > <Trond.Myklebust@netapp.com> wrote: > > > > > > The problem there is that we get into the whole 'hard' vs 'soft' mount > > > problem. We're supposed to guarantee data integrity for 'hard' mounts, > > > so no funny business is allowed. OTOH, 'soft' mounts time out and return > > > EIO to the application anyway, and so shouldn't be a problem. > > > > > > Perhaps we could add a '-oslushy' mount option :-) that guarantees data > > > integrity for all situations _except_ ENETDOWN/ENETUNREACH? > > > > I do think we are probably over-analyzing this. It's not like people > > who want freezing to work usually use flaky NFS. There's really two > > main groups: > > > > - the "freezer as a snapshot mechanism" that might use NFS because > > they are in a server environment. > > > > - the "freeezer for suspend/resume on a laptop" > > > > The first one does use NFS, and cares about it, and probably would > > prefer the freeze event to take longer and finish for all ongoing IO > > operations. End result: just ditch the "freezable_schedule()" > > entirely. > > > > The second one is unlikely to really use NFS anyway. End result: > > ditching the freezable_schedule() is probably perfectly fine, even if > > it would cause suspend failures if the network is being troublesome. > > > > So for now, why not just replace freezable_schedule() with plain > > schedule() in the NFS code, and ignore it until somebody actually > > complains about it, and then aim to try to do something more targeted > > for that particular complaint? > > We _have_ had complaints about the laptop suspension problem; that was > why Jeff introduced freezable_schedule() in the first place. We've never > had complaints about any problems involvinf cgroup_freeze. This is why > our focus tends to be on the former, and why I'm more worried about > laptop suspend regressions for any short term fixes. > Right. I don't know of anyone using the cgroup_freeze stuff. OTOH, I've seen *many* reports of people who complained because their machine didn't suspend because of a busy NFS mount. In fact, the problem is still not fully "solved" with the freezable_schedule stuff we have so far anyway. Even after this (admittedly crappy) solution went in, we still had reports of machines that failed to suspend because other tasks were stuck in wait_on_page_writeback() for NFS pages. So, in principle I think we need to abandon the current approach. The question is what should replace it. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-07 17:03 ` Linus Torvalds 2013-03-07 17:16 ` Myklebust, Trond @ 2013-03-08 14:01 ` Ingo Molnar 1 sibling, 0 replies; 61+ messages in thread From: Ingo Molnar @ 2013-03-08 14:01 UTC (permalink / raw) To: Linus Torvalds Cc: Myklebust, Trond, Jeff Layton, Tejun Heo, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro * Linus Torvalds <torvalds@linux-foundation.org> wrote: > - the "freeezer for suspend/resume on a laptop" > > [...] > > The second one is unlikely to really use NFS anyway. [...] <me raises a hand> Incidentally I use NFS to a file server on my laptop, over wifi, and I close the lid for the night. It's NFS mounted soft. If it was mounted hard I wouldn't expect the lid close event to result in a successful suspend if wifi went down - but with soft I would be pretty sad in the morning if batteries drained if I closed the lid after there's a power outage which took down both wifi and the laptop power supply. Closing the lid while on battery is a pretty strong command from the user. Arguably this is a special case on several levels as on desktop distros it's not at all easy mount NFS shares (all point & click automation goes to Samba shares), just wanted to mention it. Thanks, Ingo ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-07 16:25 ` Linus Torvalds 2013-03-07 16:45 ` Myklebust, Trond @ 2013-03-07 20:55 ` Rafael J. Wysocki 1 sibling, 0 replies; 61+ messages in thread From: Rafael J. Wysocki @ 2013-03-07 20:55 UTC (permalink / raw) To: Linus Torvalds Cc: Myklebust, Trond, Jeff Layton, Tejun Heo, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Andrew Morton, Ingo Molnar, Al Viro On Thursday, March 07, 2013 08:25:10 AM Linus Torvalds wrote: > On Thu, Mar 7, 2013 at 7:59 AM, Myklebust, Trond > <Trond.Myklebust@netapp.com> wrote: > > > > It _shouldn't_ be an interruption unless the filesystem can't make > > progress. > > So how can we tell? Calling "freezable_schedule()" if you're not ready > to be frozen is not good. And nobody but the NFS code can know. > > You might want to introduce some counter that counts number of > outstanding non-interruptible events, and only call the "freezable" > version if that counter is zero. > > A better alternative might be to *never* call the freezable version. > Because those freezable_*() things are really quite disgusting, and > are wrong - they don't actually freeze the process, they say "I don't > care if you freeze me while I sleep", and you might actually wake up > *while* the system is being frozen. I think the whole concept is > broken. Rafaei - comments? Well, the only comment I have is that the source of these functions was commit d310310c (Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer) and they were added because people were complaining that they couldn't suspend while their NFS servers were not responding, IIRC. I agree that they are not really nice. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-07 15:55 ` Linus Torvalds 2013-03-07 15:59 ` Myklebust, Trond @ 2013-03-07 16:00 ` Tejun Heo 1 sibling, 0 replies; 61+ messages in thread From: Tejun Heo @ 2013-03-07 16:00 UTC (permalink / raw) To: Linus Torvalds Cc: Jeff Layton, Oleg Nesterov, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro Hello, Linus. On Thu, Mar 07, 2013 at 07:55:39AM -0800, Linus Torvalds wrote: > In other words, that suggestion not only introduces new problems (a), > it's fundamentally broken anyway (b) *AND* it doesn't even solve > anything, it just moves it around. I don't think it's gonna solve the problems we're talking about but the "moving around" part could be useful nonetheless - e.g. we don't want to shut down network before nfs while suspending. Thanks. -- tejun ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-05 23:39 ` Jeff Layton 2013-03-05 23:47 ` Tejun Heo @ 2013-03-06 18:17 ` Oleg Nesterov 2013-03-06 18:40 ` Jeff Layton 1 sibling, 1 reply; 61+ messages in thread From: Oleg Nesterov @ 2013-03-06 18:17 UTC (permalink / raw) To: Jeff Layton Cc: Tejun Heo, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro On 03/05, Jeff Layton wrote: > > Anyone up for working out how to handle a freeze event on a process > that already has a pending signal, while it's being ptraced? Could you explain the problem? Oleg. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-06 18:17 ` Oleg Nesterov @ 2013-03-06 18:40 ` Jeff Layton 2013-03-06 18:45 ` Tejun Heo 0 siblings, 1 reply; 61+ messages in thread From: Jeff Layton @ 2013-03-06 18:40 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro On Wed, 6 Mar 2013 19:17:35 +0100 Oleg Nesterov <oleg@redhat.com> wrote: > On 03/05, Jeff Layton wrote: > > > > Anyone up for working out how to handle a freeze event on a process > > that already has a pending signal, while it's being ptraced? > > Could you explain the problem? > Not very well. I was just saying that the signal/ptrace stuff is very complicated already. Now we're looking at mixing in the freezer too, which further increases the complexity. Though when I said that, it was before Tejun mentioned hooking this up to ptrace. I'll confess that I don't fully understand what he's proposing either though... -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-06 18:40 ` Jeff Layton @ 2013-03-06 18:45 ` Tejun Heo 0 siblings, 0 replies; 61+ messages in thread From: Tejun Heo @ 2013-03-06 18:45 UTC (permalink / raw) To: Jeff Layton Cc: Oleg Nesterov, Myklebust, Trond, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Al Viro Hello, On Wed, Mar 06, 2013 at 01:40:04PM -0500, Jeff Layton wrote: > Though when I said that, it was before Tejun mentioned hooking this up > to ptrace. I'll confess that I don't fully understand what he's > proposing either though... Oh, they're all just pretty closely related. All signal and ptrace traps run off get_signal_to_deliver(). It is still a bit hairy but in much better shape after cleanups which accompanied PTRACE_SEIZE and I don't really think it'll be too difficult to extend it so that freezing is implemented as a type of jobctl trap. Thanks. -- tejun ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-05 19:03 ` Jeff Layton 2013-03-05 19:09 ` Tejun Heo @ 2013-03-06 1:10 ` Myklebust, Trond 2013-03-06 1:14 ` Tejun Heo 2013-03-06 12:00 ` Jeff Layton 1 sibling, 2 replies; 61+ messages in thread From: Myklebust, Trond @ 2013-03-06 1:10 UTC (permalink / raw) To: Jeff Layton Cc: Tejun Heo, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar On Tue, 2013-03-05 at 14:03 -0500, Jeff Layton wrote: > On Tue, 5 Mar 2013 09:49:54 -0800 > Tejun Heo <tj@kernel.org> wrote: > > > On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: > > > So, I think this is why implementing freezer as a separate blocking > > > mechanism isn't such a good idea. We're effectively introducing a > > > completely new waiting state to a lot of unsuspecting paths which > > > generates a lot of risks and eventually extra complexity to work > > > around those. I think we really should update freezer to re-use the > > > blocking points we already have - the ones used for signal delivery > > > and ptracing. That way, other code paths don't have to worry about an > > > extra stop state and we can confine most complexities to freezer > > > proper. > > > > Also, consolidating those wait states means that we can solve the > > event-to-response latency problem for all three cases - signal, ptrace > > and freezer, rather than adding separate backing-out strategy for > > freezer. > > > > Sounds intriguing... > > I'm not sure what this really means for something like NFS though. How > would you envision this working when we have long running syscalls that > might sit waiting in the kernel indefinitely? > > Here's my blue-sky, poorly-thought-out idea... > > We could add a signal (e.g. SIGFREEZE) that allows the sleeps in > NFS/RPC layer to be interrupted. Those would return back toward > userland with a particular type of error (sort of like ERESTARTSYS). > > Before returning from the kernel though, we could freeze the process. > When it wakes up, then we could go back down and retry the call again > (much like an ERESTARTSYS kind of thing). Two (three?) show stopper words for you: "non-idempotent operations". Not all RPC calls can just be interrupted and restarted. Something like an exclusive file create, unlink, file locking attempt, etc may give rise to different results when replayed in the above scenario. Interrupting an RPC call is not equivalent to cancelling its effects... -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-06 1:10 ` Myklebust, Trond @ 2013-03-06 1:14 ` Tejun Heo 2013-03-06 1:28 ` Tejun Heo 2013-03-06 12:00 ` Jeff Layton 1 sibling, 1 reply; 61+ messages in thread From: Tejun Heo @ 2013-03-06 1:14 UTC (permalink / raw) To: Myklebust, Trond Cc: Jeff Layton, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar Hello, Trond. On Wed, Mar 06, 2013 at 01:10:07AM +0000, Myklebust, Trond wrote: > Not all RPC calls can just be interrupted and restarted. Something like > an exclusive file create, unlink, file locking attempt, etc may give > rise to different results when replayed in the above scenario. > Interrupting an RPC call is not equivalent to cancelling its effects... Then, the operation simply isn't freezable while in progress and should be on the receiving end of failed-to-freeze error message and users who depend on suspend/hibernation working properly should be advised away from using nfs. It doesn't really changes anything. The current code is buggy. We're just not enforcing the rules strictly and people aren't hitting the bug often enough. Thanks. -- tejun ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-06 1:14 ` Tejun Heo @ 2013-03-06 1:28 ` Tejun Heo 0 siblings, 0 replies; 61+ messages in thread From: Tejun Heo @ 2013-03-06 1:28 UTC (permalink / raw) To: Myklebust, Trond Cc: Jeff Layton, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar On Tue, Mar 05, 2013 at 05:14:23PM -0800, Tejun Heo wrote: > Then, the operation simply isn't freezable while in progress and > should be on the receiving end of failed-to-freeze error message and > users who depend on suspend/hibernation working properly should be > advised away from using nfs. > > It doesn't really changes anything. The current code is buggy. We're > just not enforcing the rules strictly and people aren't hitting the > bug often enough. Just one more thing. Also, not being able to do retries without side-effect doesn't mean it can't do retries at all. There are syscalls which need to do things differently on re-entry (forgot which one but there are several). They record the necessary state in the restart block and resume from where they left off on re-entry. It sure is hairy but is doable and we already have supporting infrastructure. Not sure whether that would be worthwhile tho. Thanks. -- tejun ^ permalink raw reply [flat|nested] 61+ messages in thread
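The restart-block machinery referred to here is real (nanosleep, futex
and poll already use it); the sketch below only pretends that an nfs
member has been added to the union in struct restart_block, and
nfs_wait_for_reply() is an invented helper.

#include <linux/errno.h>
#include <linux/thread_info.h>
#include <linux/types.h>

static long nfs_wait_for_reply(u32 xid);	/* invented helper */

static long nfs_restart_fn(struct restart_block *restart)
{
	/* Re-enter the wait for the reply recorded before we froze. */
	return nfs_wait_for_reply(restart->nfs.xid);	/* 'nfs' member is hypothetical */
}

static long nfs_freeze_and_restart(u32 xid)
{
	struct restart_block *restart = &current_thread_info()->restart_block;

	restart->fn = nfs_restart_fn;
	restart->nfs.xid = xid;		/* hypothetical union member */

	/*
	 * The signal-return path turns this into a sys_restart_syscall()
	 * after the task thaws, which then calls restart->fn.
	 */
	return -ERESTART_RESTARTBLOCK;
}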
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-06 1:10 ` Myklebust, Trond 2013-03-06 1:14 ` Tejun Heo @ 2013-03-06 12:00 ` Jeff Layton 1 sibling, 0 replies; 61+ messages in thread From: Jeff Layton @ 2013-03-06 12:00 UTC (permalink / raw) To: Myklebust, Trond Cc: Tejun Heo, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, J. Bruce Fields, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar On Wed, 6 Mar 2013 01:10:07 +0000 "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > On Tue, 2013-03-05 at 14:03 -0500, Jeff Layton wrote: > > On Tue, 5 Mar 2013 09:49:54 -0800 > > Tejun Heo <tj@kernel.org> wrote: > > > > > On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: > > > > So, I think this is why implementing freezer as a separate blocking > > > > mechanism isn't such a good idea. We're effectively introducing a > > > > completely new waiting state to a lot of unsuspecting paths which > > > > generates a lot of risks and eventually extra complexity to work > > > > around those. I think we really should update freezer to re-use the > > > > blocking points we already have - the ones used for signal delivery > > > > and ptracing. That way, other code paths don't have to worry about an > > > > extra stop state and we can confine most complexities to freezer > > > > proper. > > > > > > Also, consolidating those wait states means that we can solve the > > > event-to-response latency problem for all three cases - signal, ptrace > > > and freezer, rather than adding separate backing-out strategy for > > > freezer. > > > > > > > Sounds intriguing... > > > > I'm not sure what this really means for something like NFS though. How > > would you envision this working when we have long running syscalls that > > might sit waiting in the kernel indefinitely? > > > > Here's my blue-sky, poorly-thought-out idea... > > > > We could add a signal (e.g. SIGFREEZE) that allows the sleeps in > > NFS/RPC layer to be interrupted. Those would return back toward > > userland with a particular type of error (sort of like ERESTARTSYS). > > > > Before returning from the kernel though, we could freeze the process. > > When it wakes up, then we could go back down and retry the call again > > (much like an ERESTARTSYS kind of thing). > > Two (three?) show stopper words for you: "non-idempotent operations". > > Not all RPC calls can just be interrupted and restarted. Something like > an exclusive file create, unlink, file locking attempt, etc may give > rise to different results when replayed in the above scenario. > Interrupting an RPC call is not equivalent to cancelling its effects... > Right -- that's the part where we have to take great care to save the state of the syscall at the time we returned back up toward userland on a freeze event. I suppose we'd need to hang something off the task_struct to keep track of that. In most cases, it would be sufficient to keep track of whether an RPC had been sent during the call for non-idempotent operations. If it was sent, then we'd just re-enter the wait for the reply. If it wasn't then we'd go ahead and send the call. Still, I'm sure there are details I'm overlooking here. The whole point of holding these mutexes is to ensure that operations that the directories don't change while we're doing these operations. If we release the locks in order to go to sleep, then there's no guarantee that things haven't changed when we reacquire them. 
Maybe it's best to give up and just tell people that suspending your laptop with a NFS mount is not allowed :P -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 61+ messages in thread
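The "did the RPC hit the wire" bookkeeping described above could be as
small as this; struct nfs_frozen_call, nfs_wait_for_reply() and
nfs_resume_call() are invented for the sketch, while rpc_call_sync() is
real. It deliberately ignores the harder question of the parent
directory changing while the locks were dropped.

#include <linux/types.h>
#include <linux/sunrpc/clnt.h>

/* Invented per-call record, hung off the task across a freeze. */
struct nfs_frozen_call {
	bool sent;	/* did the request go out before we froze? */
	u32 xid;	/* transaction id to match the eventual reply */
};

static int nfs_wait_for_reply(u32 xid);		/* invented helper */

static int nfs_resume_call(struct nfs_frozen_call *fc, struct rpc_clnt *clnt,
			   const struct rpc_message *msg)
{
	if (fc->sent)
		return nfs_wait_for_reply(fc->xid);	/* don't resend: not idempotent */
	return rpc_call_sync(clnt, msg, 0);		/* nothing sent yet: safe to send */
}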
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-05 17:49 ` Tejun Heo 2013-03-05 19:03 ` Jeff Layton @ 2013-03-05 23:11 ` J. Bruce Fields 2013-03-06 0:02 ` Rafael J. Wysocki ` (2 more replies) 1 sibling, 3 replies; 61+ messages in thread From: J. Bruce Fields @ 2013-03-05 23:11 UTC (permalink / raw) To: Tejun Heo Cc: Jeff Layton, Myklebust, Trond, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar On Tue, Mar 05, 2013 at 09:49:54AM -0800, Tejun Heo wrote: > On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: > > So, I think this is why implementing freezer as a separate blocking > > mechanism isn't such a good idea. We're effectively introducing a > > completely new waiting state to a lot of unsuspecting paths which > > generates a lot of risks and eventually extra complexity to work > > around those. I think we really should update freezer to re-use the > > blocking points we already have - the ones used for signal delivery > > and ptracing. That way, other code paths don't have to worry about an > > extra stop state and we can confine most complexities to freezer > > proper. > > Also, consolidating those wait states means that we can solve the > event-to-response latency problem for all three cases - signal, ptrace > and freezer, rather than adding separate backing-out strategy for > freezer. Meanwhile, as none of this sounds likely to be done this time around--are we backing out the new lockdep warnings? --b. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-05 23:11 ` J. Bruce Fields @ 2013-03-06 0:02 ` Rafael J. Wysocki 2013-03-06 0:30 ` [PATCH] lockdep: make lock held while freezing check optional Mandeep Singh Baines 2013-03-06 0:59 ` LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! Mandeep Singh Baines 2 siblings, 0 replies; 61+ messages in thread From: Rafael J. Wysocki @ 2013-03-06 0:02 UTC (permalink / raw) To: J. Bruce Fields Cc: Tejun Heo, Jeff Layton, Myklebust, Trond, Oleg Nesterov, Mandeep Singh Baines, Ming Lei, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Andrew Morton, Ingo Molnar On Tuesday, March 05, 2013 06:11:10 PM J. Bruce Fields wrote: > On Tue, Mar 05, 2013 at 09:49:54AM -0800, Tejun Heo wrote: > > On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: > > > So, I think this is why implementing freezer as a separate blocking > > > mechanism isn't such a good idea. We're effectively introducing a > > > completely new waiting state to a lot of unsuspecting paths which > > > generates a lot of risks and eventually extra complexity to work > > > around those. I think we really should update freezer to re-use the > > > blocking points we already have - the ones used for signal delivery > > > and ptracing. That way, other code paths don't have to worry about an > > > extra stop state and we can confine most complexities to freezer > > > proper. > > > > Also, consolidating those wait states means that we can solve the > > event-to-response latency problem for all three cases - signal, ptrace > > and freezer, rather than adding separate backing-out strategy for > > freezer. > > Meanwhile, as none of this sounds likely to be done this time > around--are we backing out the new lockdep warnings? I think we should do that. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH] lockdep: make lock held while freezing check optional
2013-03-05 23:11 ` J. Bruce Fields
2013-03-06 0:02 ` Rafael J. Wysocki
@ 2013-03-06 0:30 ` Mandeep Singh Baines
2013-03-07 12:03 ` Maarten Lankhorst
2013-03-06 0:59 ` LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! Mandeep Singh Baines
2 siblings, 1 reply; 61+ messages in thread
From: Mandeep Singh Baines @ 2013-03-06 0:30 UTC (permalink / raw)
To: linux-kernel, linux-nfs
Cc: Mandeep Singh Baines, Tejun Heo, Jeff Layton, Myklebust, Trond, Oleg Nesterov, Ming Lei, Rafael J. Wysocki, Andrew Morton, Ingo Molnar

This check is turning up a lot of code paths which need to be
fixed so while those paths are fixed, let's make this check
optional so that folks can still use lockdep.

CC: Tejun Heo <tj@kernel.org>
CC: Jeff Layton <jlayton@redhat.com>
CC: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Ming Lei <ming.lei@canonical.com>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@redhat.com>
---
 include/linux/freezer.h |  2 ++
 lib/Kconfig.debug       | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 043a5cf..03bdc54 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -49,8 +49,10 @@ extern void thaw_kernel_threads(void);
 
 static inline bool try_to_freeze(void)
 {
+#ifdef CONFIG_DEBUG_LOCK_HELD_FREEZING
 	if (!(current->flags & PF_NOFREEZE))
 		debug_check_no_locks_held();
+#endif
 	might_sleep();
 	if (likely(!freezing(current)))
 		return false;
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 28be08c..bddda5f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -547,6 +547,18 @@ config DEBUG_MUTEXES
 	  This feature allows mutex semantics violations to be detected and
 	  reported.
 
+config DEBUG_LOCK_HELD_FREEZING
+	bool "Lock debugging: detect when locks are held during freeze"
+	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
+	select DEBUG_SPINLOCK
+	select DEBUG_MUTEXES
+	select LOCKDEP
+	help
+	  This feature will check whether any lock is incorrectly held
+	  while freezing. If a task freezes with a lock held it will
+	  block any other task that is waiting on that lock from freezing.
+	  In the case of cgroup_freezer, this can cause a deadlock.
+
 config DEBUG_LOCK_ALLOC
 	bool "Lock debugging: detect incorrect freeing of live locks"
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
1.7.12.4

^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH] lockdep: make lock held while freezing check optional 2013-03-06 0:30 ` [PATCH] lockdep: make lock held while freezing check optional Mandeep Singh Baines @ 2013-03-07 12:03 ` Maarten Lankhorst 0 siblings, 0 replies; 61+ messages in thread From: Maarten Lankhorst @ 2013-03-07 12:03 UTC (permalink / raw) To: Mandeep Singh Baines Cc: linux-kernel, linux-nfs, Tejun Heo, Jeff Layton, Myklebust, Trond, Oleg Nesterov, Ming Lei, Rafael J. Wysocki, Andrew Morton, Ingo Molnar Op 06-03-13 01:30, Mandeep Singh Baines schreef: > This check is turning up a lot of code paths which need to be > fixed so while those paths are fixed, let's make this check > optional so that folks can still use lockdep. I think the config option should be inverted, and make it more clear that you're not just not reporting some real bugs by disabling a check that should be on by default. > CC: Tejun Heo <tj@kernel.org> > CC: Jeff Layton <jlayton@redhat.com> > CC: "Myklebust, Trond" <Trond.Myklebust@netapp.com> > CC: Oleg Nesterov <oleg@redhat.com> > CC: Ming Lei <ming.lei@canonical.com> > CC: "Rafael J. Wysocki" <rjw@sisk.pl> > CC: Andrew Morton <akpm@linux-foundation.org> > CC: Ingo Molnar <mingo@redhat.com> > --- > include/linux/freezer.h | 2 ++ > lib/Kconfig.debug | 12 ++++++++++++ > 2 files changed, 14 insertions(+) > > diff --git a/include/linux/freezer.h b/include/linux/freezer.h > index 043a5cf..03bdc54 100644 > --- a/include/linux/freezer.h > +++ b/include/linux/freezer.h > @@ -49,8 +49,10 @@ extern void thaw_kernel_threads(void); > > static inline bool try_to_freeze(void) > { > +#ifdef CONFIG_DEBUG_LOCK_HELD_FREEZING > if (!(current->flags & PF_NOFREEZE)) > debug_check_no_locks_held(); > +#endif > might_sleep(); > if (likely(!freezing(current))) > return false; > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index 28be08c..bddda5f 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -547,6 +547,18 @@ config DEBUG_MUTEXES > This feature allows mutex semantics violations to be detected and > reported. > > +config DEBUG_LOCK_HELD_FREEZING > + bool "Lock debugging: detect when locks are held during freeze" > + depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT > + select DEBUG_SPINLOCK > + select DEBUG_MUTEXES > + select LOCKDEP > + help > + This feature will check whether any lock is incorrectly held > + while freezing. If a task freezes with a lock held it will > + block any other task that is waiting on that lock from freezing. > + In the case of cgroup_freezer, this can cause a deadlock. > + > config DEBUG_LOCK_ALLOC > bool "Lock debugging: detect incorrect freeing of live locks" > depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-05 23:11 ` J. Bruce Fields 2013-03-06 0:02 ` Rafael J. Wysocki 2013-03-06 0:30 ` [PATCH] lockdep: make lock held while freezing check optional Mandeep Singh Baines @ 2013-03-06 0:59 ` Mandeep Singh Baines 2013-03-06 1:05 ` J. Bruce Fields 2 siblings, 1 reply; 61+ messages in thread From: Mandeep Singh Baines @ 2013-03-06 0:59 UTC (permalink / raw) To: J. Bruce Fields Cc: Tejun Heo, Jeff Layton, Myklebust, Trond, Oleg Nesterov, Ming Lei, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar On Tue, Mar 5, 2013 at 3:11 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > On Tue, Mar 05, 2013 at 09:49:54AM -0800, Tejun Heo wrote: >> On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: >> > So, I think this is why implementing freezer as a separate blocking >> > mechanism isn't such a good idea. We're effectively introducing a >> > completely new waiting state to a lot of unsuspecting paths which >> > generates a lot of risks and eventually extra complexity to work >> > around those. I think we really should update freezer to re-use the >> > blocking points we already have - the ones used for signal delivery >> > and ptracing. That way, other code paths don't have to worry about an >> > extra stop state and we can confine most complexities to freezer >> > proper. >> >> Also, consolidating those wait states means that we can solve the >> event-to-response latency problem for all three cases - signal, ptrace >> and freezer, rather than adding separate backing-out strategy for >> freezer. > > Meanwhile, as none of this sounds likely to be done this time > around--are we backing out the new lockdep warnings? > > --b. What if we hide it behind a Kconfig? Its finding real bugs. http://lkml.org/lkml/2013/3/5/583 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-06 0:59 ` LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! Mandeep Singh Baines @ 2013-03-06 1:05 ` J. Bruce Fields 2013-03-06 1:16 ` Tejun Heo 0 siblings, 1 reply; 61+ messages in thread From: J. Bruce Fields @ 2013-03-06 1:05 UTC (permalink / raw) To: Mandeep Singh Baines Cc: Tejun Heo, Jeff Layton, Myklebust, Trond, Oleg Nesterov, Ming Lei, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar On Tue, Mar 05, 2013 at 04:59:00PM -0800, Mandeep Singh Baines wrote: > On Tue, Mar 5, 2013 at 3:11 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > On Tue, Mar 05, 2013 at 09:49:54AM -0800, Tejun Heo wrote: > >> On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: > >> > So, I think this is why implementing freezer as a separate blocking > >> > mechanism isn't such a good idea. We're effectively introducing a > >> > completely new waiting state to a lot of unsuspecting paths which > >> > generates a lot of risks and eventually extra complexity to work > >> > around those. I think we really should update freezer to re-use the > >> > blocking points we already have - the ones used for signal delivery > >> > and ptracing. That way, other code paths don't have to worry about an > >> > extra stop state and we can confine most complexities to freezer > >> > proper. > >> > >> Also, consolidating those wait states means that we can solve the > >> event-to-response latency problem for all three cases - signal, ptrace > >> and freezer, rather than adding separate backing-out strategy for > >> freezer. > > > > Meanwhile, as none of this sounds likely to be done this time > > around--are we backing out the new lockdep warnings? > > > > --b. > > What if we hide it behind a Kconfig? Its finding real bugs. > > http://lkml.org/lkml/2013/3/5/583 If it's really just a 2-line patch to try_to_freeze(), could it just be carried out-of-tree by people that are specifically working on tracking down these problems? But I don't have strong feelings about it--as long as it doesn't result in the same known issues getting reported again and again.... --b. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-06 1:05 ` J. Bruce Fields @ 2013-03-06 1:16 ` Tejun Heo 2013-03-06 3:11 ` Mandeep Singh Baines 0 siblings, 1 reply; 61+ messages in thread From: Tejun Heo @ 2013-03-06 1:16 UTC (permalink / raw) To: J. Bruce Fields Cc: Mandeep Singh Baines, Jeff Layton, Myklebust, Trond, Oleg Nesterov, Ming Lei, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: > If it's really just a 2-line patch to try_to_freeze(), could it just be > carried out-of-tree by people that are specifically working on tracking > down these problems? > > But I don't have strong feelings about it--as long as it doesn't result > in the same known issues getting reported again and again.... Agreed, I don't think a Kconfig option is justified for this. If this is really important, annotate broken paths so that it doesn't trigger spuriously; otherwise, please just remove it. Thanks. -- tejun ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held! 2013-03-06 1:16 ` Tejun Heo @ 2013-03-06 3:11 ` Mandeep Singh Baines 2013-03-06 9:09 ` Ingo Molnar 0 siblings, 1 reply; 61+ messages in thread From: Mandeep Singh Baines @ 2013-03-06 3:11 UTC (permalink / raw) To: Tejun Heo Cc: J. Bruce Fields, Jeff Layton, Myklebust, Trond, Oleg Nesterov, Ming Lei, Linux Kernel Mailing List, linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton, Ingo Molnar On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo <tj@kernel.org> wrote: > On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: >> If it's really just a 2-line patch to try_to_freeze(), could it just be >> carried out-of-tree by people that are specifically working on tracking >> down these problems? >> >> But I don't have strong feelings about it--as long as it doesn't result >> in the same known issues getting reported again and again.... > > Agreed, I don't think a Kconfig option is justified for this. If this > is really important, annotate broken paths so that it doesn't trigger > spuriously; otherwise, please just remove it. > Fair enough. Let's revert then. I'll rework to use a lockdep annotation. Maybe, add a new lockdep API: lockdep_set_held_during_freeze(lock); Then when we do the check, ignore any locks that set this bit. Ingo, does this seem like a reasonable design to you? Regards, Mandeep > Thanks. > > -- > tejun ^ permalink raw reply [flat|nested] 61+ messages in thread
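For what it's worth, the proposed annotation could look something like
this; the freeze_ok bit and lockdep_set_held_during_freeze() are
hypothetical (not lockdep API), while held_locks, lockdep_depth and
instance are real CONFIG_LOCKDEP fields.

#include <linux/lockdep.h>
#include <linux/sched.h>
#include <linux/kernel.h>

/* Hypothetical: mark a lock as legitimately held across the freezer. */
#define lockdep_set_held_during_freeze(lock) \
	do { (lock)->dep_map.freeze_ok = 1; } while (0)	/* freeze_ok doesn't exist */

/* Freeze-time check that skips locks carrying the annotation. */
static void debug_check_no_unsafe_locks_held(void)
{
	struct task_struct *curr = current;
	int i;

	for (i = 0; i < curr->lockdep_depth; i++) {
		struct held_lock *hlock = curr->held_locks + i;

		if (!hlock->instance->freeze_ok)	/* hypothetical field */
			WARN_ONCE(1, "%s/%d freezing with %s held\n",
				  curr->comm, task_pid_nr(curr),
				  hlock->instance->name);
	}
}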
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-06 3:11 ` Mandeep Singh Baines
@ 2013-03-06 9:09 ` Ingo Molnar
2013-03-06 12:06 ` Jeff Layton
0 siblings, 1 reply; 61+ messages in thread
From: Ingo Molnar @ 2013-03-06 9:09 UTC (permalink / raw)
To: Mandeep Singh Baines
Cc: Tejun Heo, J. Bruce Fields, Jeff Layton, Myklebust, Trond,
    Oleg Nesterov, Ming Lei, Linux Kernel Mailing List,
    linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton,
    Linus Torvalds, Peter Zijlstra, Thomas Gleixner

* Mandeep Singh Baines <msb@chromium.org> wrote:

> On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo <tj@kernel.org> wrote:
> > On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote:
> >> If it's really just a 2-line patch to try_to_freeze(), could it just be
> >> carried out-of-tree by people that are specifically working on tracking
> >> down these problems?
> >>
> >> But I don't have strong feelings about it--as long as it doesn't result
> >> in the same known issues getting reported again and again....
> >
> > Agreed, I don't think a Kconfig option is justified for this. If this
> > is really important, annotate broken paths so that it doesn't trigger
> > spuriously; otherwise, please just remove it.
> >
>
> Fair enough. Let's revert then. I'll rework to use a lockdep annotation.
>
> Maybe, add a new lockdep API:
>
> lockdep_set_held_during_freeze(lock);
>
> Then when we do the check, ignore any locks that set this bit.
>
> Ingo, does this seem like a reasonable design to you?

Am I reading the discussion correctly that the new warnings show REAL
potential deadlock scenarios, which can hit real users and can lock
their box up in entirely real usage scenarios?

If yes then guys we _really_ don't want to use lockdep annotation to
_HIDE_ bugs. We typically use them to teach lockdep about things it
does not know about.

How about fixing the deadlocks instead?

Thanks,

	Ingo
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-06 9:09 ` Ingo Molnar
@ 2013-03-06 12:06 ` Jeff Layton
2013-03-06 15:59 ` Mandeep Singh Baines
0 siblings, 1 reply; 61+ messages in thread
From: Jeff Layton @ 2013-03-06 12:06 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mandeep Singh Baines, Tejun Heo, J. Bruce Fields, Myklebust, Trond,
    Oleg Nesterov, Ming Lei, Linux Kernel Mailing List,
    linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton,
    Linus Torvalds, Peter Zijlstra, Thomas Gleixner

On Wed, 6 Mar 2013 10:09:14 +0100
Ingo Molnar <mingo@kernel.org> wrote:

>
> * Mandeep Singh Baines <msb@chromium.org> wrote:
>
> > On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo <tj@kernel.org> wrote:
> > > On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote:
> > >> If it's really just a 2-line patch to try_to_freeze(), could it just be
> > >> carried out-of-tree by people that are specifically working on tracking
> > >> down these problems?
> > >>
> > >> But I don't have strong feelings about it--as long as it doesn't result
> > >> in the same known issues getting reported again and again....
> > >
> > > Agreed, I don't think a Kconfig option is justified for this. If this
> > > is really important, annotate broken paths so that it doesn't trigger
> > > spuriously; otherwise, please just remove it.
> > >
> >
> > Fair enough. Let's revert then. I'll rework to use a lockdep annotation.
> >
> > Maybe, add a new lockdep API:
> >
> > lockdep_set_held_during_freeze(lock);
> >
> > Then when we do the check, ignore any locks that set this bit.
> >
> > Ingo, does this seem like a reasonable design to you?
>
> Am I reading the discussion correctly that the new warnings show REAL
> potential deadlock scenarios, which can hit real users and can lock their
> box up in entirely real usage scenarios?
>
> If yes then guys we _really_ don't want to use lockdep annotation to
> _HIDE_ bugs. We typically use them to teach lockdep about things it does
> not know about.
>
> How about fixing the deadlocks instead?
>

I do see how the freezer might fail to suspend certain tasks, but I
don't see the deadlock scenario here in the NFS/RPC case. Can someone
outline a situation where this might end up deadlocking? If not, then
I'd be inclined to say that while this may be a problem, the warning is
excessive...

--
Jeff Layton <jlayton@redhat.com>
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-06 12:06 ` Jeff Layton
@ 2013-03-06 15:59 ` Mandeep Singh Baines
2013-03-06 18:23 ` Jeff Layton
0 siblings, 1 reply; 61+ messages in thread
From: Mandeep Singh Baines @ 2013-03-06 15:59 UTC (permalink / raw)
To: Jeff Layton
Cc: Ingo Molnar, Tejun Heo, J. Bruce Fields, Myklebust, Trond,
    Oleg Nesterov, Ming Lei, Linux Kernel Mailing List,
    linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton,
    Linus Torvalds, Peter Zijlstra, Thomas Gleixner

On Wed, Mar 6, 2013 at 4:06 AM, Jeff Layton <jlayton@redhat.com> wrote:
> On Wed, 6 Mar 2013 10:09:14 +0100
> Ingo Molnar <mingo@kernel.org> wrote:
>
>>
>> * Mandeep Singh Baines <msb@chromium.org> wrote:
>>
>> > On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo <tj@kernel.org> wrote:
>> > > On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote:
>> > >> If it's really just a 2-line patch to try_to_freeze(), could it just be
>> > >> carried out-of-tree by people that are specifically working on tracking
>> > >> down these problems?
>> > >>
>> > >> But I don't have strong feelings about it--as long as it doesn't result
>> > >> in the same known issues getting reported again and again....
>> > >
>> > > Agreed, I don't think a Kconfig option is justified for this. If this
>> > > is really important, annotate broken paths so that it doesn't trigger
>> > > spuriously; otherwise, please just remove it.
>> > >
>> >
>> > Fair enough. Let's revert then. I'll rework to use a lockdep annotation.
>> >
>> > Maybe, add a new lockdep API:
>> >
>> > lockdep_set_held_during_freeze(lock);
>> >
>> > Then when we do the check, ignore any locks that set this bit.
>> >
>> > Ingo, does this seem like a reasonable design to you?
>>
>> Am I reading the discussion correctly that the new warnings show REAL
>> potential deadlock scenarios, which can hit real users and can lock their
>> box up in entirely real usage scenarios?
>>
>> If yes then guys we _really_ don't want to use lockdep annotation to
>> _HIDE_ bugs. We typically use them to teach lockdep about things it does
>> not know about.
>>
>> How about fixing the deadlocks instead?
>>
>
> I do see how the freezer might fail to suspend certain tasks, but I
> don't see the deadlock scenario here in the NFS/RPC case. Can someone
> outline a situation where this might end up deadlocking? If not, then
> I'd be inclined to say that while this may be a problem, the warning is
> excessive...
>

In general, holding a lock and freezing can cause a deadlock if:

1) you froze via the cgroup_freezer subsystem and a task in another
cgroup tried to acquire the same lock
2) the lock was needed later in suspend/hibernate. For example, if the
lock was needed in dpm_suspend by one of the device callbacks. For
hibernate, you also need to worry about any locks that need to be
acquired in order to write to the swap device.
3) another freezing task blocked on this lock and held other locks
needed later in suspend. If that task were skipped by the freezer, you
would deadlock

You will block/prevent suspend if:

4) another freezing task blocked on this lock and was unable to freeze

I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires
cgroup freezer. Case 4) while not causing a deadlock could prevent
your laptop/phone from sleeping and end up burning all your battery.
If suspend is initiated via lid close you won't even know about the
failure.

Regards,
Mandeep

> --
> Jeff Layton <jlayton@redhat.com>
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-06 15:59 ` Mandeep Singh Baines
@ 2013-03-06 18:23 ` Jeff Layton
2013-03-06 18:37 ` Myklebust, Trond
0 siblings, 1 reply; 61+ messages in thread
From: Jeff Layton @ 2013-03-06 18:23 UTC (permalink / raw)
To: Mandeep Singh Baines
Cc: Ingo Molnar, Tejun Heo, J. Bruce Fields, Myklebust, Trond,
    Oleg Nesterov, Ming Lei, Linux Kernel Mailing List,
    linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton,
    Linus Torvalds, Peter Zijlstra, Thomas Gleixner

On Wed, 6 Mar 2013 07:59:01 -0800
Mandeep Singh Baines <msb@chromium.org> wrote:

> On Wed, Mar 6, 2013 at 4:06 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > On Wed, 6 Mar 2013 10:09:14 +0100
> > Ingo Molnar <mingo@kernel.org> wrote:
> >
> >>
> >> * Mandeep Singh Baines <msb@chromium.org> wrote:
> >>
> >> > On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo <tj@kernel.org> wrote:
> >> > > On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote:
> >> > >> If it's really just a 2-line patch to try_to_freeze(), could it just be
> >> > >> carried out-of-tree by people that are specifically working on tracking
> >> > >> down these problems?
> >> > >>
> >> > >> But I don't have strong feelings about it--as long as it doesn't result
> >> > >> in the same known issues getting reported again and again....
> >> > >
> >> > > Agreed, I don't think a Kconfig option is justified for this. If this
> >> > > is really important, annotate broken paths so that it doesn't trigger
> >> > > spuriously; otherwise, please just remove it.
> >> > >
> >> >
> >> > Fair enough. Let's revert then. I'll rework to use a lockdep annotation.
> >> >
> >> > Maybe, add a new lockdep API:
> >> >
> >> > lockdep_set_held_during_freeze(lock);
> >> >
> >> > Then when we do the check, ignore any locks that set this bit.
> >> >
> >> > Ingo, does this seem like a reasonable design to you?
> >>
> >> Am I reading the discussion correctly that the new warnings show REAL
> >> potential deadlock scenarios, which can hit real users and can lock their
> >> box up in entirely real usage scenarios?
> >>
> >> If yes then guys we _really_ don't want to use lockdep annotation to
> >> _HIDE_ bugs. We typically use them to teach lockdep about things it does
> >> not know about.
> >>
> >> How about fixing the deadlocks instead?
> >>
> >
> > I do see how the freezer might fail to suspend certain tasks, but I
> > don't see the deadlock scenario here in the NFS/RPC case. Can someone
> > outline a situation where this might end up deadlocking? If not, then
> > I'd be inclined to say that while this may be a problem, the warning is
> > excessive...
> >
>
> In general, holding a lock and freezing can cause a deadlock if:
>
> 1) you froze via the cgroup_freezer subsystem and a task in another
> cgroup tried to acquire the same lock
> 2) the lock was needed later in suspend/hibernate. For example, if the
> lock was needed in dpm_suspend by one of the device callbacks. For
> hibernate, you also need to worry about any locks that need to be
> acquired in order to write to the swap device.
> 3) another freezing task blocked on this lock and held other locks
> needed later in suspend. If that task were skipped by the freezer, you
> would deadlock
>
> You will block/prevent suspend if:
>
> 4) another freezing task blocked on this lock and was unable to freeze
>
> I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires
> cgroup freezer. Case 4) while not causing a deadlock could prevent
> your laptop/phone from sleeping and end up burning all your battery.
> If suspend is initiated via lid close you won't even know about the
> failure.
>

We're aware of #4. That was the intent of adding try_to_freeze() into
this codepath in the first place. It's not a great solution for obvious
reasons, but we don't have another at the moment.

For #1 I'm not sure what to do. I'm not that familiar with cgroups or
how the freezer works.

The bottom line is that we have a choice -- we can either rip out this
new lockdep warning, or rip out the code that causes it to fire.

If we rip out the warning we may miss some legit cases where we might
possibly have hit a deadlock.

If we rip out the code that causes it to fire, then we exacerbate the
#4 problem above. That will effectively make it so that you can't
suspend the host whenever NFS is doing anything moderately active.

Ripping out the warning seems like the best course of action in the
near term, but it's not my call...

--
Jeff Layton <jlayton@redhat.com>
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-06 18:23 ` Jeff Layton
@ 2013-03-06 18:37 ` Myklebust, Trond
2013-03-06 20:15 ` Mandeep Singh Baines
0 siblings, 1 reply; 61+ messages in thread
From: Myklebust, Trond @ 2013-03-06 18:37 UTC (permalink / raw)
To: Jeff Layton
Cc: Mandeep Singh Baines, Ingo Molnar, Tejun Heo, J. Bruce Fields,
    Oleg Nesterov, Ming Lei, Linux Kernel Mailing List,
    linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton,
    Linus Torvalds, Peter Zijlstra, Thomas Gleixner

On Wed, 2013-03-06 at 13:23 -0500, Jeff Layton wrote:
> On Wed, 6 Mar 2013 07:59:01 -0800
> Mandeep Singh Baines <msb@chromium.org> wrote:
> > In general, holding a lock and freezing can cause a deadlock if:
> >
> > 1) you froze via the cgroup_freezer subsystem and a task in another
> > cgroup tried to acquire the same lock
> > 2) the lock was needed later in suspend/hibernate. For example, if the
> > lock was needed in dpm_suspend by one of the device callbacks. For
> > hibernate, you also need to worry about any locks that need to be
> > acquired in order to write to the swap device.
> > 3) another freezing task blocked on this lock and held other locks
> > needed later in suspend. If that task were skipped by the freezer, you
> > would deadlock
> >
> > You will block/prevent suspend if:
> >
> > 4) another freezing task blocked on this lock and was unable to freeze
> >
> > I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires
> > cgroup freezer. Case 4) while not causing a deadlock could prevent
> > your laptop/phone from sleeping and end up burning all your battery.
> > If suspend is initiated via lid close you won't even know about the
> > failure.
> >
>
> We're aware of #4. That was the intent of adding try_to_freeze() into
> this codepath in the first place. It's not a great solution for obvious
> reasons, but we don't have another at the moment.
>
> For #1 I'm not sure what to do. I'm not that familiar with cgroups or
> how the freezer works.
>
> The bottom line is that we have a choice -- we can either rip out this
> new lockdep warning, or rip out the code that causes it to fire.
>
> If we rip out the warning we may miss some legit cases where we might
> possibly have hit a deadlock.
>
> If we rip out the code that causes it to fire, then we exacerbate the
> #4 problem above. That will effectively make it so that you can't
> suspend the host whenever NFS is doing anything moderately active.

#4 is probably the only case where we might want to freeze.

Unless we're in a situation where the network is going down, we can
usually always make progress with completing the RPC call and finishing
the system call. So in the case of cgroup_freezer, we only care if the
freezing cgroup also owns the network device (or net namespace) that NFS
is using to talk to the server.

As I said, the alternative is to notify NFS that the device is going
down, and to give it a chance to quiesce itself before that happens.
This is also the only way to ensure that processes which own locks on
the server (e.g. posix file locks) have a chance to release them before
being suspended.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
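For what it's worth, one conceivable way to deliver such a "device is going
down" notification to the NFS client would be an ordinary netdevice notifier.
The sketch below is purely an editor's illustration of the idea Trond
describes (nothing like it was proposed in this thread, and the function and
variable names are made up); the 3.9-era convention of passing the
net_device directly as the notifier's data pointer is assumed:

    #include <linux/netdevice.h>
    #include <linux/notifier.h>

    /* Hypothetical: quiesce RPC traffic before an interface disappears,
     * instead of leaving tasks frozen in mid-RPC. */
    static int nfs_quiesce_netdev_event(struct notifier_block *nb,
                                        unsigned long event, void *ptr)
    {
            struct net_device *dev = ptr;   /* ptr is the net_device here */

            if (event != NETDEV_GOING_DOWN)
                    return NOTIFY_DONE;

            /* Placeholder: drain or park RPC transports bound to this
             * device (or its net namespace) and release server-side
             * state such as posix locks before the link goes away. */
            pr_info("nfs: quiescing before %s goes down\n", dev->name);
            return NOTIFY_DONE;
    }

    static struct notifier_block nfs_quiesce_nb = {
            .notifier_call = nfs_quiesce_netdev_event,
    };

    /* registered once during client init:
     *   register_netdevice_notifier(&nfs_quiesce_nb);
     */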
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-06 18:37 ` Myklebust, Trond
@ 2013-03-06 20:15 ` Mandeep Singh Baines
0 siblings, 0 replies; 61+ messages in thread
From: Mandeep Singh Baines @ 2013-03-06 20:15 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Jeff Layton, Ingo Molnar, Tejun Heo, J. Bruce Fields,
    Oleg Nesterov, Ming Lei, Linux Kernel Mailing List,
    linux-nfs@vger.kernel.org, Rafael J. Wysocki, Andrew Morton,
    Linus Torvalds, Peter Zijlstra, Thomas Gleixner

On Wed, Mar 6, 2013 at 10:37 AM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
> On Wed, 2013-03-06 at 13:23 -0500, Jeff Layton wrote:
>> On Wed, 6 Mar 2013 07:59:01 -0800
>> Mandeep Singh Baines <msb@chromium.org> wrote:
>> > In general, holding a lock and freezing can cause a deadlock if:
>> >
>> > 1) you froze via the cgroup_freezer subsystem and a task in another
>> > cgroup tried to acquire the same lock
>> > 2) the lock was needed later in suspend/hibernate. For example, if the
>> > lock was needed in dpm_suspend by one of the device callbacks. For
>> > hibernate, you also need to worry about any locks that need to be
>> > acquired in order to write to the swap device.
>> > 3) another freezing task blocked on this lock and held other locks
>> > needed later in suspend. If that task were skipped by the freezer, you
>> > would deadlock
>> >
>> > You will block/prevent suspend if:
>> >
>> > 4) another freezing task blocked on this lock and was unable to freeze
>> >
>> > I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires
>> > cgroup freezer. Case 4) while not causing a deadlock could prevent
>> > your laptop/phone from sleeping and end up burning all your battery.
>> > If suspend is initiated via lid close you won't even know about the
>> > failure.
>> >
>>
>> We're aware of #4. That was the intent of adding try_to_freeze() into
>> this codepath in the first place. It's not a great solution for obvious
>> reasons, but we don't have another at the moment.
>>
>> For #1 I'm not sure what to do. I'm not that familiar with cgroups or
>> how the freezer works.
>>
>> The bottom line is that we have a choice -- we can either rip out this
>> new lockdep warning, or rip out the code that causes it to fire.
>>
>> If we rip out the warning we may miss some legit cases where we might
>> possibly have hit a deadlock.
>>
>> If we rip out the code that causes it to fire, then we exacerbate the
>> #4 problem above. That will effectively make it so that you can't
>> suspend the host whenever NFS is doing anything moderately active.
>
> #4 is probably the only case where we might want to freeze.
>
> Unless we're in a situation where the network is going down, we can
> usually always make progress with completing the RPC call and finishing
> the system call. So in the case of cgroup_freezer, we only care if the
> freezing cgroup also owns the network device (or net namespace) that NFS
> is using to talk to the server.
>
> As I said, the alternative is to notify NFS that the device is going
> down, and to give it a chance to quiesce itself before that happens.
> This is also the only way to ensure that processes which own locks on
> the server (e.g. posix file locks) have a chance to release them before
> being suspended.
>

So if I'm understanding correctly, the locks are held for a bounded
time, so under normal circumstances you don't really need to freeze
here. But if the task that is going to wake you were to freeze, then
this code path could block suspend if PF_FREEZER_SKIP were not set.

For this particular case (not in general), instead of calling
try_to_freeze(), what if you only cleared PF_FREEZER_SKIP? For case #1,
you wouldn't block suspend if you're stuck waiting. If you received the
event while processes were freezing, you'd just try_to_freeze() some
place else, probably on the way back up from the system call. For case
#4, you'd eventually complete the rpc and then freeze some place else.
If you're stuck waiting for the bit because the process that was going
to wake you also got frozen, then the container would be stuck in the
THAWING state. Not catastrophic, and something the administrator could
work around by sequencing things right.

Does this sound like a reasonable workaround?

Regards,
Mandeep

>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
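To make the suggested distinction concrete: with the 3.9-era freezer helpers
(as this editor recalls them; treat the following as an illustrative sketch
rather than a posted patch), the existing behaviour and the proposed one
differ only in whether try_to_freeze() runs while nfs_clid_init_mutex is
still held:

    #include <linux/freezer.h>
    #include <linux/sched.h>

    /* Roughly what freezable_schedule() does today: mark the task as safe
     * for the freezer to skip while it sleeps, then freeze on wakeup,
     * with the caller's locks still held. */
    static void wait_and_freeze_here(void)
    {
            freezer_do_not_count();         /* sets PF_FREEZER_SKIP */
            schedule();                     /* task state is set up by the
                                             * wait_on_bit machinery */
            freezer_count();                /* clears PF_FREEZER_SKIP and
                                             * calls try_to_freeze() */
    }

    /* The proposal above: stop being skippable, but defer the actual
     * freeze to a lock-free point such as the syscall exit path. */
    static void wait_and_freeze_later(void)
    {
            freezer_do_not_count();
            schedule();
            current->flags &= ~PF_FREEZER_SKIP;  /* no try_to_freeze() here */
    }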
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-04 14:14 ` Myklebust, Trond
2013-03-04 14:23 ` Jeff Layton
@ 2013-03-04 14:40 ` Ming Lei
2013-03-04 15:04 ` Jeff Layton
1 sibling, 1 reply; 61+ messages in thread
From: Ming Lei @ 2013-03-04 14:40 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Jeff Layton, J. Bruce Fields, Linux Kernel Mailing List,
    linux-nfs@vger.kernel.org

On Mon, Mar 4, 2013 at 10:14 PM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
> On Mon, 2013-03-04 at 21:57 +0800, Ming Lei wrote:
>> Hi,
>>
>> The below warning can be triggered each time when mount.nfs is
>> running on 3.9-rc1.
>>
>> Not sure if freezable_schedule() inside rpc_wait_bit_killable should
>> be changed to schedule() since nfs_clid_init_mutex is held in the path.
>
> Cc:ing Jeff, who added freezable_schedule(), and applied it to
> rpc_wait_bit_killable.
>
> So this is occurring when the kernel enters the freeze state?

No, but the situation can really be triggered in the freeze case, so
lockdep forecasts the problem correctly, :-)

> Why does it occur only with nfs_clid_init_mutex, and not with all the
> other mutexes that we hold across RPC calls? We hold inode->i_mutex
> across RPC calls all the time when doing renames, unlinks, file
> creation,...

At least in the mount.nfs context, only nfs_clid_init_mutex is held.

IMO, if locks might be held in the path, it isn't wise to call
freezable_schedule() inside rpc_wait_bit_killable().

Thanks,
--
Ming Lei
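For reference, the RPC wait callback being discussed looked roughly like this
around 3.9 (paraphrased from memory; treat it as a sketch of the relevant
logic rather than an exact copy of net/sunrpc/sched.c):

    /* Called from __rpc_execute() via out_of_line_wait_on_bit() while the
     * task waits for the RPC state machine to make progress. */
    static int rpc_wait_bit_killable(void *word)
    {
            if (fatal_signal_pending(current))
                    return -ERESTARTSYS;
            freezable_schedule();   /* was a plain schedule() before the
                                     * freezer patches; this is the sleep
                                     * the freezer is allowed to skip */
            return 0;
    }

As the backtrace at the top of the thread shows, in the mount path
__rpc_execute() is reached from nfs4_discover_server_trunking() with
nfs_clid_init_mutex held, so this freezer-friendly sleep (and the
try_to_freeze() hidden inside it) happens under that mutex, which is what the
new lockdep check reports.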
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-04 14:40 ` Ming Lei
@ 2013-03-04 15:04 ` Jeff Layton
2013-03-04 15:33 ` Ming Lei
0 siblings, 1 reply; 61+ messages in thread
From: Jeff Layton @ 2013-03-04 15:04 UTC (permalink / raw)
To: Ming Lei
Cc: Myklebust, Trond, J. Bruce Fields, Linux Kernel Mailing List,
    linux-nfs@vger.kernel.org

On Mon, 4 Mar 2013 22:40:02 +0800
Ming Lei <ming.lei@canonical.com> wrote:

> On Mon, Mar 4, 2013 at 10:14 PM, Myklebust, Trond
> <Trond.Myklebust@netapp.com> wrote:
> > On Mon, 2013-03-04 at 21:57 +0800, Ming Lei wrote:
> >> Hi,
> >>
> >> The below warning can be triggered each time when mount.nfs is
> >> running on 3.9-rc1.
> >>
> >> Not sure if freezable_schedule() inside rpc_wait_bit_killable should
> >> be changed to schedule() since nfs_clid_init_mutex is held in the path.
> >
> > Cc:ing Jeff, who added freezable_schedule(), and applied it to
> > rpc_wait_bit_killable.
> >
> > So this is occurring when the kernel enters the freeze state?
>
> No, but the situation can really be triggered in the freeze case, so
> lockdep forecasts the problem correctly, :-)
>
> > Why does it occur only with nfs_clid_init_mutex, and not with all the
> > other mutexes that we hold across RPC calls? We hold inode->i_mutex
> > across RPC calls all the time when doing renames, unlinks, file
> > creation,...
>
> At least in the mount.nfs context, only nfs_clid_init_mutex is held.
>
> IMO, if locks might be held in the path, it isn't wise to call
> freezable_schedule() inside rpc_wait_bit_killable().
>

I don't get it -- why is it bad to hold a lock across a freeze event?

The problem that we have is that we must often hold locks across
long-running syscalls (consider something like sync()). In the event
that there is a lot of dirty data, it might take a long time for that
to finish.

There's also the problem that it's not uncommon for the freezer to take
down userland processes (such as NetworkManager) which in turn take
down network interfaces that we need to talk to the server.

The fix from a couple of years ago (which admittedly needs more work)
was to allow the freezing of tasks that are waiting on a reply from the
server. That sort of necessitates that we are allowed to hold our locks
across the try_to_freeze call though.

If that's no longer allowed then we're back to square one with laptops
that fail to suspend when they have NFS mounts. Is there some other
solution we should pursue instead?

--
Jeff Layton <jlayton@redhat.com>
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-04 15:04 ` Jeff Layton
@ 2013-03-04 15:33 ` Ming Lei
2013-03-04 15:53 ` Myklebust, Trond
0 siblings, 1 reply; 61+ messages in thread
From: Ming Lei @ 2013-03-04 15:33 UTC (permalink / raw)
To: Jeff Layton
Cc: Myklebust, Trond, J. Bruce Fields, Linux Kernel Mailing List,
    linux-nfs@vger.kernel.org, Mandeep Singh Baines, Rafael J. Wysocki,
    Ben Chan, Oleg Nesterov, Ingo Molnar

Hi,

CC guys who introduced the lockdep change.

On Mon, Mar 4, 2013 at 11:04 PM, Jeff Layton <jlayton@redhat.com> wrote:
>
> I don't get it -- why is it bad to hold a lock across a freeze event?

At least this may deadlock another mount.nfs during freezing, :-)

See detailed explanation in the commit log:

commit 6aa9707099c4b25700940eb3d016f16c4434360d
Author: Mandeep Singh Baines <msb@chromium.org>
Date:   Wed Feb 27 17:03:18 2013 -0800

    lockdep: check that no locks held at freeze time

    We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
    deadlock if the lock is later acquired in the suspend or hibernate path
    (e.g. by dpm). Holding a lock can also cause a deadlock in the case of
    cgroup_freezer if a lock is held inside a frozen cgroup that is later
    acquired by a process outside that group.

> The problem that we have is that we must often hold locks across
> long-running syscalls (consider something like sync()). In the event
> that there is a lot of dirty data, it might take a long time for that
> to finish.
>
> There's also the problem that it's not uncommon for the freezer to take
> down userland processes (such as NetworkManager) which in turn take
> down network interfaces that we need to talk to the server.
>
> The fix from a couple of years ago (which admittedly needs more work)
> was to allow the freezing of tasks that are waiting on a reply from the
> server. That sort of necessitates that we are allowed to hold our locks
> across the try_to_freeze call though.
>
> If that's no longer allowed then we're back to square one with laptops
> that fail to suspend when they have NFS mounts. Is there some other
> solution we should pursue instead?

Thanks,
--
Ming Lei
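As far as this editor understands it, the check that commit adds really is
tiny; its effect on try_to_freeze() amounts to roughly the following (a sketch
of the idea described in the commit message, not a verbatim copy of the
patch):

    /* include/linux/freezer.h, approximate effect of 6aa9707099c4 */
    static inline bool try_to_freeze(void)
    {
            if (!(current->flags & PF_NOFREEZE))
                    debug_check_no_locks_held();  /* lockdep: warn if the
                                                   * task still holds locks */
            return try_to_freeze_unsafe();        /* the pre-existing
                                                   * freeze logic */
    }

That is why every freezable sleep taken with a mutex held, such as the
nfs_clid_init_mutex case in the trace that started this thread, now produces
the "still has locks held!" report.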
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-04 15:33 ` Ming Lei
@ 2013-03-04 15:53 ` Myklebust, Trond
2013-03-04 20:09 ` Mandeep Singh Baines
0 siblings, 1 reply; 61+ messages in thread
From: Myklebust, Trond @ 2013-03-04 15:53 UTC (permalink / raw)
To: Ming Lei, Linus Torvalds, Al Viro
Cc: Jeff Layton, J. Bruce Fields, Linux Kernel Mailing List,
    linux-nfs@vger.kernel.org, Mandeep Singh Baines, Rafael J. Wysocki,
    Ben Chan, Oleg Nesterov, Ingo Molnar

On Mon, 2013-03-04 at 23:33 +0800, Ming Lei wrote:
> Hi,
>
> CC guys who introduced the lockdep change.
>
> On Mon, Mar 4, 2013 at 11:04 PM, Jeff Layton <jlayton@redhat.com> wrote:
>
> >
> > I don't get it -- why is it bad to hold a lock across a freeze event?
>
> At least this may deadlock another mount.nfs during freezing, :-)
>
> See detailed explanation in the commit log:
>
> commit 6aa9707099c4b25700940eb3d016f16c4434360d
> Author: Mandeep Singh Baines <msb@chromium.org>
> Date:   Wed Feb 27 17:03:18 2013 -0800
>
>     lockdep: check that no locks held at freeze time
>
>     We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
>     deadlock if the lock is later acquired in the suspend or hibernate path
>     (e.g. by dpm). Holding a lock can also cause a deadlock in the case of
>     cgroup_freezer if a lock is held inside a frozen cgroup that is later
>     acquired by a process outside that group.
>

This is bloody ridiculous... If you want to add functionality to
implement cgroup or per-process freezing, then do it through some other
api instead of trying to push your problems onto others by adding new
global locking rules.

Filesystems are a shared resource that have _nothing_ to do with process
cgroups. They need to be suspended when the network goes down or other
resources that they depend on are suspended. At that point, there is no
"what if I launch a new mount command?" scenario.

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-04 15:53 ` Myklebust, Trond
@ 2013-03-04 20:09 ` Mandeep Singh Baines
2013-03-04 20:10 ` Mandeep Singh Baines
0 siblings, 1 reply; 61+ messages in thread
From: Mandeep Singh Baines @ 2013-03-04 20:09 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Ming Lei, Linus Torvalds, Al Viro, Jeff Layton, J. Bruce Fields,
    Linux Kernel Mailing List, linux-nfs@vger.kernel.org,
    Rafael J. Wysocki, Ben Chan, Oleg Nesterov, Ingo Molnar

On Mon, Mar 4, 2013 at 7:53 AM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
> On Mon, 2013-03-04 at 23:33 +0800, Ming Lei wrote:
>> Hi,
>>
>> CC guys who introduced the lockdep change.
>>
>> On Mon, Mar 4, 2013 at 11:04 PM, Jeff Layton <jlayton@redhat.com> wrote:
>>
>> >
>> > I don't get it -- why is it bad to hold a lock across a freeze event?
>>
>> At least this may deadlock another mount.nfs during freezing, :-)
>>
>> See detailed explanation in the commit log:
>>
>> commit 6aa9707099c4b25700940eb3d016f16c4434360d
>> Author: Mandeep Singh Baines <msb@chromium.org>
>> Date:   Wed Feb 27 17:03:18 2013 -0800
>>
>>     lockdep: check that no locks held at freeze time
>>
>>     We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
>>     deadlock if the lock is later acquired in the suspend or hibernate path
>>     (e.g. by dpm). Holding a lock can also cause a deadlock in the case of
>>     cgroup_freezer if a lock is held inside a frozen cgroup that is later
>>     acquired by a process outside that group.
>>
>
> This is bloody ridiculous... If you want to add functionality to
> implement cgroup or per-process freezing, then do it through some other
> api instead of trying to push your problems onto others by adding new
> global locking rules.
>
> Filesystems are a shared resource that have _nothing_ to do with process
> cgroups. They need to be suspended when the network goes down or other
> resources that they depend on are suspended. At that point, there is no
> "what if I launch a new mount command?" scenario.
>

Hi Trond,

My intention was to introduce new rules. My change simply introduces a
check for a deadlock case that can already happen.

I think a deadlock could happen under the following scenario:

1) An administrator wants to freeze a container. Perhaps to checkpoint
it and migrate it some place else.
2) An nfs mount was in progress so we hit this code path and freeze
with a lock held.
3) Another container tries to nfs mount.
4) Deadlock.

Regards,
Mandeep

> Trond
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
* Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
2013-03-04 20:09 ` Mandeep Singh Baines
@ 2013-03-04 20:10 ` Mandeep Singh Baines
0 siblings, 0 replies; 61+ messages in thread
From: Mandeep Singh Baines @ 2013-03-04 20:10 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Ming Lei, Linus Torvalds, Al Viro, Jeff Layton, J. Bruce Fields,
    Linux Kernel Mailing List, linux-nfs@vger.kernel.org,
    Rafael J. Wysocki, Ben Chan, Oleg Nesterov, Ingo Molnar

On Mon, Mar 4, 2013 at 12:09 PM, Mandeep Singh Baines <msb@chromium.org> wrote:
> On Mon, Mar 4, 2013 at 7:53 AM, Myklebust, Trond
> <Trond.Myklebust@netapp.com> wrote:
>> On Mon, 2013-03-04 at 23:33 +0800, Ming Lei wrote:
>>> Hi,
>>>
>>> CC guys who introduced the lockdep change.
>>>
>>> On Mon, Mar 4, 2013 at 11:04 PM, Jeff Layton <jlayton@redhat.com> wrote:
>>>
>>> >
>>> > I don't get it -- why is it bad to hold a lock across a freeze event?
>>>
>>> At least this may deadlock another mount.nfs during freezing, :-)
>>>
>>> See detailed explanation in the commit log:
>>>
>>> commit 6aa9707099c4b25700940eb3d016f16c4434360d
>>> Author: Mandeep Singh Baines <msb@chromium.org>
>>> Date:   Wed Feb 27 17:03:18 2013 -0800
>>>
>>>     lockdep: check that no locks held at freeze time
>>>
>>>     We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
>>>     deadlock if the lock is later acquired in the suspend or hibernate path
>>>     (e.g. by dpm). Holding a lock can also cause a deadlock in the case of
>>>     cgroup_freezer if a lock is held inside a frozen cgroup that is later
>>>     acquired by a process outside that group.
>>>
>>
>> This is bloody ridiculous... If you want to add functionality to
>> implement cgroup or per-process freezing, then do it through some other
>> api instead of trying to push your problems onto others by adding new
>> global locking rules.
>>
>> Filesystems are a shared resource that have _nothing_ to do with process
>> cgroups. They need to be suspended when the network goes down or other
>> resources that they depend on are suspended. At that point, there is no
>> "what if I launch a new mount command?" scenario.
>>
>
> Hi Trond,
>
> My intention was to introduce new rules. My change simply introduces a

D'oh. s/was/was not/

Regards,
Mandeep

> check for a deadlock case that can already happen.
>
> I think a deadlock could happen under the following scenario:
>
> 1) An administrator wants to freeze a container. Perhaps to checkpoint
> it and migrate it some place else.
> 2) An nfs mount was in progress so we hit this code path and freeze
> with a lock held.
> 3) Another container tries to nfs mount.
> 4) Deadlock.
>
> Regards,
> Mandeep
>
>> Trond
>> --
>> Trond Myklebust
>> Linux NFS client maintainer
>>
>> NetApp
>> Trond.Myklebust@netapp.com
>> www.netapp.com