Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps

netdev.vger.kernel.org archive mirror

* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
       [not found] ` <20140805194655.GA30728@redhat.com>
@ 2014-12-03 14:14   ` Kirill A. Shutemov
  2014-12-03 16:59     ` Eric W. Biederman
  2014-12-03 17:34     ` Oleg Nesterov
  0 siblings, 2 replies; 4+ messages in thread
From: Kirill A. Shutemov @ 2014-12-03 14:14 UTC
  To: Oleg Nesterov, David S. Miller, Linus Torvalds
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	Eric W. Biederman, Kirill A. Shutemov, Peter Zijlstra,
	Sasha Levin, linux-fsdevel, linux-kernel, Alexey Dobriyan, netdev

On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
> A simple test-case from Kirill Shutemov
> 
> 	cat /proc/self/maps >/dev/null
> 	chmod +x /proc/self/net/packet
> 	exec /proc/self/net/packet
> 
> makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
> the opposite order.

Oleg, I see it again with almost the same test-case:

	cat /proc/self/stack >/dev/null
	chmod +x /proc/self/net/packet
	exec /proc/self/net/packet

Looks like a bunch of proc files were converted to use seq_file by Alexey
Dobriyan around the same time you fixed the issue for /proc/pid/maps.

More generic test-case:

	find /proc/self/ -type f -exec dd if='{}' of=/dev/null bs=1 count=1 ';' 2>/dev/null
	chmod +x /proc/self/net/packet
	exec /proc/self/net/packet
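The test cases above create two opposite acquisition orders. Every seq_file mutex is initialised at the same site, so lockdep treats all of them as a single class (`&p->lock`). That is why reading /proc/self/stack and exec'ing /proc/self/net/packet end up linked even though they are different files. The following is a minimal, hypothetical userspace sketch (not kernel code, and not how lockdep is implemented) that records "class B acquired while class A is held" edges for the two classes in the report and flags the cycle. `record()` and `replay_reported_orders()` are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: only the two lock classes from the report. */
enum lock_class { CRED_GUARD_MUTEX, SEQ_FILE_LOCK, NR_CLASSES };

/* dep[a][b] becomes true once some task acquires b while holding a. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Record "acquire next while holding held"; return true if the
 * reverse edge already exists, i.e. the two orders form a cycle. */
static bool record(enum lock_class held, enum lock_class next)
{
	assert(held != next); /* same-class nesting is not modelled here */
	dep[held][next] = true;
	return dep[next][held];
}

/* Replay the two acquisition orders shown in the lockdep report. */
static bool replay_reported_orders(void)
{
	bool cycle;

	memset(dep, 0, sizeof(dep));
	/* cat /proc/self/stack: seq_read() takes p->lock, then
	 * proc_pid_stack() -> lock_trace() takes cred_guard_mutex. */
	cycle = record(SEQ_FILE_LOCK, CRED_GUARD_MUTEX);
	/* exec /proc/self/net/packet: prepare_bprm_creds() takes
	 * cred_guard_mutex, then prepare_binprm() -> kernel_read() ->
	 * seq_read() takes p->lock. */
	cycle |= record(CRED_GUARD_MUTEX, SEQ_FILE_LOCK);
	return cycle;
}
```

Neither order deadlocks on its own; only the combination of the two edges forms the cycle that the "Possible unsafe locking scenario" section of the report describes.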

David, any justification for allowing chmod +x for files under
/proc/pid/net?

[    2.042212] ======================================================
[    2.042930] [ INFO: possible circular locking dependency detected ]
[    2.043648] 3.18.0-rc7-00003-g3a18ca061311-dirty #237 Not tainted
[    2.044350] -------------------------------------------------------
[    2.045054] sh/94 is trying to acquire lock:
[    2.045546]  (&p->lock){+.+.+.}, at: [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
[    2.045781] 
[    2.045781] but task is already holding lock:
[    2.045781]  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811c0e3d>] prepare_bprm_creds+0x2d/0x90
[    2.045781] 
[    2.045781] which lock already depends on the new lock.
[    2.045781] 
[    2.045781] 
[    2.045781] the existing dependency chain (in reverse order) is:
[    2.045781] 
-> #1 (&sig->cred_guard_mutex){+.+.+.}:
[    2.045781]        [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
[    2.045781]        [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
[    2.045781]        [<ffffffff81849da6>] mutex_lock_killable_nested+0x66/0x460
[    2.045781]        [<ffffffff81229de4>] lock_trace+0x24/0x70
[    2.045781]        [<ffffffff81229e8f>] proc_pid_stack+0x5f/0xe0
[    2.045781]        [<ffffffff81227244>] proc_single_show+0x54/0xa0
[    2.045781]        [<ffffffff811e13a0>] seq_read+0xe0/0x3e0
[    2.045781]        [<ffffffff811b9377>] vfs_read+0x97/0x180
[    2.045781]        [<ffffffff811b9f5d>] SyS_read+0x4d/0xc0
[    2.045781]        [<ffffffff8184e492>] system_call_fastpath+0x12/0x17
[    2.045781] 
-> #0 (&p->lock){+.+.+.}:
[    2.045781]        [<ffffffff810a389f>] validate_chain.isra.36+0xfff/0x1400
[    2.045781]        [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
[    2.045781]        [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
[    2.045781]        [<ffffffff81849629>] mutex_lock_nested+0x69/0x3c0
[    2.045781]        [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
[    2.045781]        [<ffffffff81226428>] proc_reg_read+0x48/0x70
[    2.045781]        [<ffffffff811b9377>] vfs_read+0x97/0x180
[    2.045781]        [<ffffffff811bf1a8>] kernel_read+0x48/0x60
[    2.045781]        [<ffffffff811bfb2c>] prepare_binprm+0xdc/0x180
[    2.045781]        [<ffffffff811c139a>] do_execve_common.isra.29+0x4fa/0x960
[    2.045781]        [<ffffffff811c1818>] do_execve+0x18/0x20
[    2.045781]        [<ffffffff811c1b05>] SyS_execve+0x25/0x30
[    2.045781]        [<ffffffff8184ea49>] stub_execve+0x69/0xa0
[    2.045781] 
[    2.045781] other info that might help us debug this:
[    2.045781] 
[    2.045781]  Possible unsafe locking scenario:
[    2.045781] 
[    2.045781]        CPU0                    CPU1
[    2.045781]        ----                    ----
[    2.045781]   lock(&sig->cred_guard_mutex);
[    2.045781]                                lock(&p->lock);
[    2.045781]                                lock(&sig->cred_guard_mutex);
[    2.045781]   lock(&p->lock);
[    2.045781] 
[    2.045781]  *** DEADLOCK ***
[    2.045781] 
[    2.045781] 1 lock held by sh/94:
[    2.045781]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811c0e3d>] prepare_bprm_creds+0x2d/0x90
[    2.045781] 
[    2.045781] stack backtrace:
[    2.045781] CPU: 0 PID: 94 Comm: sh Not tainted 3.18.0-rc7-00003-g3a18ca061311-dirty #237
[    2.045781] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    2.045781]  ffffffff82a48d50 ffff88085427bad8 ffffffff81844a85 0000000000000cac
[    2.045781]  ffffffff82a654a0 ffff88085427bb28 ffffffff810a1b03 0000000000000000
[    2.045781]  ffff88085427bb68 ffff88085427bb28 ffff8808547f1500 ffff8808547f1c40
[    2.045781] Call Trace:
[    2.045781]  [<ffffffff81844a85>] dump_stack+0x4e/0x68
[    2.045781]  [<ffffffff810a1b03>] print_circular_bug+0x203/0x310
[    2.045781]  [<ffffffff810a389f>] validate_chain.isra.36+0xfff/0x1400
[    2.045781]  [<ffffffff8108fa76>] ? local_clock+0x16/0x30
[    2.045781]  [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
[    2.045781]  [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
[    2.045781]  [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
[    2.045781]  [<ffffffff81849629>] mutex_lock_nested+0x69/0x3c0
[    2.045781]  [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
[    2.045781]  [<ffffffff8108f9f8>] ? sched_clock_cpu+0x98/0xc0
[    2.045781]  [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
[    2.045781]  [<ffffffff814050b9>] ? lockref_put_or_lock+0x29/0x40
[    2.045781]  [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
[    2.045781]  [<ffffffff814050b9>] ? lockref_put_or_lock+0x29/0x40
[    2.045781]  [<ffffffff81226428>] proc_reg_read+0x48/0x70
[    2.045781]  [<ffffffff811b9377>] vfs_read+0x97/0x180
[    2.045781]  [<ffffffff811bf1a8>] kernel_read+0x48/0x60
[    2.045781]  [<ffffffff811bfb2c>] prepare_binprm+0xdc/0x180
[    2.045781]  [<ffffffff811c139a>] do_execve_common.isra.29+0x4fa/0x960
[    2.092142] tsc: Refined TSC clocksource calibration: 2693.484 MHz
[    2.045781]  [<ffffffff811c0fd3>] ? do_execve_common.isra.29+0x133/0x960
[    2.045781]  [<ffffffff8184f04d>] ? retint_swapgs+0xe/0x13
[    2.045781]  [<ffffffff811c1818>] do_execve+0x18/0x20
[    2.045781]  [<ffffffff811c1b05>] SyS_execve+0x25/0x30
[    2.045781]  [<ffffffff8184ea49>] stub_execve+0x69/0xa0
-- 
 Kirill A. Shutemov

* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
  2014-12-03 14:14   ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Kirill A. Shutemov
@ 2014-12-03 16:59     ` Eric W. Biederman
  2014-12-04 16:17       ` Kirill A. Shutemov
  2014-12-03 17:34     ` Oleg Nesterov
  1 sibling, 1 reply; 4+ messages in thread
From: Eric W. Biederman @ 2014-12-03 16:59 UTC
  To: Kirill A. Shutemov
  Cc: Oleg Nesterov, David S. Miller, Linus Torvalds, Andrew Morton,
	Alexander Viro, Cyrill Gorcunov, David Howells,
	Kirill A. Shutemov, Peter Zijlstra, Sasha Levin, linux-fsdevel,
	linux-kernel, Alexey Dobriyan, netdev

"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
>> A simple test-case from Kirill Shutemov
>> 
>> 	cat /proc/self/maps >/dev/null
>> 	chmod +x /proc/self/net/packet
>> 	exec /proc/self/net/packet
>> 
>> makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
>> the opposite order.
>
> Oleg, I see it again with almost the same test-case:
>
> 	cat /proc/self/stack >/dev/null
> 	chmod +x /proc/self/net/packet
> 	exec /proc/self/net/packet
>
> Looks like a bunch of proc files were converted to use seq_file by Alexey
> Dobriyan around the same time you fixed the issue for /proc/pid/maps.
>
> More generic test-case:
>
> 	find /proc/self/ -type f -exec dd if='{}' of=/dev/null bs=1 count=1 ';' 2>/dev/null
> 	chmod +x /proc/self/net/packet
> 	exec /proc/self/net/packet
>
> David, any justification for allowing chmod +x for files under
> /proc/pid/net?

I don't think there are any good reasons for allowing chmod +x for the
proc generic files.   Certainly executing any of them is nonsense.

I do recall some weird corner cases existing.  I think they resulted
in a need to preserve chmod if not chmod +x.  This is just me saying
tread carefully before you change anything.

It really should be safe to tweak proc_notify_change to not allow
messing with the executable bits of proc files.


* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
  2014-12-03 14:14   ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Kirill A. Shutemov
  2014-12-03 16:59     ` Eric W. Biederman
@ 2014-12-03 17:34     ` Oleg Nesterov
  1 sibling, 0 replies; 4+ messages in thread
From: Oleg Nesterov @ 2014-12-03 17:34 UTC
  To: Kirill A. Shutemov
  Cc: David S. Miller, Linus Torvalds, Andrew Morton, Alexander Viro,
	Cyrill Gorcunov, David Howells, Eric W. Biederman,
	Kirill A. Shutemov, Peter Zijlstra, Sasha Levin, linux-fsdevel,
	linux-kernel, Alexey Dobriyan, netdev

On 12/03, Kirill A. Shutemov wrote:
>
> On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
> > A simple test-case from Kirill Shutemov
> >
> > 	cat /proc/self/maps >/dev/null
> > 	chmod +x /proc/self/net/packet
> > 	exec /proc/self/net/packet
> >
> > makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
> > the opposite order.
>
> Oleg, I see it again with almost the same test-case:
>
> 	cat /proc/self/stack >/dev/null
> 	chmod +x /proc/self/net/packet
> 	exec /proc/self/net/packet

Yes, there are more lock_trace/mm_access (ab)users. Fortunately, they
are much simpler than proc/pid/maps (which also asked for other cleanups
and fixes).

I'll try to take a look, thanks for the reminder.

And I agree with Eric: chmod +x probably makes no sense. Still, I think
this code deserves some cleanups regardless, to the point that I think
lock_trace() should probably die.
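The patch this thread replies to takes the approach of moving the cred_guard_mutex-protected access check out of the seq_file iteration and into ->open() (its subject: "shift mm_access() from m_start() to proc_maps_open()"). The hypothetical userspace model below tracks which locks are held at each acquisition to show why that ordering removes the cycle. If the check runs in ->open(), before seq_read() ever takes `p->lock`, then no `p->lock` then `cred_guard_mutex` edge is recorded. Only the exec path's `cred_guard_mutex` then `p->lock` edge remains. All names are invented, and applying this to the other lock_trace()/mm_access() users is an assumption here, not a claim about the eventual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: only the two lock classes from the report. */
enum lock_class { CRED_GUARD_MUTEX, SEQ_FILE_LOCK, NR_CLASSES };

static bool held[NR_CLASSES];            /* locks held by the modelled task */
static bool dep[NR_CLASSES][NR_CLASSES]; /* dep[a][b]: b taken while a held */

/* Acquire c, recording an edge from every lock already held. */
static void model_lock(enum lock_class c)
{
	for (int h = 0; h < NR_CLASSES; h++)
		if (held[h])
			dep[h][c] = true;
	held[c] = true;
}

static void model_unlock(enum lock_class c)
{
	held[c] = false;
}

static bool has_cycle(void)
{
	return dep[CRED_GUARD_MUTEX][SEQ_FILE_LOCK] &&
	       dep[SEQ_FILE_LOCK][CRED_GUARD_MUTEX];
}

/* Replay with the access check done at open time. */
static bool replay_open_time_check(void)
{
	memset(dep, 0, sizeof(dep));
	memset(held, 0, sizeof(held));

	/* open(): cred_guard_mutex taken and dropped on its own. */
	model_lock(CRED_GUARD_MUTEX);
	model_unlock(CRED_GUARD_MUTEX);

	/* read(): seq_read() takes p->lock; ->show() takes nothing else. */
	model_lock(SEQ_FILE_LOCK);
	model_unlock(SEQ_FILE_LOCK);

	/* exec: cred_guard_mutex, then kernel_read() -> seq_read(). */
	model_lock(CRED_GUARD_MUTEX);
	model_lock(SEQ_FILE_LOCK);
	model_unlock(SEQ_FILE_LOCK);
	model_unlock(CRED_GUARD_MUTEX);

	return has_cycle();
}
```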

Thanks!

Oleg.

* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
  2014-12-03 16:59     ` Eric W. Biederman
@ 2014-12-04 16:17       ` Kirill A. Shutemov
  0 siblings, 0 replies; 4+ messages in thread
From: Kirill A. Shutemov @ 2014-12-04 16:17 UTC
  To: Eric W. Biederman
  Cc: Oleg Nesterov, David S. Miller, Linus Torvalds, Andrew Morton,
	Alexander Viro, Cyrill Gorcunov, David Howells,
	Kirill A. Shutemov, Peter Zijlstra, Sasha Levin, linux-fsdevel,
	linux-kernel, Alexey Dobriyan, netdev

On Wed, Dec 03, 2014 at 10:59:57AM -0600, Eric W. Biederman wrote:
> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
> 
> > On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
> >> A simple test-case from Kirill Shutemov
> >> 
> >> 	cat /proc/self/maps >/dev/null
> >> 	chmod +x /proc/self/net/packet
> >> 	exec /proc/self/net/packet
> >> 
> >> makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
> >> the opposite order.
> >
> > Oleg, I see it again with almost the same test-case:
> >
> > 	cat /proc/self/stack >/dev/null
> > 	chmod +x /proc/self/net/packet
> > 	exec /proc/self/net/packet
> >
> > Looks like a bunch of proc files were converted to use seq_file by Alexey
> > Dobriyan around the same time you fixed the issue for /proc/pid/maps.
> >
> > More generic test-case:
> >
> > 	find /proc/self/ -type f -exec dd if='{}' of=/dev/null bs=1 count=1 ';' 2>/dev/null
> > 	chmod +x /proc/self/net/packet
> > 	exec /proc/self/net/packet
> >
> > David, any justification for allowing chmod +x for files under
> > /proc/pid/net?
> 
> I don't think there are any good reasons for allowing chmod +x for the
> proc generic files.   Certainly executing any of them is nonsense.
> 
> I do recall some weird corner cases existing.  I think they resulted
> in a need to preserve chmod if not chmod +x.  This is just me saying
> tread carefully before you change anything.
> 
> It really should be safe to tweak proc_notify_change to not allow
> messing with the executable bits of proc files.

BTW, we have MS_NOSUID and MS_NOEXEC set in ->s_flags for procfs since
2006 -- see 92d032855e64.

But there's no code that translates them into vfsmount->mnt_flags |=
MNT_NOSUID/MNT_NOEXEC, so we bypass the nosuid/noexec checks on the exec path.

Hm?..
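The gap described above can be sketched as a hypothetical userspace model. The superblock carries "noexec", but the exec-path check, as described in the message, consults only the per-mount flags. So unless something copies the superblock flags into the mount flags, the check never fires. The flag values and function names below are invented for the model and are not claimed to match the kernel's `MS_*`/`MNT_*` constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only. */
#define SB_FLAG_NOSUID  0x1u /* stands in for MS_NOSUID in ->s_flags */
#define SB_FLAG_NOEXEC  0x2u /* stands in for MS_NOEXEC in ->s_flags */
#define MNT_FLAG_NOSUID 0x1u /* stands in for MNT_NOSUID in mnt_flags */
#define MNT_FLAG_NOEXEC 0x2u /* stands in for MNT_NOEXEC in mnt_flags */

/* The missing translation step: derive per-mount flags from sb flags. */
static unsigned int mnt_flags_from_sb(unsigned int s_flags)
{
	unsigned int mnt_flags = 0;

	if (s_flags & SB_FLAG_NOSUID)
		mnt_flags |= MNT_FLAG_NOSUID;
	if (s_flags & SB_FLAG_NOEXEC)
		mnt_flags |= MNT_FLAG_NOEXEC;
	return mnt_flags;
}

/* Model of the exec-path check, which looks only at mount flags. */
static bool exec_permitted(unsigned int mnt_flags)
{
	return !(mnt_flags & MNT_FLAG_NOEXEC);
}
```

Without the translation, the mount flags stay 0 and `exec_permitted(0)` is true even though the superblock says noexec. With the translation, exec is refused.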

-- 
 Kirill A. Shutemov

Thread overview: 4+ messages
     [not found] <20140805194627.GA30693@redhat.com>
     [not found] ` <20140805194655.GA30728@redhat.com>
2014-12-03 14:14   ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Kirill A. Shutemov
2014-12-03 16:59     ` Eric W. Biederman
2014-12-04 16:17       ` Kirill A. Shutemov
2014-12-03 17:34     ` Oleg Nesterov
