netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
       [not found] <20061225052124.A10323@freya>
@ 2006-12-25  0:25 ` Andrew Morton
  2007-01-02  7:58   ` Adam J. Richter
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Andrew Morton @ 2006-12-25  0:25 UTC (permalink / raw)
  To: Adam J. Richter; +Cc: linux-kernel, netdev, Paul Moore

On Mon, 25 Dec 2006 05:21:24 +0800
"Adam J. Richter" <adam@yggdrasil.com> wrote:

> 	Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
> for several network programs running on my system:
> 
> [  156.381868] BUG: sleeping function called from invalid context at net/core/sock.c:1523
> [  156.381876] in_atomic():1, irqs_disabled():0
> [  156.381881] no locks held by kio_http/9693.
> [  156.381886]  [<c01057a2>] show_trace_log_lvl+0x1a/0x2f
> [  156.381900]  [<c0105dab>] show_trace+0x12/0x14
> [  156.381908]  [<c0105e48>] dump_stack+0x16/0x18
> [  156.381917]  [<c011e30f>] __might_sleep+0xe5/0xeb
> [  156.381926]  [<c025942a>] lock_sock_nested+0x1d/0xc4
> [  156.381937]  [<c01cc570>] selinux_netlbl_inode_permission+0x5a/0x8e
> [  156.381946]  [<c01c2505>] selinux_file_permission+0x96/0x9b
> [  156.381954]  [<c0175a0a>] vfs_write+0x8d/0x167
> [  156.381962]  [<c017605a>] sys_write+0x3f/0x63
> [  156.381971]  [<c01040c0>] syscall_call+0x7/0xb
> [  156.381980]  =======================
> 

There's a glaring bug in selinux_netlbl_inode_permission() - taking
lock_sock() inside rcu_read_lock().

I would again draw attention to Documentation/SubmitChecklist.  In
particular please always always always enable all kernel debugging options
when developing and testing new kernel code.  And everything else in that
file, too.

<guesses that this was tested on ia64>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
  2006-12-25  0:25 ` selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Andrew Morton
@ 2007-01-02  7:58   ` Adam J. Richter
  2007-01-02 21:25     ` Paul Moore
  2007-01-02 20:09   ` [patch] selinux: fix selinux_netlbl_inode_permission() locking Ingo Molnar
  2007-01-02 21:14   ` selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Paul Moore
  2 siblings, 1 reply; 7+ messages in thread
From: Adam J. Richter @ 2007-01-02  7:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, netdev, Paul Moore

[-- Attachment #1: Type: text/plain, Size: 2131 bytes --]

On Sun, Dec 24, 2006 at 04:25:11PM -0800, Andrew Morton wrote:
> On Mon, 25 Dec 2006 05:21:24 +0800
> "Adam J. Richter" <adam@yggdrasil.com> wrote:
> 
>> 	Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
>> for several network programs running on my system:
>> 
>> [  156.381868] BUG: sleeping function called from invalid context at net/core/sock.c:1523
[...]
> There's a glaring bug in selinux_netlbl_inode_permission() - taking
> lock_sock() inside rcu_read_lock().
> 
> I would again draw attention to Documentation/SubmitChecklist.  In
> particular please always always always enable all kernel debugging options
> when developing and testing new kernel code.  And everything else in that
> file, too.
> 
> <guesses that this was tested on ia64>

	I have not yet performed the 21 steps of
linux-2.6.20-rc3/Documentation/SubmitChecklist, which I think is a
great objectives list for future automation or some kind of community
web site.  I hope to find time to make progress through that
checklist, but, in the meantime, I think the world may nevertheless be
infinitesmally better off if I post the patch that I'm currently
using that seems to fix the problem, seeing as how rc3 has passed
with no fix incorporated.

	I think the intent of the offending code was to avoid doing
a lock_sock() in a presumably common case where there was no need to
take the lock.  So, I have kept the presumably fast test to exit
early.

	When it turns out to be necessary to take lock_sock(), RCU is
unlocked, then lock_sock is taken, the RCU is locked again, and
the test is repeated.

	If I am wrong about lock_sock being expensive, I can
delete the lines that do the early return.

	By the way, in a change not included in this patch,
I also tried consolidating the RCU locking in this file into a macro
IF_NLBL_REQUIRE(sksec, action), where "action" is the code
fragment to be executed with rcu_read_lock() held, although this
required splitting a couple of functions in half.

	Anyhow, here is my current patch as MIME attachment.
Comments and labor in getting it through SubmitChecklist would
both be welcome.

Adam Richter

[-- Attachment #2: selinux.diff --]
[-- Type: text/plain, Size: 621 bytes --]

--- linux-2.6.20-rc3/security/selinux/ss/services.c	2007-01-02 01:47:40.000000000 +0800
+++ linux/security/selinux/ss/services.c	2007-01-02 15:36:30.000000000 +0800
@@ -2658,14 +2658,22 @@
 	rcu_read_lock();
 	if (sksec->nlbl_state != NLBL_REQUIRE) {
 		rcu_read_unlock();
 		return 0;
 	}
+	rcu_read_unlock();
+
+
+	rc = 0;
 	lock_sock(sock->sk);
-	rc = selinux_netlbl_socket_setsid(sock, sksec->sid);
-	release_sock(sock->sk);
+	rcu_read_lock();
+
+	if (sksec->nlbl_state == NLBL_REQUIRE)
+		rc = selinux_netlbl_socket_setsid(sock, sksec->sid);
+
 	rcu_read_unlock();
+	release_sock(sock->sk);
 
 	return rc;
 }
 
 /**

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [patch] selinux: fix selinux_netlbl_inode_permission() locking
  2006-12-25  0:25 ` selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Andrew Morton
  2007-01-02  7:58   ` Adam J. Richter
@ 2007-01-02 20:09   ` Ingo Molnar
  2007-01-02 21:14   ` selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Paul Moore
  2 siblings, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2007-01-02 20:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Adam J. Richter, linux-kernel, netdev, Paul Moore, Linus Torvalds


* Andrew Morton <akpm@osdl.org> wrote:

> There's a glaring bug in selinux_netlbl_inode_permission() - taking 
> lock_sock() inside rcu_read_lock().

Note that the bug is still in -rc3, and is easily triggerable via a 
default FC6 bootup. It's fixed by the (slightly modified) patch from 
Parag Warudkar below that i have in the -rt tree.

Note that this bug became visible due to the recent __resched_legal() 
fix, which bug made most of our atomicity debugging checks ineffective. 
About half a dozen separate atomicity bugs triggered in -rt when i fixed 
the __resched_legal() bug, so i'd expect some more to trigger upstream 
too.

	Ingo

------------------------>
Subject: [patch] selinux: fix selinux_netlbl_inode_permission() locking
From: Parag Warudkar <paragw@paragw.zapto.org>

do not call a sleeping lock API in an RCU read section.
lock_sock_nested can sleep, its BH counterpart doesn't. 
selinux_netlbl_inode_permission() needs to use the BH counterpart
unconditionally.

Compile tested.

From: Ingo Molnar <mingo@elte.hu>

added BH disabling, because this function can be called from non-atomic
contexts too, so a naked bh_lock_sock() would be deadlock-prone.

Boot-tested the resulting kernel.

Signed-off-by: Parag Warudkar <paragw@paragw.zapto.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 security/selinux/ss/services.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux/security/selinux/ss/services.c
===================================================================
--- linux.orig/security/selinux/ss/services.c
+++ linux/security/selinux/ss/services.c
@@ -2660,9 +2660,11 @@ int selinux_netlbl_inode_permission(stru
 		rcu_read_unlock();
 		return 0;
 	}
-	lock_sock(sock->sk);
+	local_bh_disable();
+	bh_lock_sock_nested(sock->sk);
 	rc = selinux_netlbl_socket_setsid(sock, sksec->sid);
-	release_sock(sock->sk);
+	bh_unlock_sock(sock->sk);
+	local_bh_enable();
 	rcu_read_unlock();
 
 	return rc;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
  2006-12-25  0:25 ` selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Andrew Morton
  2007-01-02  7:58   ` Adam J. Richter
  2007-01-02 20:09   ` [patch] selinux: fix selinux_netlbl_inode_permission() locking Ingo Molnar
@ 2007-01-02 21:14   ` Paul Moore
  2 siblings, 0 replies; 7+ messages in thread
From: Paul Moore @ 2007-01-02 21:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Adam J. Richter, linux-kernel, netdev, Ingo Molnar

On Sunday, December 24 2006 7:25 pm, Andrew Morton wrote:
> On Mon, 25 Dec 2006 05:21:24 +0800
>
> "Adam J. Richter" <adam@yggdrasil.com> wrote:
> > 	Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
> > for several network programs running on my system:
> >
> > [  156.381868] BUG: sleeping function called from invalid context at
> > net/core/sock.c:1523 [  156.381876] in_atomic():1, irqs_disabled():0
> > [  156.381881] no locks held by kio_http/9693.
> > [  156.381886]  [<c01057a2>] show_trace_log_lvl+0x1a/0x2f
> > [  156.381900]  [<c0105dab>] show_trace+0x12/0x14
> > [  156.381908]  [<c0105e48>] dump_stack+0x16/0x18
> > [  156.381917]  [<c011e30f>] __might_sleep+0xe5/0xeb
> > [  156.381926]  [<c025942a>] lock_sock_nested+0x1d/0xc4
> > [  156.381937]  [<c01cc570>] selinux_netlbl_inode_permission+0x5a/0x8e
> > [  156.381946]  [<c01c2505>] selinux_file_permission+0x96/0x9b
> > [  156.381954]  [<c0175a0a>] vfs_write+0x8d/0x167
> > [  156.381962]  [<c017605a>] sys_write+0x3f/0x63
> > [  156.381971]  [<c01040c0>] syscall_call+0x7/0xb
> > [  156.381980]  =======================
>
> There's a glaring bug in selinux_netlbl_inode_permission() - taking
> lock_sock() inside rcu_read_lock().

Sorry for the delay, I'm finally back at a machine where I can look at the 
code.

I've been thinking about Parag Warudkar's and Ingo Molnar's patches as well as 
what the selinux_netlbl_inode_permission() function actually needs to do; I 
think the best answer isn't so much to change the socket locking calls, but 
to restructure the function a bit.

Currently the function does the following (in order):

 1. do some quick sanity checks (is the inode a socket, etc)
 2. rcu_read_lock()
 3. check the nlbl_state is set to NLBL_REQUIRE (otherwise return)
 4. lock_sock()
 5. netlabel magic
 6. release_sock()
 7. rcu_read_unlock()

I propose changing it to the following (in order):

  1. do some quick sanity checks (is the inode a socket, etc)
  2. rcu_read_lock()
  3. check the nlbl_state is set to NLBL_REQUIRE (otherwise return)
  4. rcu_read_unlock()
  5. lock_sock()
  6. rcu_read_lock()
  7. verify that nlbl_state is still set to NLBL_REQUIRE (otherwise return)
  8. netlabel magic
  9. rcu_read_unlock()
 10. release_sock()

This way we no longer need to worry about any special socket locking.  I 
realize this adds a bit of duplicated work but it is my understanding that 
RCU lock/unlock operations are *very* fast so the extra RCU lock operations 
shouldn't be too bad and the extra nlbl_state check should be of minimal 
cost.

However, I'm not the expert here, just a guy learning as he goes so any 
comments/feedback on the above proposal are welcome.  If it turns out this 
approach has some merit I'll put together a patch and send it out.

Once again, sorry for the regression.

-- 
paul moore
linux security @ hp

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
  2007-01-02  7:58   ` Adam J. Richter
@ 2007-01-02 21:25     ` Paul Moore
  2007-01-02 23:37       ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Moore @ 2007-01-02 21:25 UTC (permalink / raw)
  To: Adam J. Richter; +Cc: Andrew Morton, linux-kernel, netdev

On Tuesday, January 2 2007 2:58 am, Adam J. Richter wrote:
> 	I have not yet performed the 21 steps of
> linux-2.6.20-rc3/Documentation/SubmitChecklist, which I think is a
> great objectives list for future automation or some kind of community
> web site.  I hope to find time to make progress through that
> checklist, but, in the meantime, I think the world may nevertheless be
> infinitesmally better off if I post the patch that I'm currently
> using that seems to fix the problem, seeing as how rc3 has passed
> with no fix incorporated.
>
> 	I think the intent of the offending code was to avoid doing
> a lock_sock() in a presumably common case where there was no need to
> take the lock.  So, I have kept the presumably fast test to exit
> early.
>
> 	When it turns out to be necessary to take lock_sock(), RCU is
> unlocked, then lock_sock is taken, the RCU is locked again, and
> the test is repeated.

Hi Adam,

I'm sorry I just saw this mail (mail not sent directly to me get shuffled off 
to a folder).  I agree with your patch, I think dropping and then re-taking 
the RCU lock is the best way to go, although I'm curious to see what others 
have to say.

The only real comment I have with the patch is that there is some extra 
whitespace which could probably be removed, but that is more of a style nit 
than anything substantial.

> 	By the way, in a change not included in this patch,
> I also tried consolidating the RCU locking in this file into a macro
> IF_NLBL_REQUIRE(sksec, action), where "action" is the code
> fragment to be executed with rcu_read_lock() held, although this
> required splitting a couple of functions in half.

>From your description above I'm not sure I like that approach so much, 
however, I could be misunderstanding something.  Do you have a small example 
you could send?

-- 
paul moore
linux security @ hp

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
  2007-01-02 21:25     ` Paul Moore
@ 2007-01-02 23:37       ` David Miller
  2007-01-03 20:46         ` Paul Moore
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2007-01-02 23:37 UTC (permalink / raw)
  To: paul.moore; +Cc: adam, akpm, linux-kernel, netdev

From: Paul Moore <paul.moore@hp.com>
Date: Tue, 2 Jan 2007 16:25:24 -0500

> I'm sorry I just saw this mail (mail not sent directly to me get
> shuffled off to a folder).  I agree with your patch, I think
> dropping and then re-taking the RCU lock is the best way to go,
> although I'm curious to see what others have to say.

I think this is fine too.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
  2007-01-02 23:37       ` David Miller
@ 2007-01-03 20:46         ` Paul Moore
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Moore @ 2007-01-03 20:46 UTC (permalink / raw)
  To: David Miller; +Cc: adam, akpm, Ingo Molnar, netdev

On Tuesday, January 2 2007 6:37 pm, David Miller wrote:
> From: Paul Moore <paul.moore@hp.com>
> Date: Tue, 2 Jan 2007 16:25:24 -0500
>
> > I'm sorry I just saw this mail (mail not sent directly to me get
> > shuffled off to a folder).  I agree with your patch, I think
> > dropping and then re-taking the RCU lock is the best way to go,
> > although I'm curious to see what others have to say.
>
> I think this is fine too.

[NOTE: dropped linux-kernel as I think this discussion is now strictly related 
to socket locking so netdev is probably the best list]

I've been looking some more at Adam's and Ingo's patches for this as well as a 
recent bug against a FC test kernel:

 * https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=220966

For those who don't follow the link here is the meat of the bug report:

****
[ INFO: soft-safe -> soft-unsafe lock order detected ]
2.6.19-1.2891.fc7 #1
------------------------------------------------------
cupsd/1884 [HC0[0]:SC0[1]:HE1:SE0] is trying to acquire:
 (&ssec->nlbl_lock){--..}, at: [<c04cec37>]
selinux_netlbl_socket_setsid+0xbb/0x123

and this task is already holding:
 (af_callback_keys + sk->sk_family#3){-.-+}, at: [<c05daa1c>]
inet_accept+0x70/0xb5
which would create a new lock dependency:
 (af_callback_keys + sk->sk_family#3){-.-+} -> (&ssec->nlbl_lock){--..}

but this new dependency connects a soft-irq-safe lock:
 (af_callback_keys + sk->sk_family#3){-.-+}
... which became soft-irq-safe at:
 [<c043fff1>] __lock_acquire+0x37d/0x9f8
 [<c044094d>] lock_acquire+0x56/0x6f
 [<c05fbdb6>] _read_lock_bh+0x30/0x3d
 [<c04c687e>] selinux_socket_sock_rcv_skb+0xbd/0x252
 [<c05d0645>] tcp_v4_rcv+0x37a/0x909
 [<c05b7593>] ip_local_deliver+0x185/0x22e
 [<c05b73d6>] ip_rcv+0x418/0x450
 [<c059ae9c>] netif_receive_skb+0x2db/0x35a
 [<c059c85f>] process_backlog+0x95/0xf6
 [<c059ca46>] net_rx_action+0xa1/0x1a8
 [<c042bf5a>] __do_softirq+0x6f/0xe2
 [<c04063a1>] do_softirq+0x61/0xc7
 [<ffffffff>] 0xffffffff

to a soft-irq-unsafe lock:
 (&ssec->nlbl_lock){--..}
... which became soft-irq-unsafe at:
...  [<c044007d>] __lock_acquire+0x409/0x9f8
 [<c044094d>] lock_acquire+0x56/0x6f
 [<c05fbc89>] _spin_lock+0x2b/0x38
 [<c04cec37>] selinux_netlbl_socket_setsid+0xbb/0x123
 [<c04d0c92>] selinux_netlbl_socket_post_create+0x2d/0x2f
 [<c04c807b>] selinux_socket_post_create+0x156/0x15c
 [<c059213c>] __sock_create+0x179/0x1b2
 [<c05921ae>] sock_create+0x1a/0x1f
 [<c0592435>] sys_socket+0x1b/0x3c
 [<c0592cba>] sys_socketcall+0x77/0x241
 [<c0404050>] syscall_call+0x7/0xb
 [<ffffffff>] 0xffffffff
****

This makes me believe that Ingo's patch (which I see is now in Linus' and 
Andrew's trees) is the way to go and not the lock re-order approach in Adam's 
patch.  I'm going to continue to look into this, almost more for my own 
education than anything else, but I thought I would mention this lock 
dependency message as it seemed relevant to the discussion.

-- 
paul moore
linux security @ hp

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-01-03 20:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20061225052124.A10323@freya>
2006-12-25  0:25 ` selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Andrew Morton
2007-01-02  7:58   ` Adam J. Richter
2007-01-02 21:25     ` Paul Moore
2007-01-02 23:37       ` David Miller
2007-01-03 20:46         ` Paul Moore
2007-01-02 20:09   ` [patch] selinux: fix selinux_netlbl_inode_permission() locking Ingo Molnar
2007-01-02 21:14   ` selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Paul Moore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).