netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
       [not found] <20061225052124.A10323@freya>
@ 2006-12-25  0:25 ` Andrew Morton
  2007-01-02  7:58   ` Adam J. Richter
  2007-01-02 21:14   ` Paul Moore
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2006-12-25  0:25 UTC (permalink / raw)
  To: Adam J. Richter; +Cc: linux-kernel, netdev, Paul Moore

On Mon, 25 Dec 2006 05:21:24 +0800
"Adam J. Richter" <adam@yggdrasil.com> wrote:

> 	Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
> for several network programs running on my system:
> 
> [  156.381868] BUG: sleeping function called from invalid context at net/core/sock.c:1523
> [  156.381876] in_atomic():1, irqs_disabled():0
> [  156.381881] no locks held by kio_http/9693.
> [  156.381886]  [<c01057a2>] show_trace_log_lvl+0x1a/0x2f
> [  156.381900]  [<c0105dab>] show_trace+0x12/0x14
> [  156.381908]  [<c0105e48>] dump_stack+0x16/0x18
> [  156.381917]  [<c011e30f>] __might_sleep+0xe5/0xeb
> [  156.381926]  [<c025942a>] lock_sock_nested+0x1d/0xc4
> [  156.381937]  [<c01cc570>] selinux_netlbl_inode_permission+0x5a/0x8e
> [  156.381946]  [<c01c2505>] selinux_file_permission+0x96/0x9b
> [  156.381954]  [<c0175a0a>] vfs_write+0x8d/0x167
> [  156.381962]  [<c017605a>] sys_write+0x3f/0x63
> [  156.381971]  [<c01040c0>] syscall_call+0x7/0xb
> [  156.381980]  =======================
> 

There's a glaring bug in selinux_netlbl_inode_permission() - taking
lock_sock() inside rcu_read_lock().

I would again draw attention to Documentation/SubmitChecklist.  In
particular please always always always enable all kernel debugging options
when developing and testing new kernel code.  And everything else in that
file, too.

<guesses that this was tested on ia64>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: selinux networking: sleeping functin called from invalid  context in 2.6.20-rc[12]
@ 2006-12-26  5:30 Paul Moore
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Moore @ 2006-12-26  5:30 UTC (permalink / raw)
  To: akpm; +Cc: adam, linux-kernel, netdev

-----Original Message-----
From: Andrew Morton <akpm@osdl.org>
Date: Sunday, Dec 24, 2006 7:25 pm
Subject: Re: selinux networking: sleeping functin called from invalid  context in 2.6.20-rc[12]

On Mon, 25 Dec 2006 05:21:24 +0800
>"Adam J. Richter" <adam@yggdrasil.com> wrote:
>
>> 	Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
> for several network programs running on my system:
> 
> [  156.381868] BUG: sleeping function called from invalid context at net/core/sock.c:1523
> [  156.381876] in_atomic():1, irqs_disabled():0
> [  156.381881] no locks held by kio_http/9693.
> [  156.381886]  [<c01057a2>] show_trace_log_lvl+0x1a/0x2f
> [  156.381900]  [<c0105dab>] show_trace+0x12/0x14
> [  156.381908]  [<c0105e48>] dump_stack+0x16/0x18
> [  156.381917]  [<c011e30f>] __might_sleep+0xe5/0xeb
> [  156.381926]  [<c025942a>] lock_sock_nested+0x1d/0xc4
> [  156.381937]  [<c01cc570>] selinux_netlbl_inode_permission+0x5a/0x8e
> [  156.381946]  [<c01c2505>] selinux_file_permission+0x96/0x9b
> [  156.381954]  [<c0175a0a>] vfs_write+0x8d/0x167
> [  156.381962]  [<c017605a>] sys_write+0x3f/0x63
> [  156.381971]  [<c01040c0>] syscall_call+0x7/0xb
> [  156.381980]  =======================
> 
>
>There's a glaring bug in selinux_netlbl_inode_permission() - taking lock_sock() inside rcu_read_lock().
>
>I would again draw attention to Documentation/SubmitChecklist.  In
>particular please always always always enable all kernel debugging options when developing and testing new kernel code.  And everything else in that file, too.
>

I apologize for the mistake - I'm still trying to get a firm grasp on some of the finer points of Linux kernel development and I obviously missed something here.  Unfortunately, due to the holiday I won't be able to write/test/submit a patch until after the first of the year but I promise to do so as soon as I am able.

. paul moore
. linux security @ hp


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
  2006-12-25  0:25 ` Andrew Morton
@ 2007-01-02  7:58   ` Adam J. Richter
  2007-01-02 21:25     ` Paul Moore
  2007-01-02 21:14   ` Paul Moore
  1 sibling, 1 reply; 8+ messages in thread
From: Adam J. Richter @ 2007-01-02  7:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, netdev, Paul Moore

[-- Attachment #1: Type: text/plain, Size: 2131 bytes --]

On Sun, Dec 24, 2006 at 04:25:11PM -0800, Andrew Morton wrote:
> On Mon, 25 Dec 2006 05:21:24 +0800
> "Adam J. Richter" <adam@yggdrasil.com> wrote:
> 
>> 	Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
>> for several network programs running on my system:
>> 
>> [  156.381868] BUG: sleeping function called from invalid context at net/core/sock.c:1523
[...]
> There's a glaring bug in selinux_netlbl_inode_permission() - taking
> lock_sock() inside rcu_read_lock().
> 
> I would again draw attention to Documentation/SubmitChecklist.  In
> particular please always always always enable all kernel debugging options
> when developing and testing new kernel code.  And everything else in that
> file, too.
> 
> <guesses that this was tested on ia64>

	I have not yet performed the 21 steps of
linux-2.6.20-rc3/Documentation/SubmitChecklist, which I think is a
great objectives list for future automation or some kind of community
web site.  I hope to find time to make progress through that
checklist, but, in the meantime, I think the world may nevertheless be
infinitesmally better off if I post the patch that I'm currently
using that seems to fix the problem, seeing as how rc3 has passed
with no fix incorporated.

	I think the intent of the offending code was to avoid doing
a lock_sock() in a presumably common case where there was no need to
take the lock.  So, I have kept the presumably fast test to exit
early.

	When it turns out to be necessary to take lock_sock(), RCU is
unlocked, then lock_sock is taken, the RCU is locked again, and
the test is repeated.

	If I am wrong about lock_sock being expensive, I can
delete the lines that do the early return.

	By the way, in a change not included in this patch,
I also tried consolidating the RCU locking in this file into a macro
IF_NLBL_REQUIRE(sksec, action), where "action" is the code
fragment to be executed with rcu_read_lock() held, although this
required splitting a couple of functions in half.

	Anyhow, here is my current patch as MIME attachment.
Comments and labor in getting it through SubmitChecklist would
both be welcome.

Adam Richter

[-- Attachment #2: selinux.diff --]
[-- Type: text/plain, Size: 621 bytes --]

--- linux-2.6.20-rc3/security/selinux/ss/services.c	2007-01-02 01:47:40.000000000 +0800
+++ linux/security/selinux/ss/services.c	2007-01-02 15:36:30.000000000 +0800
@@ -2658,14 +2658,22 @@
 	rcu_read_lock();
 	if (sksec->nlbl_state != NLBL_REQUIRE) {
 		rcu_read_unlock();
 		return 0;
 	}
+	rcu_read_unlock();
+
+
+	rc = 0;
 	lock_sock(sock->sk);
-	rc = selinux_netlbl_socket_setsid(sock, sksec->sid);
-	release_sock(sock->sk);
+	rcu_read_lock();
+
+	if (sksec->nlbl_state == NLBL_REQUIRE)
+		rc = selinux_netlbl_socket_setsid(sock, sksec->sid);
+
 	rcu_read_unlock();
+	release_sock(sock->sk);
 
 	return rc;
 }
 
 /**

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
  2006-12-25  0:25 ` Andrew Morton
  2007-01-02  7:58   ` Adam J. Richter
@ 2007-01-02 21:14   ` Paul Moore
  1 sibling, 0 replies; 8+ messages in thread
From: Paul Moore @ 2007-01-02 21:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Adam J. Richter, linux-kernel, netdev, Ingo Molnar

On Sunday, December 24 2006 7:25 pm, Andrew Morton wrote:
> On Mon, 25 Dec 2006 05:21:24 +0800
>
> "Adam J. Richter" <adam@yggdrasil.com> wrote:
> > 	Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
> > for several network programs running on my system:
> >
> > [  156.381868] BUG: sleeping function called from invalid context at
> > net/core/sock.c:1523 [  156.381876] in_atomic():1, irqs_disabled():0
> > [  156.381881] no locks held by kio_http/9693.
> > [  156.381886]  [<c01057a2>] show_trace_log_lvl+0x1a/0x2f
> > [  156.381900]  [<c0105dab>] show_trace+0x12/0x14
> > [  156.381908]  [<c0105e48>] dump_stack+0x16/0x18
> > [  156.381917]  [<c011e30f>] __might_sleep+0xe5/0xeb
> > [  156.381926]  [<c025942a>] lock_sock_nested+0x1d/0xc4
> > [  156.381937]  [<c01cc570>] selinux_netlbl_inode_permission+0x5a/0x8e
> > [  156.381946]  [<c01c2505>] selinux_file_permission+0x96/0x9b
> > [  156.381954]  [<c0175a0a>] vfs_write+0x8d/0x167
> > [  156.381962]  [<c017605a>] sys_write+0x3f/0x63
> > [  156.381971]  [<c01040c0>] syscall_call+0x7/0xb
> > [  156.381980]  =======================
>
> There's a glaring bug in selinux_netlbl_inode_permission() - taking
> lock_sock() inside rcu_read_lock().

Sorry for the delay, I'm finally back at a machine where I can look at the 
code.

I've been thinking about Parag Warudkar's and Ingo Molnar's patches as well as 
what the selinux_netlbl_inode_permission() function actually needs to do; I 
think the best answer isn't so much to change the socket locking calls, but 
to restructure the function a bit.

Currently the function does the following (in order):

 1. do some quick sanity checks (is the inode a socket, etc)
 2. rcu_read_lock()
 3. check the nlbl_state is set to NLBL_REQUIRE (otherwise return)
 4. lock_sock()
 5. netlabel magic
 6. release_sock()
 7. rcu_read_unlock()

I propose changing it to the following (in order):

  1. do some quick sanity checks (is the inode a socket, etc)
  2. rcu_read_lock()
  3. check the nlbl_state is set to NLBL_REQUIRE (otherwise return)
  4. rcu_read_unlock()
  5. lock_sock()
  6. rcu_read_lock()
  7. verify that nlbl_state is still set to NLBL_REQUIRE (otherwise return)
  8. netlabel magic
  9. rcu_read_unlock()
 10. release_sock()

This way we no longer need to worry about any special socket locking.  I 
realize this adds a bit of duplicated work but it is my understanding that 
RCU lock/unlock operations are *very* fast so the extra RCU lock operations 
shouldn't be too bad and the extra nlbl_state check should be of minimal 
cost.

However, I'm not the expert here, just a guy learning as he goes so any 
comments/feedback on the above proposal are welcome.  If it turns out this 
approach has some merit I'll put together a patch and send it out.

Once again, sorry for the regression.

-- 
paul moore
linux security @ hp

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
  2007-01-02  7:58   ` Adam J. Richter
@ 2007-01-02 21:25     ` Paul Moore
  2007-01-02 23:37       ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Moore @ 2007-01-02 21:25 UTC (permalink / raw)
  To: Adam J. Richter; +Cc: Andrew Morton, linux-kernel, netdev

On Tuesday, January 2 2007 2:58 am, Adam J. Richter wrote:
> 	I have not yet performed the 21 steps of
> linux-2.6.20-rc3/Documentation/SubmitChecklist, which I think is a
> great objectives list for future automation or some kind of community
> web site.  I hope to find time to make progress through that
> checklist, but, in the meantime, I think the world may nevertheless be
> infinitesmally better off if I post the patch that I'm currently
> using that seems to fix the problem, seeing as how rc3 has passed
> with no fix incorporated.
>
> 	I think the intent of the offending code was to avoid doing
> a lock_sock() in a presumably common case where there was no need to
> take the lock.  So, I have kept the presumably fast test to exit
> early.
>
> 	When it turns out to be necessary to take lock_sock(), RCU is
> unlocked, then lock_sock is taken, the RCU is locked again, and
> the test is repeated.

Hi Adam,

I'm sorry I just saw this mail (mail not sent directly to me get shuffled off 
to a folder).  I agree with your patch, I think dropping and then re-taking 
the RCU lock is the best way to go, although I'm curious to see what others 
have to say.

The only real comment I have with the patch is that there is some extra 
whitespace which could probably be removed, but that is more of a style nit 
than anything substantial.

> 	By the way, in a change not included in this patch,
> I also tried consolidating the RCU locking in this file into a macro
> IF_NLBL_REQUIRE(sksec, action), where "action" is the code
> fragment to be executed with rcu_read_lock() held, although this
> required splitting a couple of functions in half.

>From your description above I'm not sure I like that approach so much, 
however, I could be misunderstanding something.  Do you have a small example 
you could send?

-- 
paul moore
linux security @ hp

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
  2007-01-02 21:25     ` Paul Moore
@ 2007-01-02 23:37       ` David Miller
  2007-01-03 20:46         ` Paul Moore
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2007-01-02 23:37 UTC (permalink / raw)
  To: paul.moore; +Cc: adam, akpm, linux-kernel, netdev

From: Paul Moore <paul.moore@hp.com>
Date: Tue, 2 Jan 2007 16:25:24 -0500

> I'm sorry I just saw this mail (mail not sent directly to me get
> shuffled off to a folder).  I agree with your patch, I think
> dropping and then re-taking the RCU lock is the best way to go,
> although I'm curious to see what others have to say.

I think this is fine too.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
  2007-01-02 23:37       ` David Miller
@ 2007-01-03 20:46         ` Paul Moore
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Moore @ 2007-01-03 20:46 UTC (permalink / raw)
  To: David Miller; +Cc: adam, akpm, Ingo Molnar, netdev

On Tuesday, January 2 2007 6:37 pm, David Miller wrote:
> From: Paul Moore <paul.moore@hp.com>
> Date: Tue, 2 Jan 2007 16:25:24 -0500
>
> > I'm sorry I just saw this mail (mail not sent directly to me get
> > shuffled off to a folder).  I agree with your patch, I think
> > dropping and then re-taking the RCU lock is the best way to go,
> > although I'm curious to see what others have to say.
>
> I think this is fine too.

[NOTE: dropped linux-kernel as I think this discussion is now strictly related 
to socket locking so netdev is probably the best list]

I've been looking some more at Adam's and Ingo's patches for this as well as a 
recent bug against a FC test kernel:

 * https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=220966

For those who don't follow the link here is the meat of the bug report:

****
[ INFO: soft-safe -> soft-unsafe lock order detected ]
2.6.19-1.2891.fc7 #1
------------------------------------------------------
cupsd/1884 [HC0[0]:SC0[1]:HE1:SE0] is trying to acquire:
 (&ssec->nlbl_lock){--..}, at: [<c04cec37>]
selinux_netlbl_socket_setsid+0xbb/0x123

and this task is already holding:
 (af_callback_keys + sk->sk_family#3){-.-+}, at: [<c05daa1c>]
inet_accept+0x70/0xb5
which would create a new lock dependency:
 (af_callback_keys + sk->sk_family#3){-.-+} -> (&ssec->nlbl_lock){--..}

but this new dependency connects a soft-irq-safe lock:
 (af_callback_keys + sk->sk_family#3){-.-+}
... which became soft-irq-safe at:
 [<c043fff1>] __lock_acquire+0x37d/0x9f8
 [<c044094d>] lock_acquire+0x56/0x6f
 [<c05fbdb6>] _read_lock_bh+0x30/0x3d
 [<c04c687e>] selinux_socket_sock_rcv_skb+0xbd/0x252
 [<c05d0645>] tcp_v4_rcv+0x37a/0x909
 [<c05b7593>] ip_local_deliver+0x185/0x22e
 [<c05b73d6>] ip_rcv+0x418/0x450
 [<c059ae9c>] netif_receive_skb+0x2db/0x35a
 [<c059c85f>] process_backlog+0x95/0xf6
 [<c059ca46>] net_rx_action+0xa1/0x1a8
 [<c042bf5a>] __do_softirq+0x6f/0xe2
 [<c04063a1>] do_softirq+0x61/0xc7
 [<ffffffff>] 0xffffffff

to a soft-irq-unsafe lock:
 (&ssec->nlbl_lock){--..}
... which became soft-irq-unsafe at:
...  [<c044007d>] __lock_acquire+0x409/0x9f8
 [<c044094d>] lock_acquire+0x56/0x6f
 [<c05fbc89>] _spin_lock+0x2b/0x38
 [<c04cec37>] selinux_netlbl_socket_setsid+0xbb/0x123
 [<c04d0c92>] selinux_netlbl_socket_post_create+0x2d/0x2f
 [<c04c807b>] selinux_socket_post_create+0x156/0x15c
 [<c059213c>] __sock_create+0x179/0x1b2
 [<c05921ae>] sock_create+0x1a/0x1f
 [<c0592435>] sys_socket+0x1b/0x3c
 [<c0592cba>] sys_socketcall+0x77/0x241
 [<c0404050>] syscall_call+0x7/0xb
 [<ffffffff>] 0xffffffff
****

This makes me believe that Ingo's patch (which I see is now in Linus' and 
Andrew's trees) is the way to go and not the lock re-order approach in Adam's 
patch.  I'm going to continue to look into this, almost more for my own 
education than anything else, but I thought I would mention this lock 
dependency message as it seemed relevant to the discussion.

-- 
paul moore
linux security @ hp

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
@ 2007-01-04 11:32 Adam J. Richter
  0 siblings, 0 replies; 8+ messages in thread
From: Adam J. Richter @ 2007-01-04 11:32 UTC (permalink / raw)
  To: paul.moore; +Cc: akpm, davem, mingo, netdev

On Wed, 3 Jan 2007 15:46:31 -0500, Paul Moore wrote:
>This makes me believe that Ingo's patch (which I see is now in Linus' and 
>Andrew's trees) is the way to go and not the lock re-order approach in Adam's 
>patch.  I'm going to continue to look into this, almost more for my own 
>education than anything else, but I thought I would mention this lock 
>dependency message as it seemed relevant to the discussion.

	Both Ingo's patch and mine preserved the lock order, in a
sense: take the read_read_lock first, and avoid locking the socket if
possible.  Ingo's patch avoided unlocking and relocking by using some
facilities that I must admit my prior ignorance of.  So, I agree that
if both patches are "correct", then Ingo's is better.

	I have been running Ingo's patch for the past ~1.5 days, and
as far as I can tell it seems to be no worse than mine.

	The reason for such a guarded description as "no worse than
mine" is that I have been experiencing occasoinal system hangs that
started in 2.6.20-rc1 without any patch, which I have observed in
2.6.20-rc3 with my patch, and also 2.6.20-rc3 with Ingo's patch.
These hangs may be completly unrelated to this selinux issue though
and it may be days before get around to studying it more carefully, so
I'm happy to see Ingo's patch integrated now.

Adam

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-01-04 11:57 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-26  5:30 selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Paul Moore
  -- strict thread matches above, loose matches on Subject: below --
2007-01-04 11:32 Adam J. Richter
     [not found] <20061225052124.A10323@freya>
2006-12-25  0:25 ` Andrew Morton
2007-01-02  7:58   ` Adam J. Richter
2007-01-02 21:25     ` Paul Moore
2007-01-02 23:37       ` David Miller
2007-01-03 20:46         ` Paul Moore
2007-01-02 21:14   ` Paul Moore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).