* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
2006-12-25 0:25 ` selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Andrew Morton
@ 2007-01-02 7:58 ` Adam J. Richter
2007-01-02 21:25 ` Paul Moore
2007-01-02 20:09 ` [patch] selinux: fix selinux_netlbl_inode_permission() locking Ingo Molnar
2007-01-02 21:14 ` selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Paul Moore
2 siblings, 1 reply; 7+ messages in thread
From: Adam J. Richter @ 2007-01-02 7:58 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, netdev, Paul Moore
[-- Attachment #1: Type: text/plain, Size: 2131 bytes --]
On Sun, Dec 24, 2006 at 04:25:11PM -0800, Andrew Morton wrote:
> On Mon, 25 Dec 2006 05:21:24 +0800
> "Adam J. Richter" <adam@yggdrasil.com> wrote:
>
>> Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
>> for several network programs running on my system:
>>
>> [ 156.381868] BUG: sleeping function called from invalid context at net/core/sock.c:1523
[...]
> There's a glaring bug in selinux_netlbl_inode_permission() - taking
> lock_sock() inside rcu_read_lock().
>
> I would again draw attention to Documentation/SubmitChecklist. In
> particular please always always always enable all kernel debugging options
> when developing and testing new kernel code. And everything else in that
> file, too.
>
> <guesses that this was tested on ia64>
I have not yet performed the 21 steps of
linux-2.6.20-rc3/Documentation/SubmitChecklist, which I think is a
great objectives list for future automation or some kind of community
web site. I hope to find time to make progress through that
checklist, but, in the meantime, I think the world may nevertheless be
infinitesmally better off if I post the patch that I'm currently
using that seems to fix the problem, seeing as how rc3 has passed
with no fix incorporated.
I think the intent of the offending code was to avoid doing
a lock_sock() in a presumably common case where there was no need to
take the lock. So, I have kept the presumably fast test to exit
early.
When it turns out to be necessary to take lock_sock(), RCU is
unlocked, then lock_sock is taken, the RCU is locked again, and
the test is repeated.
If I am wrong about lock_sock being expensive, I can
delete the lines that do the early return.
By the way, in a change not included in this patch,
I also tried consolidating the RCU locking in this file into a macro
IF_NLBL_REQUIRE(sksec, action), where "action" is the code
fragment to be executed with rcu_read_lock() held, although this
required splitting a couple of functions in half.
Anyhow, here is my current patch as MIME attachment.
Comments and labor in getting it through SubmitChecklist would
both be welcome.
Adam Richter
[-- Attachment #2: selinux.diff --]
[-- Type: text/plain, Size: 621 bytes --]
--- linux-2.6.20-rc3/security/selinux/ss/services.c 2007-01-02 01:47:40.000000000 +0800
+++ linux/security/selinux/ss/services.c 2007-01-02 15:36:30.000000000 +0800
@@ -2658,14 +2658,22 @@
rcu_read_lock();
if (sksec->nlbl_state != NLBL_REQUIRE) {
rcu_read_unlock();
return 0;
}
+ rcu_read_unlock();
+
+
+ rc = 0;
lock_sock(sock->sk);
- rc = selinux_netlbl_socket_setsid(sock, sksec->sid);
- release_sock(sock->sk);
+ rcu_read_lock();
+
+ if (sksec->nlbl_state == NLBL_REQUIRE)
+ rc = selinux_netlbl_socket_setsid(sock, sksec->sid);
+
rcu_read_unlock();
+ release_sock(sock->sk);
return rc;
}
/**
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
2007-01-02 7:58 ` Adam J. Richter
@ 2007-01-02 21:25 ` Paul Moore
2007-01-02 23:37 ` David Miller
0 siblings, 1 reply; 7+ messages in thread
From: Paul Moore @ 2007-01-02 21:25 UTC (permalink / raw)
To: Adam J. Richter; +Cc: Andrew Morton, linux-kernel, netdev
On Tuesday, January 2 2007 2:58 am, Adam J. Richter wrote:
> I have not yet performed the 21 steps of
> linux-2.6.20-rc3/Documentation/SubmitChecklist, which I think is a
> great objectives list for future automation or some kind of community
> web site. I hope to find time to make progress through that
> checklist, but, in the meantime, I think the world may nevertheless be
> infinitesmally better off if I post the patch that I'm currently
> using that seems to fix the problem, seeing as how rc3 has passed
> with no fix incorporated.
>
> I think the intent of the offending code was to avoid doing
> a lock_sock() in a presumably common case where there was no need to
> take the lock. So, I have kept the presumably fast test to exit
> early.
>
> When it turns out to be necessary to take lock_sock(), RCU is
> unlocked, then lock_sock is taken, the RCU is locked again, and
> the test is repeated.
Hi Adam,
I'm sorry I just saw this mail (mail not sent directly to me get shuffled off
to a folder). I agree with your patch, I think dropping and then re-taking
the RCU lock is the best way to go, although I'm curious to see what others
have to say.
The only real comment I have with the patch is that there is some extra
whitespace which could probably be removed, but that is more of a style nit
than anything substantial.
> By the way, in a change not included in this patch,
> I also tried consolidating the RCU locking in this file into a macro
> IF_NLBL_REQUIRE(sksec, action), where "action" is the code
> fragment to be executed with rcu_read_lock() held, although this
> required splitting a couple of functions in half.
>From your description above I'm not sure I like that approach so much,
however, I could be misunderstanding something. Do you have a small example
you could send?
--
paul moore
linux security @ hp
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
2007-01-02 21:25 ` Paul Moore
@ 2007-01-02 23:37 ` David Miller
2007-01-03 20:46 ` Paul Moore
0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2007-01-02 23:37 UTC (permalink / raw)
To: paul.moore; +Cc: adam, akpm, linux-kernel, netdev
From: Paul Moore <paul.moore@hp.com>
Date: Tue, 2 Jan 2007 16:25:24 -0500
> I'm sorry I just saw this mail (mail not sent directly to me get
> shuffled off to a folder). I agree with your patch, I think
> dropping and then re-taking the RCU lock is the best way to go,
> although I'm curious to see what others have to say.
I think this is fine too.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
2007-01-02 23:37 ` David Miller
@ 2007-01-03 20:46 ` Paul Moore
0 siblings, 0 replies; 7+ messages in thread
From: Paul Moore @ 2007-01-03 20:46 UTC (permalink / raw)
To: David Miller; +Cc: adam, akpm, Ingo Molnar, netdev
On Tuesday, January 2 2007 6:37 pm, David Miller wrote:
> From: Paul Moore <paul.moore@hp.com>
> Date: Tue, 2 Jan 2007 16:25:24 -0500
>
> > I'm sorry I just saw this mail (mail not sent directly to me get
> > shuffled off to a folder). I agree with your patch, I think
> > dropping and then re-taking the RCU lock is the best way to go,
> > although I'm curious to see what others have to say.
>
> I think this is fine too.
[NOTE: dropped linux-kernel as I think this discussion is now strictly related
to socket locking so netdev is probably the best list]
I've been looking some more at Adam's and Ingo's patches for this as well as a
recent bug against a FC test kernel:
* https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=220966
For those who don't follow the link here is the meat of the bug report:
****
[ INFO: soft-safe -> soft-unsafe lock order detected ]
2.6.19-1.2891.fc7 #1
------------------------------------------------------
cupsd/1884 [HC0[0]:SC0[1]:HE1:SE0] is trying to acquire:
(&ssec->nlbl_lock){--..}, at: [<c04cec37>]
selinux_netlbl_socket_setsid+0xbb/0x123
and this task is already holding:
(af_callback_keys + sk->sk_family#3){-.-+}, at: [<c05daa1c>]
inet_accept+0x70/0xb5
which would create a new lock dependency:
(af_callback_keys + sk->sk_family#3){-.-+} -> (&ssec->nlbl_lock){--..}
but this new dependency connects a soft-irq-safe lock:
(af_callback_keys + sk->sk_family#3){-.-+}
... which became soft-irq-safe at:
[<c043fff1>] __lock_acquire+0x37d/0x9f8
[<c044094d>] lock_acquire+0x56/0x6f
[<c05fbdb6>] _read_lock_bh+0x30/0x3d
[<c04c687e>] selinux_socket_sock_rcv_skb+0xbd/0x252
[<c05d0645>] tcp_v4_rcv+0x37a/0x909
[<c05b7593>] ip_local_deliver+0x185/0x22e
[<c05b73d6>] ip_rcv+0x418/0x450
[<c059ae9c>] netif_receive_skb+0x2db/0x35a
[<c059c85f>] process_backlog+0x95/0xf6
[<c059ca46>] net_rx_action+0xa1/0x1a8
[<c042bf5a>] __do_softirq+0x6f/0xe2
[<c04063a1>] do_softirq+0x61/0xc7
[<ffffffff>] 0xffffffff
to a soft-irq-unsafe lock:
(&ssec->nlbl_lock){--..}
... which became soft-irq-unsafe at:
... [<c044007d>] __lock_acquire+0x409/0x9f8
[<c044094d>] lock_acquire+0x56/0x6f
[<c05fbc89>] _spin_lock+0x2b/0x38
[<c04cec37>] selinux_netlbl_socket_setsid+0xbb/0x123
[<c04d0c92>] selinux_netlbl_socket_post_create+0x2d/0x2f
[<c04c807b>] selinux_socket_post_create+0x156/0x15c
[<c059213c>] __sock_create+0x179/0x1b2
[<c05921ae>] sock_create+0x1a/0x1f
[<c0592435>] sys_socket+0x1b/0x3c
[<c0592cba>] sys_socketcall+0x77/0x241
[<c0404050>] syscall_call+0x7/0xb
[<ffffffff>] 0xffffffff
****
This makes me believe that Ingo's patch (which I see is now in Linus' and
Andrew's trees) is the way to go and not the lock re-order approach in Adam's
patch. I'm going to continue to look into this, almost more for my own
education than anything else, but I thought I would mention this lock
dependency message as it seemed relevant to the discussion.
--
paul moore
linux security @ hp
^ permalink raw reply [flat|nested] 7+ messages in thread
* [patch] selinux: fix selinux_netlbl_inode_permission() locking
2006-12-25 0:25 ` selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Andrew Morton
2007-01-02 7:58 ` Adam J. Richter
@ 2007-01-02 20:09 ` Ingo Molnar
2007-01-02 21:14 ` selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Paul Moore
2 siblings, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2007-01-02 20:09 UTC (permalink / raw)
To: Andrew Morton
Cc: Adam J. Richter, linux-kernel, netdev, Paul Moore, Linus Torvalds
* Andrew Morton <akpm@osdl.org> wrote:
> There's a glaring bug in selinux_netlbl_inode_permission() - taking
> lock_sock() inside rcu_read_lock().
Note that the bug is still in -rc3, and is easily triggerable via a
default FC6 bootup. It's fixed by the (slightly modified) patch from
Parag Warudkar below that i have in the -rt tree.
Note that this bug became visible due to the recent __resched_legal()
fix, which bug made most of our atomicity debugging checks ineffective.
About half a dozen separate atomicity bugs triggered in -rt when i fixed
the __resched_legal() bug, so i'd expect some more to trigger upstream
too.
Ingo
------------------------>
Subject: [patch] selinux: fix selinux_netlbl_inode_permission() locking
From: Parag Warudkar <paragw@paragw.zapto.org>
do not call a sleeping lock API in an RCU read section.
lock_sock_nested can sleep, its BH counterpart doesn't.
selinux_netlbl_inode_permission() needs to use the BH counterpart
unconditionally.
Compile tested.
From: Ingo Molnar <mingo@elte.hu>
added BH disabling, because this function can be called from non-atomic
contexts too, so a naked bh_lock_sock() would be deadlock-prone.
Boot-tested the resulting kernel.
Signed-off-by: Parag Warudkar <paragw@paragw.zapto.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
security/selinux/ss/services.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
Index: linux/security/selinux/ss/services.c
===================================================================
--- linux.orig/security/selinux/ss/services.c
+++ linux/security/selinux/ss/services.c
@@ -2660,9 +2660,11 @@ int selinux_netlbl_inode_permission(stru
rcu_read_unlock();
return 0;
}
- lock_sock(sock->sk);
+ local_bh_disable();
+ bh_lock_sock_nested(sock->sk);
rc = selinux_netlbl_socket_setsid(sock, sksec->sid);
- release_sock(sock->sk);
+ bh_unlock_sock(sock->sk);
+ local_bh_enable();
rcu_read_unlock();
return rc;
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]
2006-12-25 0:25 ` selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12] Andrew Morton
2007-01-02 7:58 ` Adam J. Richter
2007-01-02 20:09 ` [patch] selinux: fix selinux_netlbl_inode_permission() locking Ingo Molnar
@ 2007-01-02 21:14 ` Paul Moore
2 siblings, 0 replies; 7+ messages in thread
From: Paul Moore @ 2007-01-02 21:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: Adam J. Richter, linux-kernel, netdev, Ingo Molnar
On Sunday, December 24 2006 7:25 pm, Andrew Morton wrote:
> On Mon, 25 Dec 2006 05:21:24 +0800
>
> "Adam J. Richter" <adam@yggdrasil.com> wrote:
> > Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
> > for several network programs running on my system:
> >
> > [ 156.381868] BUG: sleeping function called from invalid context at
> > net/core/sock.c:1523 [ 156.381876] in_atomic():1, irqs_disabled():0
> > [ 156.381881] no locks held by kio_http/9693.
> > [ 156.381886] [<c01057a2>] show_trace_log_lvl+0x1a/0x2f
> > [ 156.381900] [<c0105dab>] show_trace+0x12/0x14
> > [ 156.381908] [<c0105e48>] dump_stack+0x16/0x18
> > [ 156.381917] [<c011e30f>] __might_sleep+0xe5/0xeb
> > [ 156.381926] [<c025942a>] lock_sock_nested+0x1d/0xc4
> > [ 156.381937] [<c01cc570>] selinux_netlbl_inode_permission+0x5a/0x8e
> > [ 156.381946] [<c01c2505>] selinux_file_permission+0x96/0x9b
> > [ 156.381954] [<c0175a0a>] vfs_write+0x8d/0x167
> > [ 156.381962] [<c017605a>] sys_write+0x3f/0x63
> > [ 156.381971] [<c01040c0>] syscall_call+0x7/0xb
> > [ 156.381980] =======================
>
> There's a glaring bug in selinux_netlbl_inode_permission() - taking
> lock_sock() inside rcu_read_lock().
Sorry for the delay, I'm finally back at a machine where I can look at the
code.
I've been thinking about Parag Warudkar's and Ingo Molnar's patches as well as
what the selinux_netlbl_inode_permission() function actually needs to do; I
think the best answer isn't so much to change the socket locking calls, but
to restructure the function a bit.
Currently the function does the following (in order):
1. do some quick sanity checks (is the inode a socket, etc)
2. rcu_read_lock()
3. check the nlbl_state is set to NLBL_REQUIRE (otherwise return)
4. lock_sock()
5. netlabel magic
6. release_sock()
7. rcu_read_unlock()
I propose changing it to the following (in order):
1. do some quick sanity checks (is the inode a socket, etc)
2. rcu_read_lock()
3. check the nlbl_state is set to NLBL_REQUIRE (otherwise return)
4. rcu_read_unlock()
5. lock_sock()
6. rcu_read_lock()
7. verify that nlbl_state is still set to NLBL_REQUIRE (otherwise return)
8. netlabel magic
9. rcu_read_unlock()
10. release_sock()
This way we no longer need to worry about any special socket locking. I
realize this adds a bit of duplicated work but it is my understanding that
RCU lock/unlock operations are *very* fast so the extra RCU lock operations
shouldn't be too bad and the extra nlbl_state check should be of minimal
cost.
However, I'm not the expert here, just a guy learning as he goes so any
comments/feedback on the above proposal are welcome. If it turns out this
approach has some merit I'll put together a patch and send it out.
Once again, sorry for the regression.
--
paul moore
linux security @ hp
^ permalink raw reply [flat|nested] 7+ messages in thread