netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 3.11] genetlink: fix family dump race
@ 2013-08-07 12:39 Johannes Berg
       [not found] ` <1375879163-2116-1-git-send-email-johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Johannes Berg @ 2013-08-07 12:39 UTC (permalink / raw)
  To: linux-wireless, netdev, Thomas Graf
  Cc: Andrei Otcheretianski, Ilan Peer, Johannes Berg

From: Johannes Berg <johannes.berg@intel.com>

When dumping generic netlink families, only the first dump call
is locked with genl_lock(), which protects the list of families,
and thus subsequent calls can access the data without locking,
racing against family addition/removal. This can cause a crash.
Fix it - the locking needs to be conditional because the first
time around it's already locked.

BUG: unable to handle kernel paging request at f8467360
IP: [<c14c56bb>] ctrl_dumpfamily+0x6b/0xe0
EIP: 0060:[<c14c56bb>] EFLAGS: 00210297 CPU: 2
EIP is at ctrl_dumpfamily+0x6b/0xe0
EAX: f8467378 EBX: f8467340 ECX: 00000000 EDX: ec1610c4
ESI: 00000001 EDI: c2077cc0 EBP: c46c3c00 ESP: c46c3bd4
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 80050033 CR2: f8467360 CR3: 26e54000 CR4: 001407d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process wpa_supplicant (pid: 20081, ti=c46c2000 task=c44640b0 task.ti=c46c2000)
Call Trace:
 [<c14c20bc>] netlink_dump+0x5c/0x200
 [<c14c3450>] __netlink_dump_start+0x140/0x160
 [<c14c5172>] genl_rcv_msg+0x102/0x270
 [<c14c4b5e>] netlink_rcv_skb+0x8e/0xb0
 [<c14c505c>] genl_rcv+0x1c/0x30
 [<c14c456b>] netlink_unicast+0x17b/0x1c0
 [<c14c47d4>] netlink_sendmsg+0x224/0x370
 [<c1485adf>] sock_sendmsg+0xff/0x120
 [<c1486b0a>] __sys_sendmsg+0x24a/0x260
 [<c1487fdb>] sys_sendmsg+0x3b/0x60
 [<c1488683>] sys_socketcall+0x283/0x2e0
 [<c15b7c1f>] sysenter_do_call+0x12/0x38
Code: 8d 3c c5 c0 7c 07 c2 8b 04 c5 c0 7c 07 c2 39 c7 8d 58 c8 75 16 eb 71 90 81 7d ec 00 1f 86 c1 74 10 8b 43 38 39 c7 8d 58 c8 74 5d <80> 7b 20 00 74 e7 83 c6 01 3b 75 f0 7c e8 8b 55 e8 8b 42 04 8b

Cc: stable@vger.kernel.org
Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/netlink/genetlink.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 2fd6dbe..6b6a03a 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -789,6 +789,10 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
 	struct net *net = sock_net(skb->sk);
 	int chains_to_skip = cb->args[0];
 	int fams_to_skip = cb->args[1];
+	bool need_locking = chains_to_skip || fams_to_skip;
+
+	if (need_locking)
+		genl_lock();
 
 	for (i = chains_to_skip; i < GENL_FAM_TAB_SIZE; i++) {
 		n = 0;
@@ -810,6 +814,9 @@ errout:
 	cb->args[0] = i;
 	cb->args[1] = n;
 
+	if (need_locking)
+		genl_unlock();
+
 	return skb->len;
 }
 
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 3.11] genetlink: fix family dump race
       [not found] ` <1375879163-2116-1-git-send-email-johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
@ 2013-08-12  4:29   ` David Miller
  2013-08-12  8:54     ` Johannes Berg
  0 siblings, 1 reply; 4+ messages in thread
From: David Miller @ 2013-08-12  4:29 UTC (permalink / raw)
  To: johannes-cdvu00un1VgdHxzADdlk8Q
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, tgraf-G/eBtMaohhA,
	andrei.otcheretianski-ral2JQCrhuEAvxtiuMwx3w,
	ilan.peer-ral2JQCrhuEAvxtiuMwx3w,
	johannes.berg-ral2JQCrhuEAvxtiuMwx3w

From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
Date: Wed,  7 Aug 2013 14:39:23 +0200

> When dumping generic netlink families, only the first dump call
> is locked with genl_lock(), which protects the list of families,
> and thus subsequent calls can access the data without locking,
> racing against family addition/removal. This can cause a crash.
> Fix it - the locking needs to be conditional because the first
> time around it's already locked.
> 
> BUG: unable to handle kernel paging request at f8467360
> IP: [<c14c56bb>] ctrl_dumpfamily+0x6b/0xe0
> EIP: 0060:[<c14c56bb>] EFLAGS: 00210297 CPU: 2
> EIP is at ctrl_dumpfamily+0x6b/0xe0
> EAX: f8467378 EBX: f8467340 ECX: 00000000 EDX: ec1610c4
> ESI: 00000001 EDI: c2077cc0 EBP: c46c3c00 ESP: c46c3bd4
>  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> CR0: 80050033 CR2: f8467360 CR3: 26e54000 CR4: 001407d0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> Process wpa_supplicant (pid: 20081, ti=c46c2000 task=c44640b0 task.ti=c46c2000)
> Call Trace:
>  [<c14c20bc>] netlink_dump+0x5c/0x200
>  [<c14c3450>] __netlink_dump_start+0x140/0x160
>  [<c14c5172>] genl_rcv_msg+0x102/0x270
>  [<c14c4b5e>] netlink_rcv_skb+0x8e/0xb0
>  [<c14c505c>] genl_rcv+0x1c/0x30
>  [<c14c456b>] netlink_unicast+0x17b/0x1c0
>  [<c14c47d4>] netlink_sendmsg+0x224/0x370
>  [<c1485adf>] sock_sendmsg+0xff/0x120

I completely agree with your analysis that we need locking here, but
the crash OOPS backtrace doesn't make any sense to me.

The bug should trigger when we enter the dump continuation path, which
would look like:

	ctrl_dumpfamily()
	netlink_dump()
	netlink_recvmsg()
	...

Since this is the only way we get into ctrl_dumpfamily() without
holding genl_lock().

But in your trace we're going through genl_rcv() which means this is
the first call of the dump, and genl_rcv() takes the necessary locks.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 3.11] genetlink: fix family dump race
  2013-08-12  4:29   ` David Miller
@ 2013-08-12  8:54     ` Johannes Berg
  2013-08-13  5:06       ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Johannes Berg @ 2013-08-12  8:54 UTC (permalink / raw)
  To: David Miller
  Cc: linux-wireless, netdev, tgraf, andrei.otcheretianski, ilan.peer,
	stable, Pravin B Shelar

On Sun, 2013-08-11 at 21:29 -0700, David Miller wrote:

> > BUG: unable to handle kernel paging request at f8467360
> > IP: [<c14c56bb>] ctrl_dumpfamily+0x6b/0xe0
> > EIP: 0060:[<c14c56bb>] EFLAGS: 00210297 CPU: 2
> > EIP is at ctrl_dumpfamily+0x6b/0xe0
> > EAX: f8467378 EBX: f8467340 ECX: 00000000 EDX: ec1610c4
> > ESI: 00000001 EDI: c2077cc0 EBP: c46c3c00 ESP: c46c3bd4
> >  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > CR0: 80050033 CR2: f8467360 CR3: 26e54000 CR4: 001407d0
> > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > DR6: ffff0ff0 DR7: 00000400
> > Process wpa_supplicant (pid: 20081, ti=c46c2000 task=c44640b0 task.ti=c46c2000)
> > Call Trace:
> >  [<c14c20bc>] netlink_dump+0x5c/0x200
> >  [<c14c3450>] __netlink_dump_start+0x140/0x160
> >  [<c14c5172>] genl_rcv_msg+0x102/0x270
> >  [<c14c4b5e>] netlink_rcv_skb+0x8e/0xb0
> >  [<c14c505c>] genl_rcv+0x1c/0x30
> >  [<c14c456b>] netlink_unicast+0x17b/0x1c0
> >  [<c14c47d4>] netlink_sendmsg+0x224/0x370
> >  [<c1485adf>] sock_sendmsg+0xff/0x120
> 
> I completely agree with your analysis that we need locking here, but
> the crash OOPS backtrace doesn't make any sense to me.
> 
> The bug should trigger when we enter the dump continuation path, which
> would look like:
> 
>         ctrl_dumpfamily()
>         netlink_dump()
>         netlink_recvmsg()
>         ...
> 
> Since this is the only way we get into ctrl_dumpfamily() without
> holding genl_lock().
> 
> But in your trace we're going through genl_rcv() which means this is
> the first call of the dump, and genl_rcv() takes the necessary locks.

Huh, yes, I only looked at the crash info as far as I needed to see that
it was crashing at accessing "rt->netnsok" with a not totally invalid
pointer "rt" (it's in EBX) and then went from the code ...

Ok, I see what's going on here. The bug was reported to me against an
old kernel (3.4.47!) and that actually did an unlock in genl_rcv_msg()
before calling netlink_dump_start():

        if (nlh->nlmsg_flags & NLM_F_DUMP) {
                if (ops->dumpit == NULL)
                        return -EOPNOTSUPP;

                genl_unlock();
                {
                        struct netlink_dump_control c = {
                                .dump = ops->dumpit,
                                .done = ops->done,
                        };
                        err = netlink_dump_start(net->genl_sock, skb,
nlh, &c);
                }
                genl_lock();
                return err;
        }

That was changed in Pravin's commit
def3117493eafd9dfa1f809d861e0031b2cc8a07
"genl: Allow concurrent genl callbacks." very recently - I'm not sure if
that was intentional, it's not described in the commit log.


I think for the current code the fix would still be correct, I can
change the commit message if you like. For backporting, we'll have to
check which tree has Pravin's change and which doesn't and change this
accordingly, I suppose.

johannes

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 3.11] genetlink: fix family dump race
  2013-08-12  8:54     ` Johannes Berg
@ 2013-08-13  5:06       ` David Miller
  0 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2013-08-13  5:06 UTC (permalink / raw)
  To: johannes
  Cc: linux-wireless, netdev, tgraf, andrei.otcheretianski, ilan.peer,
	stable, pshelar

From: Johannes Berg <johannes@sipsolutions.net>
Date: Mon, 12 Aug 2013 10:54:57 +0200

> I think for the current code the fix would still be correct, I can
> change the commit message if you like.

I agree, please update the commit message and resubmit.

> For backporting, we'll have to check which tree has Pravin's change
> and which doesn't and change this accordingly, I suppose.

This is really tricky, because we'd need to also backport the fixes
for the regressions added by Pravin's change such as:

50754d2188b04a679a249fb57751542643a436e0
c74f2b2678f40b80265dd53556f1f778c8e1823f

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-08-13  5:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-08-07 12:39 [PATCH v2 3.11] genetlink: fix family dump race Johannes Berg
     [not found] ` <1375879163-2116-1-git-send-email-johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
2013-08-12  4:29   ` David Miller
2013-08-12  8:54     ` Johannes Berg
2013-08-13  5:06       ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).