* [PATCH v2 3.11] genetlink: fix family dump race @ 2013-08-07 12:39 Johannes Berg [not found] ` <1375879163-2116-1-git-send-email-johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> 0 siblings, 1 reply; 4+ messages in thread From: Johannes Berg @ 2013-08-07 12:39 UTC (permalink / raw) To: linux-wireless, netdev, Thomas Graf Cc: Andrei Otcheretianski, Ilan Peer, Johannes Berg From: Johannes Berg <johannes.berg@intel.com> When dumping generic netlink families, only the first dump call is locked with genl_lock(), which protects the list of families, and thus subsequent calls can access the data without locking, racing against family addition/removal. This can cause a crash. Fix it - the locking needs to be conditional because the first time around it's already locked. BUG: unable to handle kernel paging request at f8467360 IP: [<c14c56bb>] ctrl_dumpfamily+0x6b/0xe0 EIP: 0060:[<c14c56bb>] EFLAGS: 00210297 CPU: 2 EIP is at ctrl_dumpfamily+0x6b/0xe0 EAX: f8467378 EBX: f8467340 ECX: 00000000 EDX: ec1610c4 ESI: 00000001 EDI: c2077cc0 EBP: c46c3c00 ESP: c46c3bd4 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 80050033 CR2: f8467360 CR3: 26e54000 CR4: 001407d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process wpa_supplicant (pid: 20081, ti=c46c2000 task=c44640b0 task.ti=c46c2000) Call Trace: [<c14c20bc>] netlink_dump+0x5c/0x200 [<c14c3450>] __netlink_dump_start+0x140/0x160 [<c14c5172>] genl_rcv_msg+0x102/0x270 [<c14c4b5e>] netlink_rcv_skb+0x8e/0xb0 [<c14c505c>] genl_rcv+0x1c/0x30 [<c14c456b>] netlink_unicast+0x17b/0x1c0 [<c14c47d4>] netlink_sendmsg+0x224/0x370 [<c1485adf>] sock_sendmsg+0xff/0x120 [<c1486b0a>] __sys_sendmsg+0x24a/0x260 [<c1487fdb>] sys_sendmsg+0x3b/0x60 [<c1488683>] sys_socketcall+0x283/0x2e0 [<c15b7c1f>] sysenter_do_call+0x12/0x38 Code: 8d 3c c5 c0 7c 07 c2 8b 04 c5 c0 7c 07 c2 39 c7 8d 58 c8 75 16 eb 71 90 81 7d ec 00 1f 86 c1 74 10 8b 43 38 39 c7 8d 58 c8 74 5d <80> 7b 20 00 74 e7 83 c6 01 3b 75 f0 7c e8 8b 55 e8 8b 42 04 8b Cc: stable@vger.kernel.org Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> --- net/netlink/genetlink.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c index 2fd6dbe..6b6a03a 100644 --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -789,6 +789,10 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb) struct net *net = sock_net(skb->sk); int chains_to_skip = cb->args[0]; int fams_to_skip = cb->args[1]; + bool need_locking = chains_to_skip || fams_to_skip; + + if (need_locking) + genl_lock(); for (i = chains_to_skip; i < GENL_FAM_TAB_SIZE; i++) { n = 0; @@ -810,6 +814,9 @@ errout: cb->args[0] = i; cb->args[1] = n; + if (need_locking) + genl_unlock(); + return skb->len; } -- 1.8.0 ^ permalink raw reply related [flat|nested] 4+ messages in thread
[parent not found: <1375879163-2116-1-git-send-email-johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>]
* Re: [PATCH v2 3.11] genetlink: fix family dump race [not found] ` <1375879163-2116-1-git-send-email-johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> @ 2013-08-12 4:29 ` David Miller 2013-08-12 8:54 ` Johannes Berg 0 siblings, 1 reply; 4+ messages in thread From: David Miller @ 2013-08-12 4:29 UTC (permalink / raw) To: johannes-cdvu00un1VgdHxzADdlk8Q Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA, tgraf-G/eBtMaohhA, andrei.otcheretianski-ral2JQCrhuEAvxtiuMwx3w, ilan.peer-ral2JQCrhuEAvxtiuMwx3w, johannes.berg-ral2JQCrhuEAvxtiuMwx3w From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> Date: Wed, 7 Aug 2013 14:39:23 +0200 > When dumping generic netlink families, only the first dump call > is locked with genl_lock(), which protects the list of families, > and thus subsequent calls can access the data without locking, > racing against family addition/removal. This can cause a crash. > Fix it - the locking needs to be conditional because the first > time around it's already locked. > > BUG: unable to handle kernel paging request at f8467360 > IP: [<c14c56bb>] ctrl_dumpfamily+0x6b/0xe0 > EIP: 0060:[<c14c56bb>] EFLAGS: 00210297 CPU: 2 > EIP is at ctrl_dumpfamily+0x6b/0xe0 > EAX: f8467378 EBX: f8467340 ECX: 00000000 EDX: ec1610c4 > ESI: 00000001 EDI: c2077cc0 EBP: c46c3c00 ESP: c46c3bd4 > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > CR0: 80050033 CR2: f8467360 CR3: 26e54000 CR4: 001407d0 > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 > DR6: ffff0ff0 DR7: 00000400 > Process wpa_supplicant (pid: 20081, ti=c46c2000 task=c44640b0 task.ti=c46c2000) > Call Trace: > [<c14c20bc>] netlink_dump+0x5c/0x200 > [<c14c3450>] __netlink_dump_start+0x140/0x160 > [<c14c5172>] genl_rcv_msg+0x102/0x270 > [<c14c4b5e>] netlink_rcv_skb+0x8e/0xb0 > [<c14c505c>] genl_rcv+0x1c/0x30 > [<c14c456b>] netlink_unicast+0x17b/0x1c0 > [<c14c47d4>] netlink_sendmsg+0x224/0x370 > [<c1485adf>] sock_sendmsg+0xff/0x120 I completely agree with your analysis that we need locking here, but the crash OOPS backtrace doesn't make any sense to me. The bug should trigger when we enter the dump continuation path, which would look like: ctrl_dumpfamily() netlink_dump() netlink_recvmsg() ... Since this is the only way we get into ctrl_dumpfamily() without holding genl_lock(). But in your trace we're going through genl_rcv() which means this is the first call of the dump, and genl_rcv() takes the necessary locks. -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2 3.11] genetlink: fix family dump race 2013-08-12 4:29 ` David Miller @ 2013-08-12 8:54 ` Johannes Berg 2013-08-13 5:06 ` David Miller 0 siblings, 1 reply; 4+ messages in thread From: Johannes Berg @ 2013-08-12 8:54 UTC (permalink / raw) To: David Miller Cc: linux-wireless, netdev, tgraf, andrei.otcheretianski, ilan.peer, stable, Pravin B Shelar On Sun, 2013-08-11 at 21:29 -0700, David Miller wrote: > > BUG: unable to handle kernel paging request at f8467360 > > IP: [<c14c56bb>] ctrl_dumpfamily+0x6b/0xe0 > > EIP: 0060:[<c14c56bb>] EFLAGS: 00210297 CPU: 2 > > EIP is at ctrl_dumpfamily+0x6b/0xe0 > > EAX: f8467378 EBX: f8467340 ECX: 00000000 EDX: ec1610c4 > > ESI: 00000001 EDI: c2077cc0 EBP: c46c3c00 ESP: c46c3bd4 > > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > > CR0: 80050033 CR2: f8467360 CR3: 26e54000 CR4: 001407d0 > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 > > DR6: ffff0ff0 DR7: 00000400 > > Process wpa_supplicant (pid: 20081, ti=c46c2000 task=c44640b0 task.ti=c46c2000) > > Call Trace: > > [<c14c20bc>] netlink_dump+0x5c/0x200 > > [<c14c3450>] __netlink_dump_start+0x140/0x160 > > [<c14c5172>] genl_rcv_msg+0x102/0x270 > > [<c14c4b5e>] netlink_rcv_skb+0x8e/0xb0 > > [<c14c505c>] genl_rcv+0x1c/0x30 > > [<c14c456b>] netlink_unicast+0x17b/0x1c0 > > [<c14c47d4>] netlink_sendmsg+0x224/0x370 > > [<c1485adf>] sock_sendmsg+0xff/0x120 > > I completely agree with your analysis that we need locking here, but > the crash OOPS backtrace doesn't make any sense to me. > > The bug should trigger when we enter the dump continuation path, which > would look like: > > ctrl_dumpfamily() > netlink_dump() > netlink_recvmsg() > ... > > Since this is the only way we get into ctrl_dumpfamily() without > holding genl_lock(). > > But in your trace we're going through genl_rcv() which means this is > the first call of the dump, and genl_rcv() takes the necessary locks. Huh, yes, I only looked at the crash info as far as I needed to see that it was crashing at accessing "rt->netnsok" with a not totally invalid pointer "rt" (it's in EBX) and then went from the code ... Ok, I see what's going on here. The bug was reported to me against an old kernel (3.4.47!) and that actually did an unlock in genl_rcv_msg() before calling netlink_dump_start(): if (nlh->nlmsg_flags & NLM_F_DUMP) { if (ops->dumpit == NULL) return -EOPNOTSUPP; genl_unlock(); { struct netlink_dump_control c = { .dump = ops->dumpit, .done = ops->done, }; err = netlink_dump_start(net->genl_sock, skb, nlh, &c); } genl_lock(); return err; } That was changed in Pravin's commit def3117493eafd9dfa1f809d861e0031b2cc8a07 "genl: Allow concurrent genl callbacks." very recently - I'm not sure if that was intentional, it's not described in the commit log. I think for the current code the fix would still be correct, I can change the commit message if you like. For backporting, we'll have to check which tree has Pravin's change and which doesn't and change this accordingly, I suppose. johannes ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2 3.11] genetlink: fix family dump race 2013-08-12 8:54 ` Johannes Berg @ 2013-08-13 5:06 ` David Miller 0 siblings, 0 replies; 4+ messages in thread From: David Miller @ 2013-08-13 5:06 UTC (permalink / raw) To: johannes Cc: linux-wireless, netdev, tgraf, andrei.otcheretianski, ilan.peer, stable, pshelar From: Johannes Berg <johannes@sipsolutions.net> Date: Mon, 12 Aug 2013 10:54:57 +0200 > I think for the current code the fix would still be correct, I can > change the commit message if you like. I agree, please update the commit message and resubmit. > For backporting, we'll have to check which tree has Pravin's change > and which doesn't and change this accordingly, I suppose. This is really tricky, because we'd need to also backport the fixes for the regressions added by Pravin's change such as: 50754d2188b04a679a249fb57751542643a436e0 c74f2b2678f40b80265dd53556f1f778c8e1823f ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-08-13 5:06 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-08-07 12:39 [PATCH v2 3.11] genetlink: fix family dump race Johannes Berg [not found] ` <1375879163-2116-1-git-send-email-johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> 2013-08-12 4:29 ` David Miller 2013-08-12 8:54 ` Johannes Berg 2013-08-13 5:06 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).