From mboxrd@z Thu Jan 1 00:00:00 1970 From: ebiederm@xmission.com (Eric W. Biederman) Subject: Re: [PATCH net-next] net: vxlan: do not use vxlan_net before checking event type Date: Mon, 20 Jan 2014 13:51:22 -0800 Message-ID: <87eh428gpx.fsf@xmission.com> References: <1389959706-30976-1-git-send-email-dborkman@redhat.com> Mime-Version: 1.0 Content-Type: text/plain Cc: davem@davemloft.net, netdev@vger.kernel.org, Jesse Brandeburg , Cong Wang To: Daniel Borkmann Return-path: Received: from out02.mta.xmission.com ([166.70.13.232]:49032 "EHLO out02.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750743AbaATVvi (ORCPT ); Mon, 20 Jan 2014 16:51:38 -0500 In-Reply-To: <1389959706-30976-1-git-send-email-dborkman@redhat.com> (Daniel Borkmann's message of "Fri, 17 Jan 2014 12:55:06 +0100") Sender: netdev-owner@vger.kernel.org List-ID: Daniel Borkmann writes: > Jesse Brandeburg reported that commit acaf4e70997f caused a panic > when adding a network namespace while vxlan module was present in > the system: > > [] vxlan_lowerdev_event+0xf5/0x100 > [] notifier_call_chain+0x4d/0x70 > [] __raw_notifier_call_chain+0xe/0x10 > [] raw_notifier_call_chain+0x16/0x20 > [] call_netdevice_notifiers_info+0x40/0x70 > [] call_netdevice_notifiers+0x16/0x20 > [] register_netdevice+0x1be/0x3a0 > [] register_netdev+0x1e/0x30 > [] loopback_net_init+0x4a/0xb0 > [] ? lockd_init_net+0x6e/0xb0 [lockd] > [] ops_init+0x4c/0x150 > [] setup_net+0x73/0x110 > [] copy_net_ns+0x7b/0x100 > [] create_new_namespaces+0x101/0x1b0 > [] copy_namespaces+0x85/0xb0 > [] copy_process.part.26+0x935/0x1500 > [] ? mntput+0x26/0x40 > [] do_fork+0xbc/0x2e0 > [] ? ____fput+0xe/0x10 > [] ? task_work_run+0xac/0xe0 > [] SyS_clone+0x16/0x20 > [] stub_clone+0x69/0x90 > [] ? system_call_fastpath+0x16/0x1b > > Apparently loopback device is being registered first and thus we > receive an event notification when vxlan_net is not ready. Hence, > when we call net_generic() and request vxlan_net_id, we seem to > access garbage at that point in time. In setup_net() where we set > up a newly allocated network namespace, we traverse the list of > pernet ops ... > > list_for_each_entry(ops, &pernet_list, list) { > error = ops_init(ops, net); > if (error < 0) > goto out_undo; > } > > ... and loopback_net_init() is invoked first here, so in the middle > of setup_net() we get this notification in vxlan. As currently we > only care about devices that unregister, move access through > net_generic() there. Fix is based on Cong Wang's proposal, but > only changes what is needed here. It sucks a bit as we only work > around the actual cure: right now it seems the only way to check if > a netns actually finished traversing all init ops would be to check > if it's part of net_namespace_list. But that I find quite expensive > each time we go through a notifier callback. Anyway, did a couple > of tests and it seems good for now. I am not going to argue against this patch as an immediate bug fix but something smells here, that bears deeper investigation. It looks like the symptom is being patched rather than the actual problem. In particular net_generic(dev_net(dev), vxlan_net_id) is valid at the point that it is being called. As the pointers are allocated in copy_net_ns in net_alloc prior to setup_net being called. On the flip side it is the responsibility of code that uses both register_netdev_notifier and register_pernet_xxx to be ready to handle events from any namespace as soon as they happen. vxlan should be using register_pernet_subsys instead of register_pernet_device to ensure the vxlan_net structure is initialized before and cleaned up after all network devices in a given network namespace. The vlan devices with a similar problem already do this. So in summary. Something smells and I don't believe this patch fixes the underlying issue. Please take a deeper look into what vxlan is doing. Eric > Fixes: acaf4e70997f ("net: vxlan: when lower dev unregisters remove vxlan dev as well") > Reported-by: Jesse Brandeburg > Cc: "Eric W. Biederman" > Cc: Jesse Brandeburg > Signed-off-by: Cong Wang > Signed-off-by: Daniel Borkmann > --- > drivers/net/vxlan.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c > index a2dee80..d6ec71f 100644 > --- a/drivers/net/vxlan.c > +++ b/drivers/net/vxlan.c > @@ -2681,10 +2681,12 @@ static int vxlan_lowerdev_event(struct notifier_block *unused, > unsigned long event, void *ptr) > { > struct net_device *dev = netdev_notifier_info_to_dev(ptr); > - struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); > + struct vxlan_net *vn; > > - if (event == NETDEV_UNREGISTER) > + if (event == NETDEV_UNREGISTER) { > + vn = net_generic(dev_net(dev), vxlan_net_id); > vxlan_handle_lowerdev_unregister(vn, dev); > + } > > return NOTIFY_DONE; > }