From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jesse Brandeburg
Subject: Re: PANIC in vxlan
Date: Thu, 16 Jan 2014 18:03:18 -0800
Message-ID: <20140116180318.00004f53@unknown>
References: <20140116171428.00004da1@unknown>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Jesse Brandeburg, dborkman@redhat.com
To:
Return-path:
Received: from mga09.intel.com ([134.134.136.24]:40056 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751310AbaAQCDT (ORCPT ); Thu, 16 Jan 2014 21:03:19 -0500
In-Reply-To: <20140116171428.00004da1@unknown>
Sender: netdev-owner@vger.kernel.org
List-ID:

+dborkman@redhat.com, and I've left the full text of the message for him
to see. Bad commit below.

On Thu, 16 Jan 2014 17:14:28 -0800
Jesse Brandeburg wrote:

> I'm currently debugging this but given where the kernel release cycle
> is I wanted to let the list know.
>
> It may well be a bug in our code, and if it is we'll find it, but here is
> the panic, it doesn't occur when vxlan is not enabled.
>
> Jan 16 13:46:44 jbrandeb-cp2 kernel: [ 17.331010] cgroup: libvirtd (1387) created nested cgroup for controller "memory" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.
> Jan 16 13:46:44 jbrandeb-cp2 kernel: [ 17.331014] cgroup: "memory" requires setting use_hierarchy to 1 on the root.
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.576568] ------------[ cut here ]------------
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.586411] kernel BUG at include/net/netns/generic.h:45!
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.596336] invalid opcode: 0000 [#1] SMP
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.606268] Modules linked in: lockd sunrpc i40e igb iTCO_wdt iTCO_vendor_support sb_edac ioatdma ptp microcode lpc_ich edac_core i2c_i801 mfd_core dca pps_core wmi kvm uinput isci firewire_ohci libsas firewire_core crc_itu_t scsi_transport_sas mgag200 drm_kms_helper ttm
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.637923] CPU: 0 PID: 1387 Comm: libvirtd Not tainted 3.13.0-rc7+ #30
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.648599] Hardware name: Intel Corporation S2600CO ........../S2600CO, BIOS SE5C600.86B.01.08.6003.062420131549 06/24/2013
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.659612] task: ffff88063b5c6000 ti: ffff8806333ca000 task.ti: ffff8806333ca000
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.670661] RIP: 0010:[] [] net_generic.isra.34.part.35+0x4/0x6
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.681738] RSP: 0018:ffff8806333cbb80 EFLAGS: 00010246
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.692536] RAX: 0000000000000000 RBX: 00000000ffffffed RCX: 0000000000000010
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.703577] RDX: ffff88063d03d380 RSI: 0000000000000010 RDI: ffffffff81cfd9f0
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.714612] RBP: ffff8806333cbb80 R08: 0000000000000000 R09: ffffffff81cfd9f0
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.725531] R10: 00000000000002cc R11: 0000000000000004 R12: 0000000000000000
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.736448] R13: ffff880639118000 R14: ffff8806333cbc68 R15: 0000000000000000
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.747292] FS: 00007f6381830700(0000) GS:ffff880647600000(0000) knlGS:0000000000000000
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.758248] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.769263] CR2: 00007f637c04b000 CR3: 0000000c3aa1f000 CR4: 00000000000407f0
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.780402] Stack:
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.791386] ffff8806333cbbc0 ffffffff814d0865 ffff8806333cbc40 00000000ffffffef
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.802702] 00000000ffffffed ffffffff81cc67d0 0000000000000010 ffff8806333cbc68
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.814021] ffff8806333cbc00 ffffffff816e9e5d 0000000000000004 ffff8806333cbc68
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.825185] Call Trace:
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.836106] [] vxlan_lowerdev_event+0xf5/0x100
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.847254] [] notifier_call_chain+0x4d/0x70
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.858457] [] __raw_notifier_call_chain+0xe/0x10
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.869696] [] raw_notifier_call_chain+0x16/0x20
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.880896] [] call_netdevice_notifiers_info+0x40/0x70
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.892063] [] call_netdevice_notifiers+0x16/0x20
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.903107] [] register_netdevice+0x1be/0x3a0
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.914128] [] register_netdev+0x1e/0x30
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.925072] [] loopback_net_init+0x4a/0xb0
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.936048] [] ? lockd_init_net+0x6e/0xb0 [lockd]
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.947081] [] ops_init+0x4c/0x150
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.958070] [] setup_net+0x73/0x110
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.969006] [] copy_net_ns+0x7b/0x100
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.979897] [] create_new_namespaces+0x101/0x1b0
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 17.990855] [] copy_namespaces+0x85/0xb0
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 18.001656] [] copy_process.part.26+0x935/0x1500
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 18.012370] [] ? mntput+0x26/0x40
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 18.022924] [] do_fork+0xbc/0x2e0
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 18.033331] [] ? ____fput+0xe/0x10
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 18.043622] [] ? task_work_run+0xac/0xe0
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 18.053905] [] SyS_clone+0x16/0x20
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 18.064265] [] stub_clone+0x69/0x90
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 18.074600] [] ? system_call_fastpath+0x16/0x1b
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 18.084879] Code: 00 75 1d 55 be 2f 00 00 00 48 c7 c7 65 93 a2 81 48 89 e5 e8 f4 b5 98 ff 5d c6 05 30 aa 5f 00 01 c3 55 48 89 e5 0f 0b 55 48 89 e5 <0f> 0b 55 48 89 e5 0f 0b 66 66 66 66 90 55 48 c7 c7 c0 4c cb 81
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 18.105818] RIP [] net_generic.isra.34.part.35+0x4/0x6
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 18.116106] RSP
> Jan 16 13:46:45 jbrandeb-cp2 kernel: [ 18.172366] ---[ end trace 0bb84cf9aa76a384 ]---
> Jan 16 13:46:47 jbrandeb-cp2 systemd[1]: Startup finished in 4s 918ms 164us (kernel) + 3s 548ms 460us (initrd) + 11s 2ms 474us (userspace) = 19s 469ms 98us.
> Jan 16 13:46:47 jbrandeb-cp2 dbus-daemon[989]: dbus[989]: [system] Activating via systemd: service name='org.freedesktop.Accounts' unit='accounts-daemon.service'
>
> code says:
> (gdb) l *(vxlan_lowerdev_event+0xf5)
> 0xffffffff814d0865 is at include/net/netns/generic.h:41.
> 34  static inline void *net_generic(const struct net *net, int id)
> 35  {
> 36          struct net_generic *ng;
> 37          void *ptr;
> 38
> 39          rcu_read_lock();
> 40          ng = rcu_dereference(net->gen);
> 41          BUG_ON(id == 0 || id > ng->len);
> 42          ptr = ng->ptr[id - 1];
> 43          rcu_read_unlock();
> 44
> >>>> 45     BUG_ON(!ptr);
> 46          return ptr;
> 47  }
> 48  #endif
>

It appears that the bug is in acaf4e70997f (net: vxlan: when lower dev
unregisters remove vxlan dev as well). Reverting that patch avoids the
panic. I wasn't able to see immediately what was wrong in the patch.

--
Jesse
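
A note on the net_generic() listing quoted above: the oops reports generic.h:45, which is the BUG_ON(!ptr) check rather than the range check on line 41. In other words the id was in range, but the slot for that namespace held no pointer at the time of the call. Below is a minimal userspace sketch of those two checks, in case it helps anyone reading along. It is only a model: the struct layout, the enum, and the function name are made up for illustration and are not the kernel's definitions, and it reports a status where the kernel would BUG. It says nothing about why the slot was empty in this panic.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct net_generic. */
struct net_generic_model {
	unsigned int len;   /* number of valid slots */
	void *ptr[4];       /* per-subsystem pointers, indexed by id - 1 */
};

/* Hypothetical status codes; the kernel calls BUG_ON() instead. */
enum lookup_status {
	LOOKUP_OK,
	LOOKUP_BAD_ID,      /* corresponds to the check on line 41 */
	LOOKUP_EMPTY_SLOT   /* corresponds to the check on line 45 */
};

/* Same order of checks as the quoted listing, without RCU. */
enum lookup_status net_generic_model_lookup(const struct net_generic_model *ng,
					     int id, void **out)
{
	*out = NULL;

	/* Line 41: ids are 1-based and must not exceed ng->len. */
	if (id <= 0 || (unsigned int)id > ng->len)
		return LOOKUP_BAD_ID;

	/* Line 45: a valid id whose slot has not been filled in yet. */
	if (ng->ptr[id - 1] == NULL)
		return LOOKUP_EMPTY_SLOT;

	*out = ng->ptr[id - 1];
	return LOOKUP_OK;
}
```

With len set to 2 and both slots still NULL, looking up id 1 in this model returns LOOKUP_EMPTY_SLOT, which is the case that corresponds to the line 45 BUG in the trace.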