From: Rémi Denis-Courmont
Subject: IPv6 tunnel scalability problem
Date: Sun, 31 Aug 2008 19:58:51 +0300
To: netdev@vger.kernel.org
Cc: Bernhard Schmidt

Hello all,

I have been maintaining a TUN-based Linux implementation of Teredo/RFC 4380.
On a busy node, this can result in a large number of peers on the virtual
point-to-point tunnel interface. I have received complaints that the whole
thing seems to hit a severe performance bottleneck when this happens. It is
not clear to me at this point whether it is a kernel or a user-space problem,
so I have been writing a stress test.

Now I seem to be hitting a kernel segmentation fault as soon as there are
1024 peers on a given tunnel interface (filed as bug #11469):

BUG: unable to handle kernel NULL pointer dereference at 0000001d
IP: [] :ipv6:ip6_dst_lookup_tail+0x95/0x15a
*pde = 00000000
Oops: 0000 [#14] SMP
Modules linked in: ipx p8022 psnap llc p8023 i915 drm tun cpufreq_ondemand
 binfmt_misc fuse nf_conntrack_ftp nf_conntrack_ipv6 nf_conntrack_ipv4
 nf_conntrack ipv6 snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss
 snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event
 snd_seq snd_timer snd_seq_device snd intel_agp psmouse soundcore agpgart
 button processor snd_page_alloc parport_pc parport iTCO_wdt evdev pcspkr
 dm_mirror dm_log dm_snapshot dm_mod sg sr_mod cdrom e100 mii ehci_hcd
 uhci_hcd usbcore unix
Pid: 9950, comm: tunload Tainted: G D (2.6.26.3 #8)
EIP: 0060:[] EFLAGS: 00210246 CPU: 0
EIP is at ip6_dst_lookup_tail+0x95/0x15a [ipv6]
EAX: 00000000 EBX: 00000000 ECX: ef4abdac EDX: 00000000
ESI: ef4abd3c EDI: ef64ca00 EBP: ef4abcb8 ESP: ef4abc64
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process tunload (pid: 9950, ti=ef4aa000 task=f7d45320 task.ti=ef4aa000)
Stack: ef4abd58 ef4abdac f7cc0c00 ef4abc80 f8b36918 00000000 ef673e40 ef4abcc0
       f8b381b2 00000002 f7cc0c00 ef7c3e00 f7cc0e24 00000000 ef4abca8 ef4abca8
       c030bcfa ef4abcc0 00000000 ef4abed4 00000000 ef4abcc0 f8b377d5 ef4abdbc
Call Trace:
 [] ? ip6_cork_release+0x2e/0x52 [ipv6]
 [] ? ip6_push_pending_frames+0x1c9/0x3d9 [ipv6]
 [] ? _spin_unlock_bh+0xd/0xf
 [] ? ip6_dst_lookup+0xe/0x10 [ipv6]
 [] ? rawv6_sendmsg+0x25d/0xc08 [ipv6]
 [] ? filemap_fault+0x203/0x3d5
 [] ? inet_sendmsg+0x2e/0x50
 [] ? sock_sendmsg+0xcc/0xf0
 [] ? autoremove_wake_function+0x0/0x3a
 [] ? remove_wait_queue+0x30/0x34
 [] ? tun_chr_aio_read+0x298/0x31f [tun]
 [] ? copy_from_user+0x2a/0x114
 [] ? sys_sendto+0xa5/0xc5
 [] ? neigh_periodic_timer+0x0/0x17a
 [] ? autoremove_wake_function+0x0/0x3a
 [] ? sys_socketcall+0x141/0x262
 [] ? sysenter_past_esp+0x6a/0x91
 =======================
Code: 22 83 fb 9b 74 37 8b 4d b0 8b 01 e8 35 96 77 c7 8b 45 b0 c7 00 00 00 00 00 89 d8 83 c4 48 5b 5e 5f 5d c3 8b 4d b0 8b 39 8b 47 2c 40 1d de 74 23 31 db 89 d8 83 c4 48 5b 5e 5f 5d c3 64 a1 04
EIP: [] ip6_dst_lookup_tail+0x95/0x15a [ipv6] SS:ESP 0068:ef4abc64
---[ end trace 1035c8e1d028e84b ]---

The test case is here:
http://www.remlab.net/files/divers/tunload.c
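To give an idea of what the load looks like without downloading the file,
here is a rough sketch of the same principle. This is NOT the actual
tunload.c: the interface name, the 2001:db8::/64 documentation prefix and
the use of ICMPv6 echoes are placeholders, and the interface configuration
is left to a shell on the side. The point is simply that every distinct
destination routed through the tun interface counts as one more "peer":

/* Rough sketch only (not the real test): create a tun interface, then
 * send ICMPv6 echoes from a raw socket to a couple of thousand distinct
 * destinations routed through it, reading each packet back from the tun
 * device so its queue never fills. Needs root. Configure the interface
 * from another shell before pressing Enter, e.g.:
 *   ip -6 addr add 2001:db8::1/64 dev tunload0
 *   ip link set tunload0 up
 */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/icmp6.h>
#include <net/if.h>
#include <linux/if_tun.h>

int main(void)
{
    int tun = open("/dev/net/tun", O_RDWR);
    if (tun == -1) {
        perror("/dev/net/tun");
        return 1;
    }

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
    strncpy(ifr.ifr_name, "tunload0", IFNAMSIZ - 1);
    if (ioctl(tun, TUNSETIFF, &ifr) == -1) {
        perror("TUNSETIFF");
        return 1;
    }

    puts("Configure tunload0 from another shell, then press Enter.");
    getchar();

    /* Raw ICMPv6 socket; the kernel fills in the ICMPv6 checksum. */
    int fd = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
    if (fd == -1) {
        perror("socket");
        return 1;
    }

    struct icmp6_hdr echo;
    memset(&echo, 0, sizeof(echo));
    echo.icmp6_type = ICMP6_ECHO_REQUEST;

    struct sockaddr_in6 dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin6_family = AF_INET6;
    inet_pton(AF_INET6, "2001:db8::", &dst.sin6_addr);

    char buf[1500];
    /* Start at 2: 2001:db8::1 is the local address in the example above. */
    for (unsigned int i = 2; i <= 2048; i++) {
        /* One distinct destination per iteration, i.e. one more peer. */
        dst.sin6_addr.s6_addr[14] = i >> 8;
        dst.sin6_addr.s6_addr[15] = i & 0xff;
        if (sendto(fd, &echo, sizeof(echo), 0,
                   (struct sockaddr *)&dst, sizeof(dst)) == -1) {
            perror("sendto");
            continue;
        }
        read(tun, buf, sizeof(buf)); /* drain the packet from the device */
    }
    return 0;
}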
I would assume that this is an allocation failure of some sort. It also seems
weird that any per-destination data would need to be allocated on a
point-to-point link at all, since there is no need for a neighbour cache
there.

I'll try with 2.6.27-rc later.

Regards,

-- 
Rémi Denis-Courmont
http://www.remlab.net/