From: Bill Fink <billfink@mindspring.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Linux Network Developers <netdev@vger.kernel.org>,
brice@myri.com, gallatin@myri.com
Subject: Re: Receive side performance issue with multi-10-GigE and NUMA
Date: Thu, 27 Aug 2009 13:44:29 -0400
Message-ID: <20090827134429.ca1ba6bd.billfink@mindspring.com>
In-Reply-To: <20090826180811.GB10816@hmsreliant.think-freely.org>
On Wed, 26 Aug 2009, Neil Horman wrote:
> On Wed, Aug 26, 2009 at 07:00:13AM -0400, Neil Horman wrote:
> > On Wed, Aug 26, 2009 at 03:10:57AM -0400, Bill Fink wrote:
> >
> > > Fortunately, in this specific case, the SuperMicro X8DAH+-F system
> > > does have a serial console, and after a fair amount of effort I was
> > > able to get it to work as desired, and was able to finally capture
> > > a backtrace of the kernel oops. BTW I believe the reason the
> > > kexec/kdump didn't work was probably because it couldn't find
> > > a /proc/vmcore file, although I don't know why that would be,
> > > and the Fedora 10 /etc/init.d/kdump script will then just boot
> > > up normally if it fails to find the /proc/vmcore file (or it's
> > > zero size).
> > >
> > I take care of kdump for Fedora and RHEL. If you file a bug on this, I'd be
> > happy to look into it further.
> >
> > > The following shows a simple ping test usage of the skb_sources
> > > tracing feature:
> > >
> > > [root@xeontest1 tracing]# numactl --membind=1 taskset -c 4 ping -c 5 -s 1472 192.168.1.10
> > > PING 192.168.1.10 (192.168.1.10) 1472(1500) bytes of data.
> > > 1480 bytes from 192.168.1.10: icmp_seq=1 ttl=64 time=0.139 ms
> > > 1480 bytes from 192.168.1.10: icmp_seq=2 ttl=64 time=0.182 ms
> > > 1480 bytes from 192.168.1.10: icmp_seq=3 ttl=64 time=0.178 ms
> > > 1480 bytes from 192.168.1.10: icmp_seq=4 ttl=64 time=0.188 ms
> > > 1480 bytes from 192.168.1.10: icmp_seq=5 ttl=64 time=0.178 ms
> > >
> > > --- 192.168.1.10 ping statistics ---
> > > 5 packets transmitted, 5 received, 0% packet loss, time 3999ms
> > > rtt min/avg/max/mdev = 0.139/0.173/0.188/0.017 ms
> > >
> > > [root@xeontest1 tracing]# cat trace
> > > # tracer: skb_sources
> > > #
> > > #  PID  ANID  CNID  IFC      RXQ  CCPU   LEN
> > > #   |    |     |     |        |    |      |
> > >   4217     1     1  eth2       0     4  1500
> > >   4217     1     1  eth2       0     4  1500
> > >   4217     1     1  eth2       0     4  1500
> > >   4217     1     1  eth2       0     4  1500
> > >   4217     1     1  eth2       0     4  1500
> > >
> > > All is as was expected.
> > >
> > > But if I try an actual nuttcp performance test (even rate limited
> > > to 1 Mbps), I get the following kernel oops:
> > >
> > Thank you. I think I see the problem; I'll have a patch for you in just a bit.
> >
> > Thanks
> > Neil
> >
> > > [root@xeontest1 tracing]# numactl --membind=1 nuttcp -In2 -Ri1m -xc4/0 192.168.1.10
> > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > > IP: [<ffffffff810b01ab>] probe_skb_dequeue+0xf7/0x152
> > > PGD 337d12067 PUD 337d11067 PMD 0
> > > Oops: 0000 [#1] SMP
> > > last sysfs file: /sys/devices/pci0000:80/0000:80:07.0/0000:8b:00.0/0000:8c:04.0e
> > > CPU 4
> > > Modules linked in: w83627ehf hwmon_vid coretemp hwmon ipv6 dm_multipath uinput ]
> > > Pid: 4222, comm: nuttcp Not tainted 2.6.31-rc6-bf #3 X8DAH
> > > RIP: 0010:[<ffffffff810b01ab>] [<ffffffff810b01ab>] probe_skb_dequeue+0xf7/0x12
> > > RSP: 0018:ffff8801a5811a88 EFLAGS: 00010213
> > > RAX: 0000000000000000 RBX: ffff88033906d154 RCX: 000000000000000d
> > > RDX: 000000000000f88c RSI: 000000000000000b RDI: ffff8803383d3044
> > > RBP: ffff8801a5811ab8 R08: 0000000000000001 R09: ffff8801ab311a00
> > > R10: 0000000000000005 R11: ffffc9000080e2b0 R12: ffff880337c45400
> > > R13: ffff88033906d150 R14: 0000000000000014 R15: ffffffff818bb890
> > > FS: 00007fa976d326f0(0000) GS:ffffc90000800000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 0000000000000038 CR3: 000000033801e000 CR4: 00000000000006e0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > Process nuttcp (pid: 4222, threadinfo ffff8801a5810000, task ffff8801ab2e5d00)
> > > Stack:
> > > ffff8801a5811ab8 ffff8801b35d4ab0 0000000000000014 0000000000000000
> > > <0> 0000000000000014 0000000000000014 ffff8801a5811b18 ffffffff81366ae8
> > > <0> ffff8801a5811ed8 0000001439084000 ffff880337c45400 00000001001416ef
> > > Call Trace:
> > > [<ffffffff81366ae8>] skb_copy_datagram_iovec+0x50/0x1f5
> > > [<ffffffff813ac875>] tcp_rcv_established+0x278/0x6db
> > > [<ffffffff813b3ef5>] tcp_v4_do_rcv+0x1b8/0x366
> > > [<ffffffff8135f99e>] ? release_sock+0xab/0xb4
> > > [<ffffffff8136004d>] ? sk_wait_data+0xc8/0xd6
> > > [<ffffffff813a32d6>] tcp_prequeue_process+0x79/0x8f
> > > [<ffffffff813a455d>] tcp_recvmsg+0x4e8/0xaa0
> > > [<ffffffff8135ec90>] sock_common_recvmsg+0x37/0x4c
> > > [<ffffffff8135cb06>] __sock_recvmsg+0x72/0x7f
> > > [<ffffffff8135cbdd>] sock_aio_read+0xca/0xda
> > > [<ffffffff810d9536>] ? vma_merge+0x2a0/0x318
> > > [<ffffffff810f6d4f>] do_sync_read+0xec/0x132
> > > [<ffffffff81067ddc>] ? autoremove_wake_function+0x0/0x3d
> > > [<ffffffff811b646c>] ? security_file_permission+0x16/0x18
> > > [<ffffffff810f785c>] vfs_read+0xc0/0x107
> > > [<ffffffff810f7971>] sys_read+0x4c/0x75
> > > [<ffffffff81011c82>] system_call_fastpath+0x16/0x1b
> > > Code: 44 89 73 30 89 43 14 41 0f b7 84 24 ac 00 00 00 89 43 28 65 8b 04 25 98 e
> > > RIP [<ffffffff810b01ab>] probe_skb_dequeue+0xf7/0x152
> > > RSP <ffff8801a5811a88>
> > > CR2: 0000000000000038
>
>
>
> Here you go, I think this will fix your oops.
>
>
> Fix NULL pointer deref in skb sources ftracer
>
> It's possible that skb->sk will be NULL in this path, so we shouldn't just assume
> we can pass it to sock_net()
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>
> trace_skb_sources.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace_skb_sources.c b/kernel/trace/trace_skb_sources.c
> index 40eb071..8bf518f 100644
> --- a/kernel/trace/trace_skb_sources.c
> +++ b/kernel/trace/trace_skb_sources.c
> @@ -29,7 +29,7 @@ static void probe_skb_dequeue(const struct sk_buff *skb, int len)
>  	struct ring_buffer_event *event;
>  	struct trace_skb_event *entry;
>  	struct trace_array *tr = skb_trace;
> -	struct net_device *dev;
> +	struct net_device *dev = NULL;
>
>  	if (!trace_skb_source_enabled)
>  		return;
> @@ -50,7 +50,9 @@ static void probe_skb_dequeue(const struct sk_buff *skb, int len)
>  	entry->event_data.rx_queue = skb->queue_mapping;
>  	entry->event_data.ccpu = smp_processor_id();
>
> -	dev = dev_get_by_index(sock_net(skb->sk), skb->iif);
> +	if (skb->sk)
> +		dev = dev_get_by_index(sock_net(skb->sk), skb->iif);
> +
>  	if (dev) {
>  		memcpy(entry->event_data.ifname, dev->name, IFNAMSIZ);
>  		dev_put(dev);
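
For reference, the guard in the patch above can be exercised outside the
kernel with mock types. This is only a minimal user-space sketch: the
mock_* structs, mock_lookup() and fill_ifname() are hypothetical
stand-ins, not the kernel definitions, and the lookup merely imitates
what dev_get_by_index(sock_net(skb->sk), skb->iif) does in the probe.

```c
#include <stddef.h>
#include <string.h>

#define IFNAMSIZ 16

/* Mock stand-ins for the kernel types; only the fields used here. */
struct mock_net_device { char name[IFNAMSIZ]; };
struct mock_sock       { struct mock_net_device *dev; };
struct mock_skb        { struct mock_sock *sk; };

/* Hypothetical lookup standing in for the dev_get_by_index() call. */
static struct mock_net_device *mock_lookup(const struct mock_sock *sk)
{
	return sk->dev;
}

/*
 * Copies the device name into ifname and returns 1 when a device can
 * be resolved; returns 0 and leaves ifname untouched otherwise.
 */
static int fill_ifname(const struct mock_skb *skb, char ifname[IFNAMSIZ])
{
	struct mock_net_device *dev = NULL;	/* initialised, as in the patch */

	if (skb->sk)				/* the guard the patch adds */
		dev = mock_lookup(skb->sk);

	if (!dev)
		return 0;

	memcpy(ifname, dev->name, IFNAMSIZ);
	return 1;
}
```

With skb->sk set to NULL, fill_ifname() returns 0 instead of
dereferencing the pointer, which is the behaviour the patch is after.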

On the positive side, it did fix the oops.  But the results of the
skb_sources tracing were not that useful.

[root@xeontest1 tracing]# numactl --membind=1 nuttcp -In2 -xc4/0 192.168.1.10 & ps ax | grep nuttcp
5521 ttyS0 S 0:00 nuttcp -In2 -xc4/0 192.168.1.10
n2: 11819.0786 MB / 10.01 sec = 9905.6427 Mbps 26 %TX 37 %RX 0 retrans 0.18 msRTT

First off, only 10 trace entries were made (14 lines, 4 of which are
header lines):

[root@xeontest1 tracing]# wc trace
14 90 334 trace

And here they are:

[root@xeontest1 tracing]# cat trace
# tracer: skb_sources
#
#  PID  ANID  CNID  IFC      RXQ  CCPU   LEN
#   |    |     |     |        |    |      |
  5521     0     0  Unknown    0     3   888
  5521     0     0  Unknown    0     3   896
  5521     0     0  Unknown    0     3    20
  5521     0     0  Unknown    0     3   888
  5521     0     0  Unknown    0     3   896
  5521     0     0  Unknown    0     3    20
  5521     1     1  Unknown    0     4    20
  5521     1     1  Unknown    0     4    11
  5521     1     1  Unknown    0     4   540
  5521     1     1  Unknown    0     4     0

Even for these 10 entries, why is the IFC Unknown?  The LENs seem to
be wrong too.
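
For checking traces like this one mechanically, a small parser can skip
the '#' header lines and tally the entries.  The following is only a
minimal sketch and not part of the tracer: struct trace_entry,
parse_trace_line() and count_entries() are hypothetical names, and it
assumes ANID is the node the skb was allocated on and CNID is the node
of the consuming CPU.

```c
#include <stdio.h>

/* One row of skb_sources output: PID ANID CNID IFC RXQ CCPU LEN. */
struct trace_entry {
	int pid, anid, cnid, rxq, ccpu, len;
	char ifc[16];
};

/* Returns 1 and fills *e for a data row, 0 for headers or malformed lines. */
static int parse_trace_line(const char *line, struct trace_entry *e)
{
	/* Skip leading blanks so indented rows and headers both work. */
	while (*line == ' ' || *line == '\t')
		line++;
	if (*line == '#')
		return 0;
	return sscanf(line, "%d %d %d %15s %d %d %d",
		      &e->pid, &e->anid, &e->cnid, e->ifc,
		      &e->rxq, &e->ccpu, &e->len) == 7;
}

/*
 * Counts the data rows among n lines.  *cross_node receives the number
 * of rows whose ANID differs from CNID (assumed to mean the skb was
 * consumed on a different node than it was allocated on).
 */
static int count_entries(const char *const *lines, int n, int *cross_node)
{
	int total = 0;

	*cross_node = 0;
	for (int i = 0; i < n; i++) {
		struct trace_entry e;

		if (!parse_trace_line(lines[i], &e))
			continue;
		total++;
		if (e.anid != e.cnid)
			(*cross_node)++;
	}
	return total;
}
```

Fed the 14 lines of the trace above, count_entries() would report 10
entries, none of them with ANID differing from CNID.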
-Bill