From: "Vitaly V. Bursov" <vitalyb@telenet.dn.ua>
To: linux-kernel@vger.kernel.org
Subject: Scaling problem with a lot of AF_PACKET sockets on different interfaces
Date: Fri, 07 Jun 2013 14:56:00 +0300
Message-ID: <51B1CA50.30702@telenet.dn.ua>
Hello,

I have a Linux router with a lot of interfaces (hundreds or
thousands of VLANs) and an application that creates one AF_PACKET
socket per interface and bind()s each socket to its interface.
Each socket also has a BPF filter attached.

The problem is observed on linux-3.8.13, but as far as I can see
from the source, the latest version behaves the same way.

I noticed that the box has strange performance problems, with
most of the CPU time spent in __netif_receive_skb:
86.15% [k] __netif_receive_skb
1.41% [k] _raw_spin_lock
1.09% [k] fib_table_lookup
0.99% [k] local_bh_enable_ip
and this is the assembly with the "hot spot":
│ shr $0x8,%r15w
│ and $0xf,%r15d
0.00 │ shl $0x4,%r15
│ add $0xffffffff8165ec80,%r15
│ mov (%r15),%rax
0.09 │ mov %rax,0x28(%rsp)
│ mov 0x28(%rsp),%rbp
0.01 │ sub $0x28,%rbp
│ jmp 5c7
1.72 │5b0: mov 0x28(%rbp),%rax
0.05 │ mov 0x18(%rsp),%rbx
0.00 │ mov %rax,0x28(%rsp)
0.03 │ mov 0x28(%rsp),%rbp
5.67 │ sub $0x28,%rbp
1.71 │5c7: lea 0x28(%rbp),%rax
1.73 │ cmp %r15,%rax
│ je 640
1.74 │ cmp %r14w,0x0(%rbp)
│ jne 5b0
81.36 │ mov 0x8(%rbp),%rax
2.74 │ cmp %rax,%r8
│ je 5eb
1.37 │ cmp 0x20(%rbx),%rax
│ je 5eb
1.39 │ cmp %r13,%rax
│ jne 5b0
0.04 │5eb: test %r12,%r12
0.04 │ je 6f4
│ mov 0xc0(%rbx),%eax
│ mov 0xc8(%rbx),%rdx
│ testb $0x8,0x1(%rdx,%rax,1)
│ jne 6d5
This corresponds to:
net/core/dev.c:

    type = skb->protocol;
    list_for_each_entry_rcu(ptype,
            &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
        if (ptype->type == type &&
            (ptype->dev == null_or_dev || ptype->dev == skb->dev ||
             ptype->dev == orig_dev)) {
            if (pt_prev)
                ret = deliver_skb(skb, pt_prev, orig_dev);
            pt_prev = ptype;
        }
    }
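
To make the cost of that loop concrete, here is a small userspace model of it; this is not kernel code. It assumes a hash of 16 buckets (consistent with the "and $0xf" in the assembly above), replaces the RCU list with a plain singly linked list, and uses made-up names (model_ptype, model_add_pack, model_receive). A dev value of 0 stands in for the "any device" case.

```c
#include <arpa/inet.h>   /* htons, ntohs */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTYPE_HASH_SIZE 16                      /* assumed bucket count */
#define PTYPE_HASH_MASK (PTYPE_HASH_SIZE - 1)

/* Simplified stand-in for struct packet_type: singly linked, no RCU. */
struct model_ptype {
    uint16_t type;              /* protocol, network byte order */
    int dev;                    /* ifindex; 0 means "any device" */
    struct model_ptype *next;
};

static struct model_ptype *model_ptype_base[PTYPE_HASH_SIZE];

/* Same bucket selection as the loop quoted above. */
unsigned int model_bucket(uint16_t type)
{
    return ntohs(type) & PTYPE_HASH_MASK;
}

/* Register a handler at the head of its bucket. */
void model_add_pack(struct model_ptype *pt)
{
    unsigned int b;

    assert(pt != NULL);
    b = model_bucket(pt->type);
    pt->next = model_ptype_base[b];
    model_ptype_base[b] = pt;
}

/*
 * Walk the bucket for one received packet. Returns the number of list
 * entries visited and stores the number of matching handlers in *delivered.
 */
size_t model_receive(uint16_t type, int dev, size_t *delivered)
{
    size_t visited = 0;

    assert(delivered != NULL);
    *delivered = 0;
    for (struct model_ptype *pt = model_ptype_base[model_bucket(type)];
         pt != NULL; pt = pt->next) {
        visited++;
        if (pt->type == type && (pt->dev == 0 || pt->dev == dev))
            (*delivered)++;
    }
    return visited;
}
```

With 1000 per-device entries plus one wildcard entry registered for 0x0800, model_receive() visits 1001 entries for every IPv4 packet, although only two of them match the receiving device.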

This works perfectly well until there are a lot of AF_PACKET sockets, since
each socket adds its own protocol entry to the ptype list:
# cat /proc/net/ptype
Type Device      Function
0800 eth2.1989   packet_rcv+0x0/0x400
0800 eth2.1987   packet_rcv+0x0/0x400
0800 eth2.1986   packet_rcv+0x0/0x400
0800 eth2.1990   packet_rcv+0x0/0x400
0800 eth2.1995   packet_rcv+0x0/0x400
0800 eth2.1997   packet_rcv+0x0/0x400
.......
0800 eth2.1004   packet_rcv+0x0/0x400
0800             ip_rcv+0x0/0x310
0011             llc_rcv+0x0/0x3a0
0004             llc_rcv+0x0/0x3a0
0806             arp_rcv+0x0/0x150

Every received IPv4 packet has to walk this whole list, which obviously
results in a huge performance penalty.
ptype_all, by the looks of it, should behave the same way.
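
To check how long the chain for a given protocol is on a live box, the /proc/net/ptype listing above can be tallied per type. The following is only a sketch; the function names ptype_line_has_type() and count_ptype_entries() are invented for this example, and it assumes the first whitespace-separated column is the type, as in the listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the first column of line equals type_hex (e.g. "0800"). */
int ptype_line_has_type(const char *line, const char *type_hex)
{
    char type[16];

    assert(line != NULL && type_hex != NULL);
    return sscanf(line, "%15s", type) == 1 && strcmp(type, type_hex) == 0;
}

/*
 * Count entries of one type in a /proc/net/ptype style file.
 * Returns the count, or -1 if the file cannot be opened.
 */
long count_ptype_entries(const char *path, const char *type_hex)
{
    char line[256];
    long count = 0;
    FILE *in;

    assert(path != NULL && type_hex != NULL);
    in = fopen(path, "r");
    if (in == NULL)
        return -1;

    while (fgets(line, sizeof(line), in) != NULL) {
        if (ptype_line_has_type(line, type_hex))
            count++;
    }
    fclose(in);
    return count;
}
```

For example, count_ptype_entries("/proc/net/ptype", "0800") gives the number of handlers that share the IPv4 chain, including ip_rcv.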

One way to fix this is probably to perform the interface name matching in
the af_packet handler itself, but there could be other cases and other
protocols with the same problem.

Ideas are welcome :)
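
As a rough illustration of the idea of matching the interface inside a single shared handler, here is a userspace model, not a kernel patch. It assumes sockets are kept in a small hash keyed by ifindex rather than by interface name; the table size and all names (model_sock, model_bind_sock, model_packet_rcv) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEV_HASH_SIZE 256   /* hypothetical table size */

/* Stand-in for a bound packet socket. */
struct model_sock {
    int ifindex;
    unsigned long rx_packets;
    struct model_sock *next;    /* chain within one ifindex bucket */
};

static struct model_sock *dev_hash[DEV_HASH_SIZE];

static unsigned int dev_bucket(int ifindex)
{
    return (unsigned int)ifindex % DEV_HASH_SIZE;
}

/* Bind a socket to an interface by adding it to that interface's bucket. */
void model_bind_sock(struct model_sock *sk, int ifindex)
{
    unsigned int b;

    assert(sk != NULL);
    b = dev_bucket(ifindex);
    sk->ifindex = ifindex;
    sk->rx_packets = 0;
    sk->next = dev_hash[b];
    dev_hash[b] = sk;
}

/*
 * Single shared receive handler: only the bucket of the receiving
 * interface is walked. Returns the number of entries visited.
 */
size_t model_packet_rcv(int ifindex)
{
    size_t visited = 0;

    for (struct model_sock *sk = dev_hash[dev_bucket(ifindex)];
         sk != NULL; sk = sk->next) {
        visited++;
        if (sk->ifindex == ifindex)
            sk->rx_packets++;
    }
    return visited;
}
```

In this model, 1000 sockets bound to ifindex 1..1000 cost a handful of comparisons per packet instead of a walk over all 1000 entries, at the price of a lookup structure specific to af_packet; other protocols with many per-device handlers would not benefit.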
--
Thanks
Vitaly