From: "Vitaly V. Bursov" <vitalyb@telenet.dn.ua>
To: linux-kernel@vger.kernel.org
Subject: Scaling problem with a lot of AF_PACKET sockets on different interfaces
Date: Fri, 07 Jun 2013 14:56:00 +0300
Message-ID: <51B1CA50.30702@telenet.dn.ua>

Hello,

I have a Linux router with a lot of interfaces (hundreds or
thousands of VLANs) and an application that creates one AF_PACKET
socket per interface and bind()s each socket to its interface.

Each socket also has a BPF filter attached.

The problem is observed on linux-3.8.13, but as far as I can see
from the source, the latest version behaves the same way.

I noticed that the box has strange performance problems, with
most of the CPU time spent in __netif_receive_skb:
  86.15%  [k] __netif_receive_skb
   1.41%  [k] _raw_spin_lock
   1.09%  [k] fib_table_lookup
   0.99%  [k] local_bh_enable_ip

and this is the annotated assembly with the "hot spot":
        │       shr    $0x8,%r15w
        │       and    $0xf,%r15d
   0.00 │       shl    $0x4,%r15
        │       add    $0xffffffff8165ec80,%r15
        │       mov    (%r15),%rax
   0.09 │       mov    %rax,0x28(%rsp)
        │       mov    0x28(%rsp),%rbp
   0.01 │       sub    $0x28,%rbp
        │       jmp    5c7
   1.72 │5b0:   mov    0x28(%rbp),%rax
   0.05 │       mov    0x18(%rsp),%rbx
   0.00 │       mov    %rax,0x28(%rsp)
   0.03 │       mov    0x28(%rsp),%rbp
   5.67 │       sub    $0x28,%rbp
   1.71 │5c7:   lea    0x28(%rbp),%rax
   1.73 │       cmp    %r15,%rax
        │       je     640
   1.74 │       cmp    %r14w,0x0(%rbp)
        │       jne    5b0
  81.36 │       mov    0x8(%rbp),%rax
   2.74 │       cmp    %rax,%r8
        │       je     5eb
   1.37 │       cmp    0x20(%rbx),%rax
        │       je     5eb
   1.39 │       cmp    %r13,%rax
        │       jne    5b0
   0.04 │5eb:   test   %r12,%r12
   0.04 │       je     6f4
        │       mov    0xc0(%rbx),%eax
        │       mov    0xc8(%rbx),%rdx
        │       testb  $0x8,0x1(%rdx,%rax,1)
        │       jne    6d5

This corresponds to:

net/core/dev.c:
         type = skb->protocol;
         list_for_each_entry_rcu(ptype,
                         &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
                 if (ptype->type == type &&
                     (ptype->dev == null_or_dev || ptype->dev == skb->dev ||
                      ptype->dev == orig_dev)) {
                         if (pt_prev)
                                 ret = deliver_skb(skb, pt_prev, orig_dev);
                         pt_prev = ptype;
                 }
         }

This works perfectly OK until there are a lot of AF_PACKET sockets, since
each bound socket adds an entry to the ptype list:

# cat /proc/net/ptype
Type Device      Function
0800 eth2.1989 packet_rcv+0x0/0x400
0800 eth2.1987 packet_rcv+0x0/0x400
0800 eth2.1986 packet_rcv+0x0/0x400
0800 eth2.1990 packet_rcv+0x0/0x400
0800 eth2.1995 packet_rcv+0x0/0x400
0800 eth2.1997 packet_rcv+0x0/0x400
.......
0800 eth2.1004 packet_rcv+0x0/0x400
0800          ip_rcv+0x0/0x310
0011          llc_rcv+0x0/0x3a0
0004          llc_rcv+0x0/0x3a0
0806          arp_rcv+0x0/0x150

Every received IPv4 packet has to walk this whole list, which obviously
results in a huge performance penalty.
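
To make the cost concrete, here is a small user-space model of that
bucket walk. It is not kernel code: the struct and function names are
invented for illustration, types are kept in host byte order, and
interfaces are identified by a plain integer index. The 16-bucket hash
matches the "and $0xf" in the assembly above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PTYPE_HASH_SIZE 16
#define PTYPE_HASH_MASK (PTYPE_HASH_SIZE - 1)

/* Model of one registered handler (names are hypothetical). */
struct model_ptype {
    uint16_t type;              /* EtherType, host byte order */
    int ifindex;                /* 0 means "any device" */
    struct model_ptype *next;
};

struct model_table {
    struct model_ptype *bucket[PTYPE_HASH_SIZE];
};

/* Register a handler: it lands in the bucket chosen by the EtherType
 * alone, so the device plays no part in the hashing. */
static void model_add(struct model_table *t, uint16_t type, int ifindex)
{
    struct model_ptype *p = malloc(sizeof(*p));
    size_t b = type & PTYPE_HASH_MASK;

    assert(p != NULL);
    p->type = type;
    p->ifindex = ifindex;
    p->next = t->bucket[b];
    t->bucket[b] = p;
}

/* Walk the bucket for one received packet. Returns the number of
 * entries visited and stores the number of matching handlers. */
static size_t model_receive(const struct model_table *t, uint16_t type,
                            int ifindex, size_t *matched)
{
    const struct model_ptype *p;
    size_t visited = 0;

    *matched = 0;
    for (p = t->bucket[type & PTYPE_HASH_MASK]; p != NULL; p = p->next) {
        visited++;
        if (p->type == type && (p->ifindex == 0 || p->ifindex == ifindex))
            (*matched)++;
    }
    return visited;
}
```

With 1000 per-interface entries for type 0x0800 plus one wildcard
entry, a single IPv4 packet visits all 1001 entries to find its 2
handlers, while an ARP packet (0x0806) hashes to a different bucket and
is unaffected.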

ptype_all, by the looks of it, should have the same problem.

Probably one way to fix this is to perform interface name matching in
the af_packet handler, but there could be other cases, other protocols.
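
A rough user-space model of that idea, under the assumption that
af_packet would register a single handler and then find the bound
sockets itself through a table keyed by interface index. All names and
the table size are hypothetical; this is a sketch of the lookup shape,
not a patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEV_HASH_SIZE 256       /* assumed size, power of two */

/* Model of one bound socket (names are hypothetical). */
struct model_sock {
    int ifindex;                /* interface the socket is bound to */
    struct model_sock *next;
};

struct model_demux {
    struct model_sock *bucket[DEV_HASH_SIZE];
};

/* Insert a caller-owned socket into the bucket for its interface. */
static void demux_add(struct model_demux *d, struct model_sock *s)
{
    size_t b = (unsigned int)s->ifindex & (DEV_HASH_SIZE - 1);

    assert(s->ifindex > 0);
    s->next = d->bucket[b];
    d->bucket[b] = s;
}

/* Deliver one packet received on ifindex. Returns the number of
 * sockets that would receive it and stores the number of entries
 * visited, which is now bounded by one bucket's length. */
static size_t demux_deliver(const struct model_demux *d, int ifindex,
                            size_t *visited)
{
    size_t b = (unsigned int)ifindex & (DEV_HASH_SIZE - 1);
    const struct model_sock *s;
    size_t delivered = 0;

    *visited = 0;
    for (s = d->bucket[b]; s != NULL; s = s->next) {
        (*visited)++;
        if (s->ifindex == ifindex)
            delivered++;
    }
    return delivered;
}
```

With 2048 sockets bound to interface indexes 1..2048, a packet visits
8 entries instead of 2048. This only covers af_packet, so any other
protocol that registers many per-device handlers would still need its
own answer.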

Ideas are welcome :)

-- 
Thanks
Vitaly
