BPF List
* Re: [tcpdump-workers] Performance impact with multiple pcap handlers on Linux
       [not found] <mailman.79.1608674752.8496.tcpdump-workers@lists.tcpdump.org>
@ 2020-12-22 22:28 ` Guy Harris
  2020-12-22 23:31   ` Linus Lüssing
  0 siblings, 1 reply; 4+ messages in thread
From: Guy Harris @ 2020-12-22 22:28 UTC (permalink / raw)
  To: Linus Lüssing; +Cc: tcpdump-workers, bpf

On Dec 22, 2020, at 2:05 PM, Linus Lüssing via tcpdump-workers <tcpdump-workers@lists.tcpdump.org> wrote:

> I was experimenting a bit with migrating from the use of
> pcap_offline_filter() to pcap_setfilter().
> 
> I was a bit surprised that installing for instance 500 pcap
> handlers

What is a "pcap handler" in this context?  An open live-capture pcap_t?

> with a BPF rule "arp" via pcap_setfilter() reduced
> the TCP performance of iperf3 over veth interfaces from 73.8 Gbits/sec
> to 5.39 Gbits/sec. Using only one or even five handlers seemed
> fine (71.7 Gbits/sec and 70.3 Gbits/sec).
> 
> Is that expected?
> 
> Full test setup description and more detailed results can be found
> here: https://github.com/lemoer/bpfcountd/pull/8

That talks about numbers of "rules" rather than "handlers".  It does speak of "pcap *handles*"; did you mean "handles", rather than "handlers"?

Do those "rules" correspond to items in the filter expression that's compiled into BPF code, or do they correspond to open `pcap_t`s?  If a "rule" corresponds to a "handle", then does it correspond to an open pcap_t?

Or do they correspond to an entire filter expression?

Does this change involve replacing a *single* pcap_t, on which you use pcap_offline_filter() with multiple different filter expressions, with *multiple* pcap_t's, with each one having a separate filter, set with pcap_setfilter()?  If so, note that this involves replacing a single file descriptor with multiple file descriptors, and replacing a single ring buffer into which the kernel puts captured packets with multiple ring buffers into *each* of which the kernel puts captured packets, which increases the amount of work the kernel does.

> PS: And I was also surprised that there seems to be a limit of
> only 510 pcap handlers on Linux.

"handlers" or "handles"?

If it's "handles", as in "pcap_t's open for live capture", and if you're switching from a single pcap_t to multiple pcap_t's, that means using more file descriptors (so that you may eventually run out) and more ring buffers (so that the kernel may eventually say "you're tying up too much wired memory for all those ring buffers").

In either of those cases, the attempt to open a pcap_t will eventually get an error; what is the error that's reported?


* Re: [tcpdump-workers] Performance impact with multiple pcap handlers on Linux
  2020-12-22 22:28 ` [tcpdump-workers] Performance impact with multiple pcap handlers on Linux Guy Harris
@ 2020-12-22 23:31   ` Linus Lüssing
  2020-12-23  6:20     ` Guy Harris
  0 siblings, 1 reply; 4+ messages in thread
From: Linus Lüssing @ 2020-12-22 23:31 UTC (permalink / raw)
  To: Guy Harris; +Cc: tcpdump-workers, bpf

On Tue, Dec 22, 2020 at 02:28:17PM -0800, Guy Harris wrote:
> On Dec 22, 2020, at 2:05 PM, Linus Lüssing via tcpdump-workers <tcpdump-workers@lists.tcpdump.org> wrote:
> 
> > I was experimenting a bit with migrating from the use of
> > pcap_offline_filter() to pcap_setfilter().
> > 
> > I was a bit surprised that installing for instance 500 pcap
> > handlers
> 
> What is a "pcap handler" in this context?  An open live-capture pcap_t?
> 
> > with a BPF rule "arp" via pcap_setfilter() reduced
> > the TCP performance of iperf3 over veth interfaces from 73.8 Gbits/sec
> > to 5.39 Gbits/sec. Using only one or even five handlers seemed
> > fine (71.7 Gbits/sec and 70.3 Gbits/sec).
> > 
> > Is that expected?
> > 
> > Full test setup description and more detailed results can be found
> > here: https://github.com/lemoer/bpfcountd/pull/8
> 
> That talks about numbers of "rules" rather than "handlers".  It does speak of "pcap *handles*"; did you mean "handles", rather than "handlers"?

Sorry, right, I meant pcap handles everywhere.

So far the bpfcountd code uses a single pcap_t handle created via one
pcap_open_live() call. For each received packet it then iterates over
a list of user-specified filter expressions, applies
pcap_offline_filter() with each filter to the packet, and counts the
number of packets and packet bytes that matched each filter
expression.
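
In rough C, that per-packet loop looks something like this (a
simplified sketch, not the actual bpfcountd code; counters,
num_filters and the callback name are made up for illustration):

#include <pcap/pcap.h>
#include <stddef.h>
#include <stdint.h>

struct counter {
	struct bpf_program prog; /* compiled from one filter expression */
	uint64_t packets;
	uint64_t bytes;
};

static struct counter *counters; /* one entry per filter expression */
static size_t num_filters;

/* pcap_loop() callback: run every compiled filter over the packet in
 * userspace and bump the counters of each filter that matched. */
static void count_packet(u_char *user, const struct pcap_pkthdr *h,
			 const u_char *data)
{
	size_t i;

	(void)user;
	for (i = 0; i < num_filters; i++) {
		if (pcap_offline_filter(&counters[i].prog, h, data)) {
			counters[i].packets++;
			counters[i].bytes += h->len;
		}
	}
}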

> 
> Do those "rules" correspond to items in the filter expression that's compiled into BPF code, or do they correspond to open pcap_t's?  If a "rule" corresponds to a "handle", does each rule have its own open pcap_t?
> 
> Or do they correspond to an entire filter expression?

What I meant by "rule" was an entire filter expression. The user
specifies a list of filter expressions, and bpfcountd counts how many
packets, and the sum of packet bytes, matched each filter expression.

Basically we want to do live measurements of the overhead of the mesh
routing protocol and to measure and dissect the layer 2 broadcast
traffic - that is, to measure how much ARP, DHCP, ICMPv6 NS/NA/RS/RA,
MDNS, LLDP overhead etc. we have.

> 
> Does this change involve replacing a *single* pcap_t, on which you use pcap_offline_filter() with multiple different filter expressions, with *multiple* pcap_t's, with each one having a separate filter, set with pcap_setfilter()?  If so, note that this involves replacing a single file descriptor with multiple file descriptors, and replacing a single ring buffer into which the kernel puts captured packets with multiple ring buffers into *each* of which the kernel puts captured packets, which increases the amount of work the kernel does.

Correct. I tried to replace the single pcap_t with multiple pcap_t's,
one for each filter expression the user specified, then used
pcap_setfilter() on each pcap_t and removed the userspace filtering
via pcap_offline_filter().

The idea was to improve performance by a) avoiding the copying of the
actual packet data to userspace and b) hopefully leaving traffic which
does not match any filter expression largely unaffected by running
bpfcountd / libpcap.
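
I.e. roughly this (again a simplified sketch; die(), handles[],
filter_exprs[] and snaplen are made-up helpers/variables):

/* One pcap_t per filter expression; the kernel now filters for us. */
for (size_t i = 0; i < num_filters; i++) {
	char errbuf[PCAP_ERRBUF_SIZE];
	pcap_t *p = pcap_open_live(ifname, snaplen, 1, 1000, errbuf);

	if (p == NULL)
		die("pcap_open_live: %s", errbuf);
	if (pcap_compile(p, &counters[i].prog, filter_exprs[i], 1,
			 PCAP_NETMASK_UNKNOWN) < 0 ||
	    pcap_setfilter(p, &counters[i].prog) < 0)
		die("filter '%s': %s", filter_exprs[i], pcap_geterr(p));
	handles[i] = p;
}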

Right, for matching, captured traffic the kernel probably does more
work with multiple ring buffers, as you described. But with bpfcountd
we only want to match, measure and dissect broadcast and mesh protocol
traffic, and we expect the matching traffic to amount to only about
100 to 500 kbit/s.

Unicast IP traffic at much higher rates will not be matched, and the
idea/hope for these changes was to leave IP unicast performance mostly
unaffected while still measuring and dissecting the other,
non-IP-unicast traffic.

> 
> > PS: And I was also surprised that there seems to be a limit of
> > only 510 pcap handlers on Linux.
> 
> "handlers" or "handles"?
> 
> If it's "handles", as in "pcap_t's open for live capture", and if you're switching from a single pcap_t to multiple pcap_t's, that means using more file descriptors (so that you may eventually run out) and more ring buffers (so that the kernel may eventually say "you're tying up too much wired memory for all those ring buffers").
> 
> In either of those cases, the attempt to open a pcap_t will eventually get an error; what is the error that's reported?

pcap_activate() returns "socket: Too many open files" on the 511th
pcap_t's pcap_activate() call.

Ah! "ulimit -n" as root returns "1024" for me. Increasing that
limit helps, I can have more pcap_t handles then, thanks!

(as a non-root user "ulimit -n" returns 1048576 - interesting that
an unprivileged user can open more sockets than root by default,
didn't expect that)
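
For completeness: the limit can also be raised from inside the program
before opening the handles, so users don't have to adjust ulimit
themselves. A minimal sketch:

#include <sys/resource.h>

/* Raise the soft fd limit to the hard limit, so that more pcap_t's
 * (one socket each) can be opened. */
static int raise_fd_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
		return -1;
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_NOFILE, &rl);
}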


* Re: [tcpdump-workers] Performance impact with multiple pcap handlers on Linux
  2020-12-22 23:31   ` Linus Lüssing
@ 2020-12-23  6:20     ` Guy Harris
  2021-01-04 12:40       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 4+ messages in thread
From: Guy Harris @ 2020-12-23  6:20 UTC (permalink / raw)
  To: Linus Lüssing; +Cc: tcpdump-workers, bpf

On Dec 22, 2020, at 3:31 PM, Linus Lüssing <linus.luessing@c0d3.blue> wrote:

> Basically we want to do live measurements of the overhead of the mesh
> routing protocol and to measure and dissect the layer 2 broadcast
> traffic - that is, to measure how much ARP, DHCP, ICMPv6 NS/NA/RS/RA,
> MDNS, LLDP overhead etc. we have.

OK, so I'm not a member of the bpf mailing list, so this message won't get to that list, but:

Given how general (e)BPF is in Linux, and given the number of places where you can add an eBPF program, and given the extensions added by the "(e)" part, it might be possible to:

	construct a single eBPF program that matches all of those packet types;

	have it provide, in some fashion, an indication of *which* of the packet types matched;

	and have it provide the packet length as well.

If you *only* care about the packet counts and packet byte counts, that might be sufficient if the eBPF program can be put into the right place in the networking stack - it would also mean that the Linux kernel wouldn't have to copy the packets (as it does for each PF_PACKET socket being used for capturing, and there's one of those for every pcap_t), and your program wouldn't have to read those packets.

libpcap won't help you there, as it doesn't even know about eBPF, much less about its added capabilities, but it sounds as if this is a Linux-specific program, so that doesn't matter.  There may be a compiler allowing you to write a program to do what's described above and get it compiled into eBPF.

I don't know whether there's a place in the networking stack to which you can attach an eBPF probe to do this, but I wouldn't be surprised to find out that there is one.


* Re: [tcpdump-workers] Performance impact with multiple pcap handlers on Linux
  2020-12-23  6:20     ` Guy Harris
@ 2021-01-04 12:40       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 4+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-01-04 12:40 UTC (permalink / raw)
  To: Guy Harris, Linus Lüssing; +Cc: tcpdump-workers, bpf

Guy Harris <gharris@sonic.net> writes:

> On Dec 22, 2020, at 3:31 PM, Linus Lüssing <linus.luessing@c0d3.blue> wrote:
>
>> Basically we want to do live measurements of the overhead of the mesh
>> routing protocol and to measure and dissect the layer 2 broadcast
>> traffic - that is, to measure how much ARP, DHCP, ICMPv6 NS/NA/RS/RA,
>> MDNS, LLDP overhead etc. we have.
>
> OK, so I'm not a member of the bpf mailing list, so this message won't
> get to that list, but:

Yes it did :)

> Given how general (e)BPF is in Linux, and given the number of places
> where you can add an eBPF program, and given the extensions added by
> the "(e)" part, it might be possible to:
>
> 	construct a single eBPF program that matches all of those packet types;
>
> 	have it provide, in some fashion, an indication of *which* of the packet types matched;
>
> 	and have it provide the packet length as well.
>
> If you *only* care about the packet counts and packet byte counts,
> that might be sufficient if the eBPF program can be put into the right
> place in the networking stack - it would also mean that the Linux
> kernel wouldn't have to copy the packets (as it does for each
> PF_PACKET socket being used for capturing, and there's one of those
> for every pcap_t), and your program wouldn't have to read those
> packets.
>
> libpcap won't help you there, as it doesn't even know about eBPF, much
> less about its added capabilities, but it sounds as if this is a
> Linux-specific program, so that doesn't matter. There may be a
> compiler allowing you to write a program to do what's described above
> and get it compiled into eBPF.

You could certainly do this in eBPF: Write an eBPF program that matches
each packet type, and updates a BPF map with the count and total size.
Then you just need to read this map from userspace to get the values you
want, and there will be no copying involved anywhere.
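
A minimal sketch of such a program (hypothetical and heavily
simplified - it only separates ARP from everything else, whereas a
real classifier would also look at DHCP, ICMPv6 NS/NA/RS/RA, MDNS,
LLDP and so on):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

struct datarec {
	__u64 packets;
	__u64 bytes;
};

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 16); /* one slot per traffic class */
	__type(key, __u32);
	__type(value, struct datarec);
} class_stats SEC(".maps");

SEC("xdp")
int count_classes(struct xdp_md *ctx)
{
	void *data_end = (void *)(long)ctx->data_end;
	void *data = (void *)(long)ctx->data;
	struct ethhdr *eth = data;
	struct datarec *rec;
	__u32 key;

	if ((void *)(eth + 1) > data_end)
		return XDP_PASS;

	/* Toy classification: slot 0 = ARP, slot 1 = everything else. */
	key = eth->h_proto == bpf_htons(ETH_P_ARP) ? 0 : 1;

	rec = bpf_map_lookup_elem(&class_stats, &key);
	if (rec) {
		rec->packets++;
		rec->bytes += data_end - data;
	}
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Userspace then periodically reads the map (summing the per-CPU values)
to get the per-class packet and byte counts.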

We have some examples of packet parsing in the XDP tutorial (also partly
applicable to other eBPF hooks with direct packet access):
https://github.com/xdp-project/xdp-tutorial/blob/master/packet-solutions/xdp_prog_kern_02.c

> I don't know whether there's a place in the networking stack to which
> you can attach an eBPF probe to do this, but I wouldn't be surprised
> to find out that there is one.

On egress you could attach to the TC hook; on ingress if your driver has
native XDP support you can attach there with very little overhead. If it
doesn't (which would be the case for WiFi drivers), you may as well use
the TC hook on ingress as well (via a tc 'ingress' filter). There's also
'generic XDP', but it doesn't really have any performance benefit over
TC, so you may as well just use the TC hook...
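
FWIW, newer libbpf versions also grew an API for TC attachment, so you
don't have to shell out to tc. Roughly (a sketch; assumes prog_fd is
the fd of your already-loaded program):

#include <errno.h>
#include <net/if.h>
#include <bpf/libbpf.h>

/* Attach prog_fd to the TC ingress hook on ifname. The hook_create()
 * call also sets up the clsact qdisc if it isn't there yet. */
static int attach_tc_ingress(const char *ifname, int prog_fd)
{
	LIBBPF_OPTS(bpf_tc_hook, hook,
		    .ifindex = if_nametoindex(ifname),
		    .attach_point = BPF_TC_INGRESS);
	LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = prog_fd);
	int err;

	err = bpf_tc_hook_create(&hook);
	if (err && err != -EEXIST) /* clsact may already exist */
		return err;
	return bpf_tc_attach(&hook, &opts);
}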

Here's an example of how to share code between the XDP and TC hooks:

https://github.com/xdp-project/bpf-examples/tree/master/encap-forward

The bpf-examples repository is also meant as a way to showcase
real-world examples of how to do useful things with BPF, and contains
some Makefile infrastructure to get the build setup. If you want to
contribute your packet monitor as an example here, feel free to open a
pull request! :)

-Toke


