netdev.vger.kernel.org archive mirror
* Problems with TPACKET_V3 delivery of wakeups (and empty buffer blocks)
From: Guy Harris @ 2014-07-26  0:43 UTC
  To: netdev

Users of libpcap, which supports TPACKET_V3 as of libpcap 1.5.0, have reported problems that turned out to be due to some oddities in TPACKET_V3's behavior.

See, for example:

	https://github.com/the-tcpdump-group/libpcap/issues/335

	https://github.com/the-tcpdump-group/libpcap/issues/364

	http://thread.gmane.org/gmane.network.tcpdump.devel/6823

To quote one of my comments for the first issue:

It appears that PF_PACKET sockets deliver a wakeup when a packet is put in a buffer block, or when a packet is dropped because no buffer block is available to the kernel, but *not* when a buffer block is handed to userland.

This means that if the kernel's timer expires, and there are no packets in the current buffer block being filled by the kernel, that buffer block will be handed to userland, but userland won't be woken up to tell it to consume that block.

Thus, libpcap will consume that block only if one of the following happens:

	1. a packet is put in a buffer block, meaning it must pass the filter *and* there must be a current buffer block, belonging to the kernel, into which to put it;

	2. a packet arrives and passes the filter, but there are *no* current buffer blocks belonging to the kernel, so it's dropped;

	3. the poll() times out.

So, with a low packet acceptance rate (either because there isn't much network traffic or because there is but most of it is rejected by the packet filter), and with a poll() timeout of -1, meaning "block forever", 1) will happen infrequently, 2) can happen only once the kernel has run out of buffer blocks, and 3) will never happen.  With an in-kernel timeout significantly shorter than the typical interval between accepted packets, the timer will often expire when there are no packets in the current buffer block, in which case the kernel will hand an empty buffer block to userland and *not* tell userland about it.

If that happens often enough in sequence to cause *all* buffer blocks to be handed to userland before any wakeups occur, the kernel now has no buffer blocks into which to put packets, and the next time a packet arrives, it will be dropped, and a wakeup will finally occur.  libpcap will drain the ring, handing all buffer blocks to the kernel, *but* it won't have any packets to process!
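The sequence above can be traced with a toy model. This is only a sketch of the behaviour as described in this message, not kernel code: the names `Ring`, `on_timer`, `on_packet`, and `drain` are invented for illustration, and the ring size of 4 is arbitrary.

```python
# Toy model of the wakeup behaviour described above (hypothetical names,
# not the kernel's data structures).

class Ring:
    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.user_blocks = []  # packet counts of blocks handed to userland
        self.current = 0       # packets in the block the kernel is filling
        self.wakeups = 0
        self.drops = 0

    def kernel_has_block(self):
        return len(self.user_blocks) < self.nblocks

    def on_timer(self):
        # Timer expiry hands the current block to userland, even if it is
        # empty, and does NOT deliver a wakeup.
        if self.kernel_has_block():
            self.user_blocks.append(self.current)
            self.current = 0

    def on_packet(self):
        # An accepted packet is stored if the kernel owns a block and dropped
        # otherwise; a wakeup is delivered in both cases.
        if self.kernel_has_block():
            self.current += 1
        else:
            self.drops += 1
        self.wakeups += 1

    def drain(self):
        # Userland consumes every handed-over block and returns it to the
        # kernel; the result is the number of packets it got to process.
        delivered = sum(self.user_blocks)
        self.user_blocks.clear()
        return delivered


ring = Ring(nblocks=4)
for _ in range(4):            # idle link: the timer fires four times
    ring.on_timer()
wakeups_while_idle = ring.wakeups  # 0: userland sleeps in poll(-1)
ring.on_packet()              # no kernel-owned block left: dropped, wakeup
delivered = ring.drain()      # 0: every block userland received was empty
```

In this model the only wakeup is the one caused by the dropped packet, and the drain that follows it yields no packets.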

So this is ultimately a problem with the TPACKET_V3 code in the kernel.  I personally think that it should *not* deliver empty buffer blocks to userland, and that it also should *not* deliver a wakeup when a packet is accepted, and *should* deliver a wakeup whenever a buffer block is handed to userland.  I'll report this to somebody and let them decide which of those changes should be done.
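For comparison, the three suggested changes can be put into the same kind of toy model. Again this is a sketch under stated assumptions, not a proposed kernel patch: `ProposedRing` and its methods are hypothetical names, the per-block capacity of 2 packets is arbitrary, and the choice not to wake userland on a drop is an assumption the message does not address.

```python
# Toy model of the suggested behaviour: never hand over an empty block,
# no wakeup on packet acceptance, wakeup on every block handoff.
# All names and sizes are hypothetical.

class ProposedRing:
    def __init__(self, nblocks, block_capacity):
        self.nblocks = nblocks
        self.block_capacity = block_capacity
        self.user_blocks = []  # packet counts of blocks handed to userland
        self.current = 0       # packets in the block the kernel is filling
        self.wakeups = 0
        self.drops = 0

    def _kernel_has_block(self):
        return len(self.user_blocks) < self.nblocks

    def _hand_over(self):
        # Every handoff to userland is accompanied by a wakeup.
        self.user_blocks.append(self.current)
        self.current = 0
        self.wakeups += 1

    def on_timer(self):
        # An empty block stays with the kernel when the timer expires.
        if self._kernel_has_block() and self.current > 0:
            self._hand_over()

    def on_packet(self):
        if not self._kernel_has_block():
            self.drops += 1   # assumption: a drop does not wake userland
            return
        self.current += 1     # accepting a packet does not wake userland
        if self.current == self.block_capacity:
            self._hand_over() # a full block is handed over immediately

    def drain(self):
        delivered = sum(self.user_blocks)
        self.user_blocks.clear()
        return delivered


ring = ProposedRing(nblocks=4, block_capacity=2)
for _ in range(10):           # idle link: nothing is handed over
    ring.on_timer()
idle_blocks = len(ring.user_blocks)
ring.on_packet()              # stored, no wakeup yet
ring.on_timer()               # non-empty block handed over, with a wakeup
ring.on_packet()
ring.on_packet()              # block full: handed over, with a wakeup
delivered = ring.drain()
```

Under these rules an idle link never consumes ring blocks, and each wakeup corresponds to a block that contains at least one packet.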


* Re: Problems with TPACKET_V3 delivery of wakeups (and empty buffer blocks)
From: Daniel Borkmann @ 2014-08-04  8:07 UTC
  To: Guy Harris; +Cc: netdev, Chetan Loke

[ cc'ing Chetan for TPACKET_V3 ]



* Re: Problems with TPACKET_V3 delivery of wakeups (and empty buffer blocks)
From: Guy Harris @ 2014-08-05  4:52 UTC
  To: chetan loke; +Cc: Daniel Borkmann, netdev@vger.kernel.org


On Aug 4, 2014, at 8:23 PM, chetan loke <loke.chetan@gmail.com> wrote:

> Empty buffer blocks have to be delivered. The patch explains why.

If by "the patch" you mean

	https://lkml.org/lkml/2011/6/21/463

and the comment

> E2) Also implemented basic timeout mechanism to close 'a' current block.
>     That way, user-space won't be blocked forever on an idle link.
>     This is a much needed feature while monitoring multiple ports.
>     Look at 3) below.

then that's presumably referring to 3.4.2 in

> 3) Port aggregation analysis:
>    Multiple ports are viewed/analyzed as one logical pipe.
>    Example:
>    3.1) up-stream    path can be tapped in eth1
>    3.2) down-stream  path can be tapped in eth2
>    3.3) Network TAP splits Rx/Tx paths and then feeds to eth1,eth2.
> 
>    If both eth1,eth2 need to be viewed as one logical channel,
>    then that implies we need to timesort the packets as they come across
>    eth1,eth2.
> 
>    3.4) But following issues further complicates the problem:
>         3.4.1)What if one stream is bursty and other is flowing
>               at line rate?
>         3.4.2)How long do we wait before we can actually make a
>               decision in the app-space and bail-out from the spin-wait?

Presumably this refers to some mechanism by which eth1 and eth2 are handled by *one* socket. If they're handled by *two* sockets, you use select()/poll()/epoll() on the two sockets, and, when *either* of the devices fills up a block or times out, a wakeup is delivered to its socket and userland wakes up.
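The two-socket arrangement can be sketched as follows. This is an illustration only: the `socketpair()` endpoints stand in for two PF_PACKET capture sockets (which would need privileges to open), the helper name `wait_for_either` is hypothetical, and writing a byte merely simulates a wakeup on the eth2 socket.

```python
import selectors
import socket


def wait_for_either(socks, timeout=None):
    """Block until at least one socket is readable; return the ready names."""
    with selectors.DefaultSelector() as sel:
        for name, sock in socks.items():
            sel.register(sock, selectors.EVENT_READ, data=name)
        return sorted(key.data for key, _events in sel.select(timeout))


# Stand-ins for the two capture sockets (assumption: real code would open
# one PF_PACKET socket per interface and mmap a ring for each).
eth1_rx, eth1_tx = socket.socketpair()
eth2_rx, eth2_tx = socket.socketpair()

eth2_tx.send(b"x")  # simulate a wakeup being delivered on the eth2 socket
ready = wait_for_either({"eth1": eth1_rx, "eth2": eth2_rx}, timeout=1.0)

for s in (eth1_rx, eth1_tx, eth2_rx, eth2_tx):
    s.close()
```

Only the socket that received the wakeup is reported ready, so userland can process that device's ring without waiting on the other.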

And even if there's some reason why a wakeup has to be delivered when the timer expires even if *no* packets are available, what is the reason why you don't just deliver a wakeup but *no* buffer blocks, so that userland just says "well, I woke up, but there are no buffer blocks available for that particular socket, so there's nothing to process for that socket"?
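A userland loop that tolerates such a wakeup might look like the sketch below. The two status values match `TP_STATUS_KERNEL` and `TP_STATUS_USER` in `<linux/if_packet.h>`; everything else (the function name, and modelling the ring as two parallel lists) is a simplification invented for illustration.

```python
TP_STATUS_KERNEL = 0  # block is owned by the kernel
TP_STATUS_USER = 1    # block has been handed to userland


def consume_after_wakeup(block_status, block_packets, next_block):
    """Process userland-owned blocks in ring order, starting at next_block.

    Returns (packets_processed, new_next_block). If no block is ready, the
    wakeup was simply uninformative and nothing is processed.
    """
    packets = 0
    for _ in range(len(block_status)):
        if not block_status[next_block] & TP_STATUS_USER:
            break                                   # kernel still owns it
        packets += block_packets[next_block]
        block_status[next_block] = TP_STATUS_KERNEL  # give the block back
        next_block = (next_block + 1) % len(block_status)
    return packets, next_block


# Woken up with no blocks available: nothing to do for this socket.
idle = consume_after_wakeup([TP_STATUS_KERNEL] * 3, [0, 0, 0], 0)

# Woken up with one block holding three packets.
status = [TP_STATUS_USER, TP_STATUS_KERNEL, TP_STATUS_KERNEL]
busy = consume_after_wakeup(status, [3, 0, 0], 0)
```

The empty case costs one status check and leaves every block with the kernel.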

