From: Daniel Borkmann <dborkman@redhat.com>
To: Guy Harris <guy@alum.mit.edu>
Cc: netdev@vger.kernel.org, Chetan Loke <loke.chetan@gmail.com>
Subject: Re: Problems with TPACKET_V3 delivery of wakeups (and empty buffer blocks)
Date: Mon, 04 Aug 2014 10:07:32 +0200
Message-ID: <53DF3F44.6090804@redhat.com>
In-Reply-To: <FBF13AD0-7721-4CEF-94AE-A14C1B06E84C@alum.mit.edu>

[ cc'ing Chetan for TPACKET_V3 ]

On 07/26/2014 02:43 AM, Guy Harris wrote:
> Users of libpcap, which supports TPACKET_V3 as of libpcap 1.5.0, have reported problems that
> turned out to be due to some oddities in TPACKET_V3's behavior.
>
> See, for example:
>
> 	https://github.com/the-tcpdump-group/libpcap/issues/335
>
> 	https://github.com/the-tcpdump-group/libpcap/issues/364
>
> 	http://thread.gmane.org/gmane.network.tcpdump.devel/6823
>
> To quote one of my comments for the first issue:
>
> It appears that PF_PACKET sockets deliver a wakeup when a packet is put in a buffer block or
> dropped due to no buffer blocks being empty, but *not* when a buffer block is handed to userland.
>
> This means that if the kernel's timer expires, and there are no packets in the current buffer
> block being filled by the kernel, that buffer block will be handed to userland, but userland
> won't be woken up to tell it to consume that block.
>
> Thus, libpcap will consume that block only if either:
>
> 	1. a packet is put in a buffer block, meaning it must pass the filter *and* there must be
> 	   a current buffer block, belonging to the kernel, into which to put it;
>
> 	2. a packet arrives and passes the filter, but there are *no* current buffer blocks
> 	   belonging to the kernel, so it's dropped;
>
> 	3. the poll() times out.
>
> So, with a low packet acceptance rate (either because there isn't much network traffic or because
> there is but most of it is rejected by the packet filter), and with a poll() timeout of -1, meaning
> "block forever", 1) will happen infrequently, and 3) will never happen.  With an in-kernel timeout
> interval significantly shorter than the interval between accepted packets, the timeout will often
> occur when there are no packets in the current buffer block, in which case the kernel will hand an
> empty buffer block to userland and *not* tell userland about it.
>
> If that happens often enough in sequence to cause *all* buffer blocks to be handed to userland
> before any wakeups occur, the kernel now has no buffer blocks into which to put packets, and the
> next time a packet arrives, it will be dropped, and a wakeup will finally occur.  libpcap will drain
> the ring, handing all buffer blocks to the kernel, *but* it won't have any packets to process!
>
> So this is ultimately a problem with the TPACKET_V3 code in the kernel.  I personally think that
> it should *not* deliver empty buffer blocks to userland, and that it also should *not* deliver a
> wakeup when a packet is accepted, and *should* deliver a wakeup whenever a buffer block is handed
> to userland.  I'll report this to somebody and let them decide which of those changes should be done.
