netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Borkmann <dborkman@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Florian Westphal <fw@strlen.de>,
	gerlitz.or@gmail.com, hannes@stressinduktion.org,
	netdev@vger.kernel.org, john.ronciak@intel.com,
	amirv@mellanox.com, eric.dumazet@gmail.com, danny.zhou@intel.com
Subject: Re: [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access
Date: Mon, 06 Oct 2014 11:49:25 +0200	[thread overview]
Message-ID: <543265A5.8000606@redhat.com> (raw)
In-Reply-To: <5431EC82.7010305@gmail.com>

Hi John,

On 10/06/2014 03:12 AM, John Fastabend wrote:
> On 10/05/2014 05:29 PM, Florian Westphal wrote:
>> John Fastabend <john.fastabend@gmail.com> wrote:
>>> There is one critical difference when running with these interfaces
>>> vs running without them. In the normal case the af_packet module
>>> uses a standard descriptor format exported by the af_packet user
>>> space headers. In this model because we are working directly with
>>> driver queues the descriptor format maps to the descriptor format
>>> used by the device. User space applications can learn device
>>> information from the socket option PACKET_DEV_DESC_INFO which
>>> should provide enough details to extrapulate the descriptor formats.
>>> Although this adds some complexity to user space it removes the
>>> requirement to copy descriptor fields around.
>>
>> I find it very disappointing that we seem to have to expose such
>> hardware specific details to userspace via hw-independent interface.
>
> Well it was only for convenience if it doesn't fit as a socket
> option we can remove it. We can look up the device using the netdev
> name from the bind call. I see your point though so if there is
> consensus that this is not needed that is fine.
>
>> How big of a cost are we talking about when you say that it 'removes
>> the requirement to copy descriptor fields'?
>
> This was likely a poor description. If you want to let user space
> poll on the ring (without using system calls or interrupts) then
> I don't see how you can _not_ expose the ring directly complete with
> the vendor descriptor formats.

But how big is the concrete performance degradation you're seeing if you
use an e.g. `netmap-alike` Linux-own variant as a hw-neutral interface
that does *not* directly expose hw descriptor formats to user space?

With 1 core netmap does 10G line-rate on 64b; I don't know their numbers
on 40G when run on decent hardware though.

It would really be great if we have something vendor neutral exposed as
a stable ABI and could leverage emerging infrastructure we already have
in the kernel such as eBPF and recent qdisc batching for raw sockets
instead of reinventing the wheels. (Don't get me wrong, I would love to
see AF_PACKET improved ...)

Thanks,
Daniel

  reply	other threads:[~2014-10-06  9:49 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-06  0:06 [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access John Fastabend
2014-10-06  0:07 ` [net-next PATCH v1 2/3] net: sched: add direct ring acces via af_packet to ixgbe John Fastabend
2014-10-06  0:07 ` [net-next PATCH v1 3/3] net: packet: Document PACKET_DEV_QPAIR_SPLIT and friends John Fastabend
2014-10-06  0:29 ` [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access Florian Westphal
2014-10-06  1:09   ` David Miller
2014-10-06  1:18     ` John Fastabend
2014-10-06  1:12   ` John Fastabend
2014-10-06  9:49     ` Daniel Borkmann [this message]
2014-10-06 15:01       ` John Fastabend
2014-10-06 16:35         ` Jesper Dangaard Brouer
2014-10-06 17:03         ` Hannes Frederic Sowa
2014-10-06 20:37           ` John Fastabend
2014-10-06 23:26             ` Hannes Frederic Sowa
2014-10-07 18:59               ` Neil Horman
2014-10-08 17:20                 ` John Fastabend
2014-10-09 13:36                   ` [PATCH] af_packet: Add Doorbell transmit mode to AF_PACKET sockets Neil Horman
2014-10-09 15:01                     ` John Fastabend
2014-10-09 16:05                       ` Neil Horman
2014-10-06 16:55 ` [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access Stephen Hemminger
2014-10-06 20:42   ` John Fastabend
2014-10-06 21:42 ` David Miller
2014-10-07  4:25   ` John Fastabend
2014-10-07  4:24 ` Willem de Bruijn
2014-10-07  9:27   ` David Laight
2014-10-07 15:43     ` David Miller
2014-10-07 15:59       ` David Laight
2014-10-07 16:08         ` David Miller
2014-10-07 15:21   ` Zhou, Danny
2014-10-07 15:46     ` Willem de Bruijn
2014-10-07 15:55       ` John Fastabend
2014-10-07 16:06         ` Zhou, Danny
2014-10-07 16:05     ` David Miller
2014-10-10  3:49       ` Zhou, Danny
  -- strict thread matches above, loose matches on Subject: below --
2014-10-07 16:33 Alexei Starovoitov
2014-10-07 16:46 ` Zhou, Danny
2014-10-07 17:01 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=543265A5.8000606@redhat.com \
    --to=dborkman@redhat.com \
    --cc=amirv@mellanox.com \
    --cc=danny.zhou@intel.com \
    --cc=eric.dumazet@gmail.com \
    --cc=fw@strlen.de \
    --cc=gerlitz.or@gmail.com \
    --cc=hannes@stressinduktion.org \
    --cc=john.fastabend@gmail.com \
    --cc=john.ronciak@intel.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).