netdev.vger.kernel.org archive mirror
From: John Fastabend <john.fastabend@gmail.com>
To: Alexander Duyck <alexander.duyck@gmail.com>,
	Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>,
	Netdev <netdev@vger.kernel.org>,
	Tom Herbert <tom@herbertland.com>,
	Alexei Starovoitov <ast@kernel.org>,
	John Fastabend <john.r.fastabend@intel.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	David Miller <davem@davemloft.net>
Subject: Re: Questions on XDP
Date: Sat, 18 Feb 2017 15:28:11 -0800
Message-ID: <58A8D88B.3000907@gmail.com>
In-Reply-To: <CAKgT0UfE+RN-bK_Hu05kJv62s-edJtkrmkBefHU6UCYQSDdkvw@mail.gmail.com>

On 17-02-18 10:18 AM, Alexander Duyck wrote:
> On Sat, Feb 18, 2017 at 9:41 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Sat, 2017-02-18 at 17:34 +0100, Jesper Dangaard Brouer wrote:
>>> On Thu, 16 Feb 2017 14:36:41 -0800
>>> John Fastabend <john.fastabend@gmail.com> wrote:
>>>
>>>> On 17-02-16 12:41 PM, Alexander Duyck wrote:
>>>>> So I'm in the process of working on enabling XDP for the Intel NICs
>>>>> and I had a few questions so I just thought I would put them out here
>>>>> to try and get everything sorted before I paint myself into a corner.
>>>>>
>>>>> So my first question is why does the documentation mention 1 frame per
>>>>> page for XDP?
>>>
>>> Yes, XDP defines upfront a memory model where there is only one packet
>>> per page[1], please respect that!
>>>
>>> This is currently used/needed for fast-direct recycling of pages inside
>>> the driver for XDP_DROP and XDP_TX, _without_ performing any atomic
>>> refcnt operations on the page. E.g. see mlx4_en_rx_recycle().

Alex, does your pagecnt_bias trick resolve this? It seems to me that
recycling works just fine in the ixgbe patches (at least I never see the
allocator being triggered with simple XDP programs). The biggest win for
me right now is avoiding the DMA mapping operations.
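
To make the question concrete, here is roughly how I understand the
pagecnt_bias idea, written as a small userspace C model. This is only a
sketch: struct rx_page and the rx_page_*() helpers are hypothetical names
invented for illustration, the plain integer stands in for the atomic
page refcount, and the reuse check is simplified compared to what a real
driver does.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Userspace model only. 'refcount' stands in for the shared (atomic)
 * page refcount; 'bias' is the driver-private count of references the
 * driver still owns. All names here are hypothetical. */
struct rx_page {
    unsigned int refcount; /* shared with the stack, atomic in a kernel */
    unsigned int bias;     /* driver-local, updated without atomics */
};

/* Take a large batch of references once, when the page is set up. */
static void rx_page_init(struct rx_page *p)
{
    p->refcount = USHRT_MAX;
    p->bias = USHRT_MAX;
}

/* Hand one reference to the stack: only the local bias is touched. */
static void rx_page_give_to_stack(struct rx_page *p)
{
    assert(p->bias > 0);
    p->bias--;
}

/* XDP_DROP: the frame never left the driver, so take the reference back. */
static void rx_page_xdp_drop(struct rx_page *p)
{
    p->bias++;
}

/* The stack releases its reference (an atomic decrement in a kernel). */
static void rx_page_stack_put(struct rx_page *p)
{
    assert(p->refcount > 0);
    p->refcount--;
}

/* The page can be recycled when nobody outside the driver holds it. */
static bool rx_page_can_reuse(const struct rx_page *p)
{
    return p->refcount - p->bias == 0;
}
```

In this model a frame that is received and then dropped by XDP only
decrements and re-increments the local bias, so the shared count is never
written on that path. Only frames that actually reach the stack cost a
shared decrement later.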

>>
>>
>> XDP_DROP does not require having one page per frame.
> 
> Agreed.
> 
>> (Look at my recent mlx4 patch series if you need to be convinced)
>>
>> Only XDP_TX does.

I'm still not sure what one page per packet buys us for XDP_TX. What was
the explanation again?

>>
>> This requirement makes XDP useless (very OOM likely) on arches with 64K
>> pages.
> 
> Actually I have been having a side discussion with John about XDP_TX.
> Looking at the Mellanox way of doing it I am not entirely sure it is
> useful.  It looks good for benchmarks but that is about it.  Also I
> don't see it extending out to the point that we would be able to
> exchange packets between interfaces which really seems like it should
> be the ultimate goal for XDP_TX.

Exchanging packets between interfaces is needed if we want to use XDP
for vswitch use cases. We have a patch running on virtio, but we really
need to get it working on real hardware before we push it.

> 
> It seems like eventually we want to be able to peel off the buffer and
> send it to something other than ourselves.  For example it seems like
> it might be useful at some point to use XDP to do traffic
> classification and have it route packets between multiple interfaces
> on a host and it wouldn't make sense to have all of them map every
> page as bidirectional because it starts becoming ridiculous if you
> have dozens of interfaces in a system.
> 
> As per our original discussion at netconf if we want to be able to do
> XDP Tx with a fully lockless Tx ring we needed to have a Tx ring per
> CPU that is performing XDP.  The Tx path will end up needing to do the
> map/unmap itself in the case of physical devices but the expense of
> that can be somewhat mitigated on x86 at least by either disabling the
> IOMMU or using identity mapping.  I think this might be the route
> worth exploring as we could then start looking at doing things like
> implementing bridges and routers in XDP and see what performance gains
> can be had there.

One issue I have with a TX ring per CPU per device is that in my current
use case I have 2k tap/vhost devices and need to scale beyond that.
Taking the naive approach and having each tap/vhost device create a
per-CPU ring would mean 128k rings on my current dev box. I think
locking could be made optional without too much difficulty.
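
As a rough illustration of both points, here is a userspace C sketch. It
assumes 64 CPUs, which is only inferred from 128k / 2k, and the struct
xdp_tx_ring layout, RING_SIZE, and the helper names are hypothetical and
not taken from any driver. A ring created as shared takes a small
spinlock on enqueue; a per-CPU ring skips the lock entirely.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 256 /* arbitrary size for the sketch */

/* Rings needed if every device gets one TX ring per CPU. */
static unsigned long tx_rings_needed(unsigned long devices,
                                     unsigned long cpus)
{
    return devices * cpus;
}

/* Hypothetical TX ring: 'shared' says whether producers must lock. */
struct xdp_tx_ring {
    bool shared;            /* true: several CPUs may enqueue */
    atomic_flag lock;       /* only used when shared is true */
    unsigned int head;      /* next free slot */
    void *slots[RING_SIZE];
};

static void ring_init(struct xdp_tx_ring *r, bool shared)
{
    r->shared = shared;
    atomic_flag_clear(&r->lock);
    r->head = 0;
}

/* Enqueue one frame; returns false when the ring is full. */
static bool ring_enqueue(struct xdp_tx_ring *r, void *frame)
{
    bool ok = false;

    if (r->shared)
        while (atomic_flag_test_and_set_explicit(&r->lock,
                                                 memory_order_acquire))
            ; /* spin until the lock is free */

    if (r->head < RING_SIZE) {
        r->slots[r->head++] = frame;
        ok = true;
    }

    if (r->shared)
        atomic_flag_clear_explicit(&r->lock, memory_order_release);

    return ok;
}
```

With 2048 devices and the assumed 64 CPUs, tx_rings_needed() gives
131072 rings, which is where the 128k figure comes from. A device that
cannot afford a ring per CPU would call ring_init() with shared set to
true and pay for the lock instead.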

> 
> Also as far as the one page per frame it occurs to me that you will
> have to eventually deal with things like frame replication.  Once that
> comes into play everything becomes much more difficult because the
> recycling doesn't work without some sort of reference counting, and
> since the device interrupt can migrate you could end up with clean-up
> occurring on a different CPU, so you need to have some sort of
> synchronization mechanism.
> 
> Thanks.
> 
> - Alex
> 

