From: John Fastabend <john.fastabend@gmail.com>
To: Alexander Duyck <alexander.duyck@gmail.com>,
Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>,
Netdev <netdev@vger.kernel.org>,
Tom Herbert <tom@herbertland.com>,
Alexei Starovoitov <ast@kernel.org>,
John Fastabend <john.r.fastabend@intel.com>,
Daniel Borkmann <daniel@iogearbox.net>,
David Miller <davem@davemloft.net>
Subject: Re: Questions on XDP
Date: Sat, 18 Feb 2017 15:28:11 -0800
Message-ID: <58A8D88B.3000907@gmail.com>
In-Reply-To: <CAKgT0UfE+RN-bK_Hu05kJv62s-edJtkrmkBefHU6UCYQSDdkvw@mail.gmail.com>
On 17-02-18 10:18 AM, Alexander Duyck wrote:
> On Sat, Feb 18, 2017 at 9:41 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Sat, 2017-02-18 at 17:34 +0100, Jesper Dangaard Brouer wrote:
>>> On Thu, 16 Feb 2017 14:36:41 -0800
>>> John Fastabend <john.fastabend@gmail.com> wrote:
>>>
>>>> On 17-02-16 12:41 PM, Alexander Duyck wrote:
>>>>> So I'm in the process of working on enabling XDP for the Intel NICs
>>>>> and I had a few questions so I just thought I would put them out here
>>>>> to try and get everything sorted before I paint myself into a corner.
>>>>>
>>>>> So my first question is why does the documentation mention 1 frame per
>>>>> page for XDP?
>>>
>>> Yes, XDP defines upfront a memory model where there is only one packet
>>> per page[1], please respect that!
>>>
>>> This is currently used/needed for fast-direct recycling of pages inside
>>> the driver for XDP_DROP and XDP_TX, _without_ performing any atomic
>>> refcnt operations on the page. E.g. see mlx4_en_rx_recycle().
Alex, does your pagecnt_bias trick resolve this? It seems to me that the
recycling works just fine in the ixgbe patches (at least I never see the
allocator being triggered with simple XDP programs). The biggest win for
me right now is avoiding the DMA mapping operations.
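A minimal userspace sketch of the pagecnt_bias idea, to make the question concrete. All names (`sim_page`, `sim_rx_buffer`, `SIM_BIAS_MAX`, ...) are hypothetical, this is not the ixgbe code, and C11 atomics stand in for the kernel's page refcount helpers (page_ref_count()/page_ref_add()). The assumption is that the driver takes a large block of page references up front and tracks how many it still holds in a plain counter: handing a frame to the stack is then a non-atomic decrement, XDP_DROP touches no counter at all, and only the stack's release (or the rare restock) is atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the refcount is modelled. */
struct sim_page {
	atomic_uint refcount;
};

/* Hypothetical per-descriptor Rx buffer state. */
struct sim_rx_buffer {
	struct sim_page *page;
	unsigned int pagecnt_bias;	/* refs the driver still holds, non-atomic */
};

/* Assumption: a USHRT_MAX-sized reserve of references. */
#define SIM_BIAS_MAX 65535u

/* Take the whole reserve of references in one go. */
static void sim_rx_buffer_init(struct sim_rx_buffer *buf, struct sim_page *page)
{
	buf->page = page;
	atomic_init(&page->refcount, SIM_BIAS_MAX);
	buf->pagecnt_bias = SIM_BIAS_MAX;
}

/* XDP_PASS: hand one driver-held reference to the stack. */
static void sim_rx_pass_frame(struct sim_rx_buffer *buf)
{
	/* Rare path: restock the reserve with a single atomic add. */
	if (buf->pagecnt_bias == 1) {
		atomic_fetch_add(&buf->page->refcount, SIM_BIAS_MAX - 1);
		buf->pagecnt_bias = SIM_BIAS_MAX;
	}
	buf->pagecnt_bias--;	/* plain decrement, no atomic op */
}

/* The stack frees its reference, possibly on another CPU: atomic dec. */
static void sim_stack_release(struct sim_page *page)
{
	unsigned int old = atomic_fetch_sub(&page->refcount, 1);

	assert(old > 0);	/* releasing a page nobody holds is a bug */
	(void)old;
}

/* Reusable when no reference exists outside the driver's reserve. */
static bool sim_rx_can_reuse(const struct sim_rx_buffer *buf)
{
	return atomic_load(&buf->page->refcount) == buf->pagecnt_bias;
}
```

Usage: right after sim_rx_buffer_init() the page is reusable, and an XDP_DROP needs no call at all to keep it that way; sim_rx_pass_frame() makes it busy until the stack calls sim_stack_release().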
>>
>>
>> XDP_DROP does not require having one page per frame.
>
> Agreed.
>
>> (Look after my recent mlx4 patch series if you need to be convinced)
>>
>> Only XDP_TX is.
I'm still not sure what one page per packet buys us for XDP_TX. What was
the explanation again?
>>
>> This requirement makes XDP useless (very OOM likely) on arches with 64K
>> pages.
>
> Actually I have been having a side discussion with John about XDP_TX.
> Looking at the Mellanox way of doing it I am not entirely sure it is
> useful. It looks good for benchmarks but that is about it. Also I
> don't see it extending out to the point that we would be able to
> exchange packets between interfaces which really seems like it should
> be the ultimate goal for XDP_TX.
Exchanging packets between interfaces is needed if we want XDP to be used
for vswitch use cases. We have a patch running on virtio, but we really
need to get it working on real hardware before we push it.
>
> It seems like eventually we want to be able to peel off the buffer and
> send it to something other than ourselves. For example it seems like
> it might be useful at some point to use XDP to do traffic
> classification and have it route packets between multiple interfaces
> on a host and it wouldn't make sense to have all of them map every
> page as bidirectional because it starts becoming ridiculous if you
> have dozens of interfaces in a system.
>
> As per our original discussion at netconf, if we want to be able to do
> XDP Tx with a fully lockless Tx ring, we need to have a Tx ring per
> CPU that is performing XDP. The Tx path will end up needing to do the
> map/unmap itself in the case of physical devices but the expense of
> that can be somewhat mitigated on x86 at least by either disabling the
> IOMMU or using identity mapping. I think this might be the route
> worth exploring as we could then start looking at doing things like
> implementing bridges and routers in XDP and see what performance gains
> can be had there.
One issue I have with a Tx ring per CPU per device is that, in my current
use case, I have 2k tap/vhost devices and need to scale beyond that.
Taking the naive approach and having each tap/vhost device create a
per-CPU ring would mean 128k rings on my current dev box. I think making
the locking optional would not be too difficult.
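The ring-count concern and the optional locking can be sketched as follows. Assumptions: 64 CPUs (2048 devices x 64 CPUs reproduces the 128k figure), a C11 atomic_flag spinlock standing in for a kernel spinlock, a simple cpu % nr_rings mapping, and hypothetical names (`sim_tx_ring`, `sim_xdp_tx`, `sim_xdp_xmit`). The lock is taken only when a device has fewer Tx rings than CPUs, so the ring-per-CPU case stays lockless.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Naive scheme: every device gets one XDP Tx ring per CPU. */
static size_t naive_ring_count(size_t nr_devices, size_t nr_cpus)
{
	return nr_devices * nr_cpus;
}

/* Hypothetical Tx ring; the lock is used only when the ring is shared. */
struct sim_tx_ring {
	atomic_flag lock;	/* stand-in for a kernel spinlock */
	unsigned int produced;	/* stand-in for posted descriptors */
};

/* Hypothetical per-device XDP Tx state. */
struct sim_xdp_tx {
	struct sim_tx_ring *rings;
	size_t nr_rings;
	bool shared;		/* true when there are fewer rings than CPUs */
};

static void sim_xdp_tx_init(struct sim_xdp_tx *tx, struct sim_tx_ring *rings,
			    size_t nr_rings, size_t nr_cpus)
{
	assert(nr_rings > 0);
	for (size_t i = 0; i < nr_rings; i++) {
		atomic_flag_clear(&rings[i].lock);
		rings[i].produced = 0;
	}
	tx->rings = rings;
	tx->nr_rings = nr_rings;
	tx->shared = nr_rings < nr_cpus;
}

static void sim_ring_lock(struct sim_tx_ring *ring)
{
	while (atomic_flag_test_and_set_explicit(&ring->lock, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void sim_ring_unlock(struct sim_tx_ring *ring)
{
	atomic_flag_clear_explicit(&ring->lock, memory_order_release);
}

/* Transmit from @cpu (assumed < nr_cpus); lockless with a ring per CPU. */
static void sim_xdp_xmit(struct sim_xdp_tx *tx, size_t cpu)
{
	struct sim_tx_ring *ring = &tx->rings[cpu % tx->nr_rings];

	if (tx->shared)
		sim_ring_lock(ring);
	ring->produced++;
	if (tx->shared)
		sim_ring_unlock(ring);
}
```

For example, a tap device given 4 rings on a 64-CPU box maps CPU n to ring n % 4 and takes the lock, while a device with one ring per CPU skips it entirely; the modulo mapping is just one possible choice.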
>
> Also as far as the one page per frame it occurs to me that you will
> have to eventually deal with things like frame replication. Once that
> comes into play everything becomes much more difficult because the
> recycling doesn't work without some sort of reference counting, and
> since the device interrupt can migrate you could end up with clean-up
> occurring on a different CPU so you need to have some sort of
> synchronization mechanism.
>
> Thanks.
>
> - Alex
>
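The replication point can be illustrated with a small sketch, assuming a hypothetical `sim_frame` whose replicas share one refcount: each replica adds an owner atomically, Tx completion may run on whichever CPU the interrupt landed on, and only the final release recycles the page. The increment is relaxed and the decrement acq_rel, the usual pattern for reference counts; this is an illustration, not code from any of the drivers discussed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared frame buffer; all replicas share one refcount. */
struct sim_frame {
	atomic_uint refcount;
	bool recycled;
};

static void sim_frame_init(struct sim_frame *f)
{
	atomic_init(&f->refcount, 1);
	f->recycled = false;
}

/* Replicate to another interface: one more owner, atomic increment. */
static void sim_frame_clone(struct sim_frame *f)
{
	atomic_fetch_add_explicit(&f->refcount, 1, memory_order_relaxed);
}

/* Tx completion may run on any CPU: only the last owner recycles. */
static bool sim_frame_put(struct sim_frame *f)
{
	unsigned int old = atomic_fetch_sub_explicit(&f->refcount, 1,
						     memory_order_acq_rel);

	assert(old > 0);	/* put without a matching owner is a bug */
	if (old != 1)
		return false;
	f->recycled = true;	/* stand-in for returning the page to a pool */
	return true;
}
```

With one original and two replicas, the first two sim_frame_put() calls return false and only the third recycles the buffer, regardless of which CPU each call runs on.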