From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Michael Chan <michael.chan@broadcom.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
"Duyck, Alexander H" <alexander.h.duyck@intel.com>,
"john.fastabend@gmail.com" <john.fastabend@gmail.com>,
"pstaszewski@itcare.pl" <pstaszewski@itcare.pl>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"xdp-newbies@vger.kernel.org" <xdp-newbies@vger.kernel.org>,
"andy@greyhouse.net" <andy@greyhouse.net>,
"borkmann@iogearbox.net" <borkmann@iogearbox.net>,
brouer@redhat.com
Subject: Re: XDP redirect measurements, gotchas and tracepoints
Date: Fri, 25 Aug 2017 14:45:13 +0200
Message-ID: <20170825144513.1ee9fbb1@redhat.com>
In-Reply-To: <CACKFLinGuaDLxYRd=vC99DL5n0mf0rDbPRaDg4ctev=DEAhRSQ@mail.gmail.com>
On Thu, 24 Aug 2017 20:36:28 -0700
Michael Chan <michael.chan@broadcom.com> wrote:
> On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> > On Tue, 22 Aug 2017 23:59:05 -0700
> > Michael Chan <michael.chan@broadcom.com> wrote:
> >
> >> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
> >> <alexander.duyck@gmail.com> wrote:
> >> > On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> >> >>
> >> >> Right, but it's conceivable to add an API to "return" the buffer to
> >> >> the input device, right?
> >
> > Yes, I would really like to see an API like this.
> >
> >> >
> >> > You could, it is just added complexity. "just free the buffer" in
> >> > ixgbe usually just amounts to one atomic operation to decrement the
> >> > total page count since page recycling is already implemented in the
> >> > driver. You still would have to unmap the buffer regardless of if you
> >> > were recycling it or not so all you would save is 1.000015259 atomic
> >> > operations per packet. The fraction is because once every 64K uses we
> >> > have to bulk update the count on the page.
> >> >
> >>
> >> If the buffer is returned to the input device, the input device can
> >> keep the DMA mapping. All it needs to do is to dma_sync it back to
> >> the input device when the buffer is returned.
> >
> > Yes, exactly, return to the input device. I really think we should
> > work on a solution where we can keep the DMA mapping around. We have
> > an opportunity here to make ndo_xdp_xmit TX queues use a specialized
> > page return call, to achieve this. (I imagine other arch's have a high
> > DMA overhead than Intel)
> >
> > I'm not sure how the API should look. The ixgbe recycle mechanism and
> > splitting the page (into two packets) actually complicates things, and
> > tie us into a page-refcnt based model. We could get around this by
> > each driver implementing a page-return-callback, that allow us to
> > return the page to the input device? Then, drivers implementing the
> > 1-packet-per-page can simply check/read the page-refcnt, and if it is
> > "1" DMA-sync and reuse it in the RX queue.
> >
>
> Yeah, based on Alex' description, it's not clear to me whether ixgbe
> redirecting to a non-intel NIC or vice versa will actually work. It
> sounds like the output device has to make some assumptions about how
> the page was allocated by the input device.

Yes, exactly. We are tied into a page-refcnt based scheme.

Besides, the ixgbe page recycle scheme (which keeps the DMA RX-mapping)
is also tied to the RX queue size, plus how fast the pages are returned.
This makes it very hard to tune. As I demonstrated, the default ixgbe
settings do not work well with XDP_REDIRECT. I needed to increase the
TX-ring size, but that broke page recycling (dropping perf from 13Mpps
to 10Mpps), so I also needed to increase the RX-ring size. But perf is
best when the RX-ring size is smaller, so the two tuning needs
contradict each other.

> With buffer return API,
> each driver can cleanly recycle or free its own buffers properly.

Yes, exactly. And the RX-driver can implement a special memory model for
this queue. E.g. the RX-driver can know this is a dedicated XDP RX-queue
which is never used for SKBs, thus opening up for new RX memory models.

Another advantage of a return API: there is also an opportunity for
avoiding the DMA map on TX, as we need to know the from-device anyway.
Thus, we can add a DMA API where we can query whether the two devices
use the same DMA engine, and if so reuse the DMA address the RX-side
already knows.

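
To make the page-return idea a bit more concrete, here is a rough
userspace model of it, not kernel code. Every name in it
(model_page, model_rx_queue, page_return_fn, xdp_tx_complete, ...) is
made up for illustration and none of them is an existing kernel API.
It only shows the control flow for a 1-packet-per-page driver: the TX
completion path hands the page back to the input device's callback,
which recycles it when the refcnt is "1" and otherwise drops its
reference.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these names exist in the kernel. */
struct model_page {
	int refcnt;       /* stands in for the page refcount */
	bool dma_mapped;  /* RX DMA mapping kept by the input device */
};

struct model_rx_queue;

/* Hypothetical per-driver hook, called by the TX side on completion. */
typedef void (*page_return_fn)(struct model_rx_queue *rxq,
			       struct model_page *pg);

#define RECYCLE_SLOTS 4

struct model_rx_queue {
	page_return_fn page_return;
	struct model_page *recycle[RECYCLE_SLOTS];
	size_t nr_recycled;  /* pages put back for reuse on RX */
	size_t nr_freed;     /* pages where we only dropped our reference */
	size_t nr_syncs;     /* stands in for DMA syncs back to the device */
};

/* 1-packet-per-page driver: reuse the page only if we are the sole owner. */
void one_pkt_per_page_return(struct model_rx_queue *rxq,
			     struct model_page *pg)
{
	if (pg->refcnt == 1 && pg->dma_mapped &&
	    rxq->nr_recycled < RECYCLE_SLOTS) {
		rxq->nr_syncs++;  /* mapping is kept, only a sync is needed */
		rxq->recycle[rxq->nr_recycled++] = pg;
		return;
	}
	/* Page is shared or the cache is full: unmap and drop our ref. */
	pg->dma_mapped = false;
	pg->refcnt--;
	rxq->nr_freed++;
}

/* What a TX-completion path might do instead of freeing the page itself. */
void xdp_tx_complete(struct model_rx_queue *from, struct model_page *pg)
{
	from->page_return(from, pg);
}
```

In this model the TX side never needs to know how the input device
allocated the page; it only needs to know which RX queue the page came
from. A driver that splits pages could plug in a different callback
without the output device caring.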
> Let me discuss this further with Andy to see if we can come up with a
> good scheme.

Sounds good, looking forward to hearing what you come up with :-)

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer