netdev.vger.kernel.org archive mirror
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: netdev@vger.kernel.org, jakub.kicinski@netronome.com,
	"Michael S. Tsirkin" <mst@redhat.com>,
	pavel.odintsov@gmail.com, Jason Wang <jasowang@redhat.com>,
	mchan@broadcom.com, peter.waskiewicz.jr@intel.com,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andy Gospodarek <andy@greyhouse.net>,
	brouer@redhat.com
Subject: Re: [net-next V6 PATCH 0/5] New bpf cpumap type for XDP_REDIRECT
Date: Wed, 11 Oct 2017 10:06:08 +0200	[thread overview]
Message-ID: <20171011100608.48bc1c86@redhat.com> (raw)
In-Reply-To: <a430c181-aa56-61a7-fc59-9b135bbb262b@gmail.com>

On Tue, 10 Oct 2017 23:10:39 -0700
John Fastabend <john.fastabend@gmail.com> wrote:

> On 10/10/2017 05:47 AM, Jesper Dangaard Brouer wrote:
> > Introducing a new way to redirect XDP frames.  Notice how no driver
> > changes are necessary given the design of XDP_REDIRECT.
> > 
> > This redirect map type is called 'cpumap', as it allows redirecting
> > XDP frames to remote CPUs.  The remote CPU will do the SKB allocation
> > and start the network stack invocation on that CPU.
> > 
> > This is a scalability and isolation mechanism that allows separating
> > the early driver network XDP layer from the rest of the netstack, and
> > assigning dedicated CPUs to this stage.  The sysadm controls/configures
> > the RX-CPU to NIC-RX-queue mapping (as usual) via procfs smp_affinity,
> > and how many queues are configured via ethtool --set-channels.
> > Benchmarks show that a single CPU can handle approx 11 Mpps.  Thus,
> > assigning only two NIC RX-queues (and two CPUs) is sufficient for
> > handling 10Gbit/s wirespeed at the smallest packet size, 14.88 Mpps.
> > Reducing the number of queues has the advantage that more packets are
> > available in "bulk" per hardware interrupt[1].
> > 
> > [1] https://www.netdevconf.org/2.1/papers/BusyPollingNextGen.pdf
> > 
> > Use-cases:
> > 
> > 1. End-host based pre-filtering for DDoS mitigation.  This is fast
> >    enough to allow software to see and filter all packets at wirespeed.
> >    Thus, no packets get silently dropped by hardware.
> > 
> > 2. Given that NIC HW can distribute packets unevenly across RX queues,
> >    this mechanism can be used to redistribute load across CPUs.  This
> >    usually happens when the HW is unaware of a new protocol.  It
> >    resembles RPS (Receive Packet Steering), just faster, but with more
> >    responsibility placed on the BPF program for correct steering.
> 
> Hi Jesper,
> 
> Another (somewhat meta) comment about the performance benchmarks. In
> one of the original threads you showed that the XDP cpu map outperformed
> RPS in TCP_CRR netperf tests. It was significant iirc in the mpps range.

Let me correct this.  This is (significantly) faster than RPS, and it
has the same performance as RPS in the netperf TCP_CRR and TCP_RR
tests.  As this just invokes the network stack (on a remote CPU), I'm
very happy to see the same comparative performance.  The netperf TCP_RR
test is actually the worst-case scenario, where the "hidden" bulking
doesn't work, while RPS is the best-case scenario.  I've even left
several optimization opportunities for later.


> But, with this series we will skip GRO. Do you have any idea how this
> looks with other tests such as TCP_STREAM? I'm trying to understand
> if this is something that can be used in the general case or is more
> for the special case and will have to be enabled/disabled by the
> orchestration layer depending on workload/network conditions.

On my testlab server, the TCP_STREAM tests show the same results (full
10G with MTU-sized packets).  This is because my server is fast enough,
and doesn't need the GRO aggregation to keep up (it "only" needs to
handle 812Kpps).
 
> My intuition is the general case will be slower due to lack of GRO. If
> this is the case any ideas how we could add GRO? Not needed in the
> initial patchset but trying to see if the two are mutually exclusive.
> I don't off-hand see an easy way to pull GRO into this feature.

Adding GRO _later_ is a big part of my plan.  I haven't figured out the
exact code paths yet.  The general idea is to perform a partial sorting
of flows, based on the RSS hash or something provided by the BPF prog.

Netflix's extension to FreeBSD illustrates the GRO sorting problem
nicely[1]; see the section "RSS Assisted LRO".  For the record, my idea
is not based on theirs; I had this idea long before reading their
article.  I want to do partial sorting on many levels.  E.g. the cpumap
enqueue can have 8 times 8 per-CPU packet queues (64 packets, the max
NAPI budget), sorted on some part of the RSS hash.  The BPF prog
choosing a CPU destination is also a sorting step.  The cpumap dequeue
kthread step, which needs to invoke a GRO netstack function, can also
perform a partial sorting step, plus implement a GRO flush point when
the queue is empty.

[1] https://medium.com/netflix-techblog/serving-100-gbps-from-an-open-connect-appliance-cdb51dda3b99

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Thread overview: 10+ messages
2017-10-10 12:47 [net-next V6 PATCH 0/5] New bpf cpumap type for XDP_REDIRECT Jesper Dangaard Brouer
2017-10-10 12:47 ` [net-next V6 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP Jesper Dangaard Brouer
2017-10-10 22:48   ` Daniel Borkmann
2017-10-11  5:36     ` Jesper Dangaard Brouer
2017-10-10 12:47 ` [net-next V6 PATCH 2/5] bpf: XDP_REDIRECT enable use of cpumap Jesper Dangaard Brouer
2017-10-10 12:47 ` [net-next V6 PATCH 3/5] bpf: cpumap xdp_buff to skb conversion and allocation Jesper Dangaard Brouer
2017-10-10 12:47 ` [net-next V6 PATCH 4/5] bpf: cpumap add tracepoints Jesper Dangaard Brouer
2017-10-10 12:47 ` [net-next V6 PATCH 5/5] samples/bpf: add cpumap sample program xdp_redirect_cpu Jesper Dangaard Brouer
2017-10-11  6:10 ` [net-next V6 PATCH 0/5] New bpf cpumap type for XDP_REDIRECT John Fastabend
2017-10-11  8:06   ` Jesper Dangaard Brouer [this message]
