Re: [PATCH v5 2/2] skb_array: ring test

From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org, Jason Wang <jasowang@redhat.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	davem@davemloft.net, netdev@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	brouer@redhat.com
Subject: Re: [PATCH v5 2/2] skb_array: ring test
Date: Thu, 2 Jun 2016 20:47:25 +0200
Message-ID: <20160602204725.7bcfd927@redhat.com>
In-Reply-To: <20160524224710-mutt-send-email-mst@redhat.com>

On Tue, 24 May 2016 23:34:14 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, May 24, 2016 at 07:03:20PM +0200, Jesper Dangaard Brouer wrote:
> > 
> > On Tue, 24 May 2016 12:28:09 +0200
> > Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >   
> > > I do like perf, but it does not answer my questions about the
> > > performance of this queue. I will code something up in my own
> > > framework[2] to answer my own performance questions.
> > > 
> > > Like what is the minimum overhead (in cycles) achievable with this type
> > > of queue, in the most optimal situation (e.g. same CPU enq+deq cache hot)
> > > for fastpath usage.  
> > 
> > Coded it up here:
> >  https://github.com/netoptimizer/prototype-kernel/commit/b16a3332184
> >  https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/skb_array_bench01.c
> > 
> > This is a really fake benchmark, but it sort of shows the  
> > overhead achievable with this type of queue, where it is the same
> > CPU enqueuing and dequeuing, and cache is guaranteed to be hot.
> > 
> > Measured on an i7-4790K CPU @ 4.00GHz, the average cost of
> > enqueue+dequeue of a single object is around 102 cycles(tsc).
> > 
> > To compare this with below, where enq and deq are measured separately:
> >  102 / 2 = 51 cycles

The alf_queue[1] baseline in this minimum-overhead benchmark is 26 cycles
for the MPMC (Multi-Producer/Multi-Consumer) variant, which uses a locked
cmpxchg.  (The SPSC variant is 5 cycles, thus most of the cost comes from
the locked cmpxchg.)

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/include/linux/alf_queue.h
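For a sense of scale, the cycle counts above can be converted to
wall-clock time.  This is only a small sketch: it assumes the TSC ticks
at the nominal 4.00 GHz of the i7-4790K mentioned earlier, and the names
TSC_HZ and cycles_to_ps() are made up for the illustration.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: the TSC runs at the CPU's nominal 4.00 GHz clock. */
#define TSC_HZ 4000000000ULL
#define PS_PER_SEC 1000000000000ULL

/* Convert a TSC cycle count to picoseconds with integer arithmetic.
 * Hypothetical helper, intended for small per-operation counts only. */
static unsigned long long cycles_to_ps(unsigned long long cycles)
{
	/* Guard against overflow of the intermediate product. */
	assert(cycles <= ULLONG_MAX / PS_PER_SEC);
	return cycles * PS_PER_SEC / TSC_HZ;
}
```

Under that assumption one cycle is 250 ps, so the 102-cycle enq+deq pair
is about 25.5 ns, and the 26-cycle vs 5-cycle alf_queue figures are about
6.5 ns vs 1.25 ns.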

> > > Then I also want to know how this performs when two CPUs are involved.
> > > As this is also a primary use-case, for you when sending packets into a
> > > guest.  
> > 
> > Coded it up here:
> >  https://github.com/netoptimizer/prototype-kernel/commit/75fe31ef62e
> >  https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/skb_array_parallel01.c
> >  
> > This parallel benchmark tries to keep two (or more) CPUs busy enqueuing or
> > dequeuing on the same skb_array queue.  It prefills the queue,
> > and stops the test as soon as the queue is empty or full, or
> > completes a number of "loops"/cycles.
> > 
> > For two CPUs the results are really good:
> >  enqueue: 54 cycles(tsc)
> >  dequeue: 53 cycles(tsc)

As MST points out, a scheme like the alf_queue[1] has the issue that it
"reads" the opposite cacheline (consumer.tail/producer.tail) to
determine if space-is-left/queue-is-empty.  This causes an expensive
transition in the cache coherency protocol.
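The difference between the two schemes can be sketched as follows.  This
is a simplified, single-threaded illustration with no memory barriers,
not the actual alf_queue or skb_array code; the struct and function names
and RING_SIZE are made up.  The index-based ring has to read the index
owned by the other side, while the slot-based ring (the approach
skb_array takes) only looks at its own index and at the slot itself,
using NULL to mean "empty".

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8u	/* hypothetical size; must divide 2^32 for idx_ring */

/* Index-based ring: each side must read the other side's index. */
struct idx_ring {
	void *slot[RING_SIZE];
	unsigned int head;	/* written by producer, read by consumer */
	unsigned int tail;	/* written by consumer, read by producer */
};

static int idx_enqueue(struct idx_ring *r, void *p)
{
	if (r->head - r->tail == RING_SIZE)	/* reads consumer-owned tail */
		return -1;			/* full */
	r->slot[r->head % RING_SIZE] = p;
	r->head++;
	return 0;
}

static void *idx_dequeue(struct idx_ring *r)
{
	void *p;

	if (r->head == r->tail)			/* reads producer-owned head */
		return NULL;			/* empty */
	p = r->slot[r->tail % RING_SIZE];
	r->tail++;
	return p;
}

/* Slot-based ring: a NULL slot means "empty", so neither side
 * ever reads the other side's index. */
struct slot_ring {
	void *slot[RING_SIZE];
	unsigned int producer;	/* only touched by the producer */
	unsigned int consumer;	/* only touched by the consumer */
};

static int slot_enqueue(struct slot_ring *r, void *p)
{
	assert(p != NULL);		/* NULL is reserved as the empty marker */
	if (r->slot[r->producer])	/* slot still occupied: ring is full */
		return -1;
	r->slot[r->producer] = p;
	if (++r->producer == RING_SIZE)
		r->producer = 0;
	return 0;
}

static void *slot_dequeue(struct slot_ring *r)
{
	void *p = r->slot[r->consumer];

	if (!p)				/* nothing published in this slot yet */
		return NULL;
	r->slot[r->consumer] = NULL;	/* hand the slot back to the producer */
	if (++r->consumer == RING_SIZE)
		r->consumer = 0;
	return p;
}
```

In the slot-based variant the two CPUs still share the cachelines holding
the slots, but each index stays private to its own side.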

Coded up a similar test for alf_queue:
 https://github.com/netoptimizer/prototype-kernel/commit/b3ff2624f1
 https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/alf_queue_parallel01.c

For two CPUs the MPMC results are significantly worse, and demonstrate MST's point:
 enqueue: 227 cycles(tsc)
 dequeue: 231 cycles(tsc)

The alf_queue also has an SPSC (Single-Producer/Single-Consumer) variant:
 enqueue: 24 cycles(tsc)
 dequeue: 23 cycles(tsc)


> > Going to 4 CPUs, things break down (but it was not primary use-case?):
> >  CPU(0) 927 cycles(tsc) enqueue
> >  CPU(1) 921 cycles(tsc) dequeue
> >  CPU(2) 927 cycles(tsc) enqueue
> >  CPU(3) 898 cycles(tsc) dequeue  
> 
> It's mostly the spinlock contention I guess.
> Maybe we don't need fair spinlocks in this case.
> Try replacing spinlocks with simple cmpxchg
> and see what happens?

The alf_queue uses a cmpxchg scheme, and it does scale better when the
number of CPUs increases:

 CPUs:4 Average: 586 cycles(tsc)
 CPUs:6 Average: 744 cycles(tsc)
 CPUs:8 Average: 1578 cycles(tsc)

Notice the alf_queue was designed for bulking, to mitigate the effect of
this cacheline bouncing, but bulking was not covered in this test.
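
To illustrate why bulking helps, here is a minimal userspace sketch of a
bulk enqueue that reserves n slots with one successful compare-and-swap,
so the cost of the locked operation is spread over n objects.  It uses
C11 atomics rather than the kernel's cmpxchg(), shows only the producer
side, and is not the actual alf_queue code; struct bulk_queue,
bq_enqueue_bulk() and BQ_SIZE are names invented for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define BQ_SIZE 16u			/* hypothetical; must be a power of two */
#define BQ_MASK (BQ_SIZE - 1u)

struct bulk_queue {
	_Atomic unsigned int prod_head;	/* next slot to reserve */
	_Atomic unsigned int prod_tail;	/* slots below this are published */
	_Atomic unsigned int cons_tail;	/* slots below this are free again */
	void *ring[BQ_SIZE];
};

/* Reserve and fill n slots using a single successful compare-and-swap.
 * Returns n on success, or 0 if fewer than n slots are free. */
static unsigned int bq_enqueue_bulk(struct bulk_queue *q, void **objs,
				    unsigned int n)
{
	unsigned int head, next, i;

	assert(n > 0 && n <= BQ_SIZE);

	head = atomic_load(&q->prod_head);
	do {
		unsigned int used = head - atomic_load(&q->cons_tail);

		if (BQ_SIZE - used < n)
			return 0;	/* not enough room for the whole bulk */
		next = head + n;
		/* On failure, head is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&q->prod_head, &head, next));

	/* Slots [head, next) now belong to this producer. */
	for (i = 0; i < n; i++)
		q->ring[(head + i) & BQ_MASK] = objs[i];

	/* Publish in reservation order: wait for earlier producers. */
	while (atomic_load(&q->prod_tail) != head)
		;
	atomic_store(&q->prod_tail, next);
	return n;
}
```

With n = 1 this degenerates to one compare-and-swap per object, which is
the case the numbers above measure.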

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
