All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: davem@davemloft.net, kuba@kernel.org, jasowang@redhat.com,
	brouer@redhat.com, paulmck@kernel.org, peterz@infradead.org,
	will@kernel.org, shuah@kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linuxarm@openeuler.org
Subject: Re: [PATCH net-next v2 2/2] ptr_ring: make __ptr_ring_empty() checking more reliable
Date: Sun, 27 Jun 2021 02:07:13 -0400	[thread overview]
Message-ID: <20210627020440-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <77615160-6f4f-64bf-7de9-b0adaddd5f06@huawei.com>

On Fri, Jun 25, 2021 at 05:20:10PM +0800, Yunsheng Lin wrote:
> On 2021/6/25 14:39, Michael S. Tsirkin wrote:
> > On Fri, Jun 25, 2021 at 11:18:56AM +0800, Yunsheng Lin wrote:
> >> Currently r->queue[] is cleared after r->consumer_head is moved
> >> forward, which makes the __ptr_ring_empty() checking called in
> >> page_pool_refill_alloc_cache() unreliable if the checking is done
> >> after the r->queue clearing and before the consumer_head moving
> >> forward.
> >>
> >> Move the r->queue[] clearing after consumer_head moving forward
> >> to make __ptr_ring_empty() checking more reliable.
> >>
> >> As a side effect of above change, a consumer_head checking is
> >> avoided for the likely case, and it has noticeable performance
> >> improvement when it is tested using the ptr_ring_test selftest
> >> added in the previous patch.
> >>
> >> Using "taskset -c 1 ./ptr_ring_test -s 1000 -m 0 -N 100000000"
> >> to test the case of single thread doing both the enqueuing and
> >> dequeuing:
> >>
> >>  arch     unpatched           patched       delta
> >> arm64      4648 ms            4464 ms       +3.9%
> >>  X86       2562 ms            2401 ms       +6.2%
> >>
> >> Using "taskset -c 1-2 ./ptr_ring_test -s 1000 -m 1 -N 100000000"
> >> to test the case of one thread doing enqueuing and another thread
> >> doing dequeuing concurrently, also known as single-producer/single-
> >> consumer:
> >>
> >>  arch      unpatched             patched         delta
> >> arm64   3624 ms + 3624 ms   3462 ms + 3462 ms    +4.4%
> >>  x86    2758 ms + 2758 ms   2547 ms + 2547 ms    +7.6%
> > 
> > Nice but it's small - could be a fluke.
> > How many tests did you run? What is the variance?
> > Did you try pinning to different CPUs to observe numa effects?
> > Please use perf or some other modern tool for this kind
> > of benchmark. Thanks!
> 
> The result is quite stable, and retest using perf stat:

How stable exactly?  Try with -r so we can find out.

> ---------------unpatched ptr_ring.c begin----------------------------------
> 
> perf stat ./ptr_ring_test -s 1000 -m 0 -N 100000000
> ptr_ring(size:1000) perf simple test for 100000000 times, took 2385198 us
> 
>  Performance counter stats for './ptr_ring_test -s 1000 -m 0 -N 100000000':
> 
>            2385.49 msec task-clock                #    1.000 CPUs utilized
>                 26      context-switches          #    0.011 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 57      page-faults               #    0.024 K/sec
>         6202023521      cycles                    #    2.600 GHz
>        17424187640      instructions              #    2.81  insn per cycle
>    <not supported>      branches
>            6506477      branch-misses
> 
>        2.385785170 seconds time elapsed
> 
>        2.384014000 seconds user
>        0.000000000 seconds sys
> 
> 
> root@(none):~# perf stat ./ptr_ring_test -s 1000 -m 0 -N 100000000
> ptr_ring(size:1000) perf simple test for 100000000 times, took 2383385 us
> 
>  Performance counter stats for './ptr_ring_test -s 1000 -m 0 -N 100000000':
> 
>            2383.67 msec task-clock                #    1.000 CPUs utilized
>                 26      context-switches          #    0.011 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 57      page-faults               #    0.024 K/sec
>         6197278066      cycles                    #    2.600 GHz
>        17424207772      instructions              #    2.81  insn per cycle
>    <not supported>      branches
>            6495766      branch-misses
> 
>        2.383941170 seconds time elapsed
> 
>        2.382215000 seconds user
>        0.000000000 seconds sys
> 
> 
> root@(none):~# perf stat ./ptr_ring_test -s 1000 -m 0 -N 100000000
> ptr_ring(size:1000) perf simple test for 100000000 times, took 2390858 us
> 
>  Performance counter stats for './ptr_ring_test -s 1000 -m 0 -N 100000000':
> 
>            2391.16 msec task-clock                #    1.000 CPUs utilized
>                 25      context-switches          #    0.010 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 57      page-faults               #    0.024 K/sec
>         6216704120      cycles                    #    2.600 GHz
>        17424243041      instructions              #    2.80  insn per cycle
>    <not supported>      branches
>            6483886      branch-misses
> 
>        2.391420440 seconds time elapsed
> 
>        2.389647000 seconds user
>        0.000000000 seconds sys
> 
> 
> root@(none):~# perf stat ./ptr_ring_test -s 1000 -m 0 -N 100000000
> ptr_ring(size:1000) perf simple test for 100000000 times, took 2389810 us
> 
>  Performance counter stats for './ptr_ring_test -s 1000 -m 0 -N 100000000':
> 
>            2390.10 msec task-clock                #    1.000 CPUs utilized
>                 26      context-switches          #    0.011 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 58      page-faults               #    0.024 K/sec
>         6213995715      cycles                    #    2.600 GHz
>        17424227499      instructions              #    2.80  insn per cycle
>    <not supported>      branches
>            6474069      branch-misses
> 
>        2.390367070 seconds time elapsed
> 
>        2.388644000 seconds user
>        0.000000000 seconds sys
> 
> ---------------unpatched ptr_ring.c end----------------------------------
> 
> 
> 
> ---------------patched ptr_ring.c begin----------------------------------
> root@(none):~# perf stat ./ptr_ring_test_opt -s 1000 -m 0 -N 100000000
> ptr_ring(size:1000) perf simple test for 100000000 times, took 2198894 us
> 
>  Performance counter stats for './ptr_ring_test_opt -s 1000 -m 0 -N 100000000':
> 
>            2199.18 msec task-clock                #    1.000 CPUs utilized
>                 23      context-switches          #    0.010 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 56      page-faults               #    0.025 K/sec
>         5717671859      cycles                    #    2.600 GHz
>        16124164124      instructions              #    2.82  insn per cycle
>    <not supported>      branches
>            6564829      branch-misses
> 
>        2.199445990 seconds time elapsed
> 
>        2.197859000 seconds user
>        0.000000000 seconds sys
> 
> 
> root@(none):~# perf stat ./ptr_ring_test_opt -s 1000 -m 0 -N 100000000
> ptr_ring(size:1000) perf simple test for 100000000 times, took 2222337 us
> 
>  Performance counter stats for './ptr_ring_test_opt -s 1000 -m 0 -N 100000000':
> 
>            2222.63 msec task-clock                #    1.000 CPUs utilized
>                 23      context-switches          #    0.010 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 57      page-faults               #    0.026 K/sec
>         5778632853      cycles                    #    2.600 GHz
>        16124210769      instructions              #    2.79  insn per cycle
>    <not supported>      branches
>            6603904      branch-misses
> 
>        2.222901020 seconds time elapsed
> 
>        2.221312000 seconds user
>        0.000000000 seconds sys
> 
> 
> root@(none):~# perf stat ./ptr_ring_test_opt -s 1000 -m 0 -N 100000000
> ptr_ring(size:1000) perf simple test for 100000000 times, took 2251980 us
> 
>  Performance counter stats for './ptr_ring_test_opt -s 1000 -m 0 -N 100000000':
> 
>            2252.28 msec task-clock                #    1.000 CPUs utilized
>                 25      context-switches          #    0.011 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 57      page-faults               #    0.025 K/sec
>         5855668335      cycles                    #    2.600 GHz
>        16124310588      instructions              #    2.75  insn per cycle
>    <not supported>      branches
>            6777279      branch-misses
> 
>        2.252543340 seconds time elapsed
> 
>        2.250897000 seconds user
>        0.000000000 seconds sys
> 
> 
> root@(none):~#
> root@(none):~# perf stat ./ptr_ring_test_opt -s 1000 -m 0 -N 100000000
> ptr_ring(size:1000) perf simple test for 100000000 times, took 2209415 us
> 
>  Performance counter stats for './ptr_ring_test_opt -s 1000 -m 0 -N 100000000':
> 
>            2209.70 msec task-clock                #    1.000 CPUs utilized
>                 24      context-switches          #    0.011 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 58      page-faults               #    0.026 K/sec
>         5745003772      cycles                    #    2.600 GHz
>        16124198886      instructions              #    2.81  insn per cycle
>    <not supported>      branches
>            6508414      branch-misses
> 
>        2.209973960 seconds time elapsed
> 
>        2.208354000 seconds user
>        0.000000000 seconds sys
> 
> 
> root@(none):~# perf stat ./ptr_ring_test_opt -s 1000 -m 0 -N 100000000
> ptr_ring(size:1000) perf simple test for 100000000 times, took 2211409 us
> 
>  Performance counter stats for './ptr_ring_test_opt -s 1000 -m 0 -N 100000000':
> 
>            2211.70 msec task-clock                #    1.000 CPUs utilized
>                 24      context-switches          #    0.011 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 57      page-faults               #    0.026 K/sec
>         5750136694      cycles                    #    2.600 GHz
>        16124176577      instructions              #    2.80  insn per cycle
>    <not supported>      branches
>            6553023      branch-misses
> 
>        2.211968470 seconds time elapsed
> 
>        2.210303000 seconds user
>        0.000000000 seconds sys
> 
> ---------------patched ptr_ring.c end----------------------------------
> 
> > 
> >>
> 


  reply	other threads:[~2021-06-27  6:07 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-25  3:18 [PATCH net-next v2 0/2] add benchmark selftest and optimization for ptr_ring Yunsheng Lin
2021-06-25  3:18 ` [PATCH net-next v2 1/2] selftests/ptr_ring: add benchmark application " Yunsheng Lin
2021-06-25  3:36   ` Jason Wang
2021-06-25  3:52     ` Yunsheng Lin
2021-06-27  6:09       ` Michael S. Tsirkin
2021-06-28  1:42         ` Yunsheng Lin
2021-06-25  6:37   ` Michael S. Tsirkin
2021-06-25  7:40     ` Yunsheng Lin
2021-06-25  3:18 ` [PATCH net-next v2 2/2] ptr_ring: make __ptr_ring_empty() checking more reliable Yunsheng Lin
2021-06-25  6:32   ` Michael S. Tsirkin
2021-06-25  7:21     ` Yunsheng Lin
2021-06-25  7:30       ` Michael S. Tsirkin
2021-06-25  8:33         ` Yunsheng Lin
2021-06-27  6:03           ` Michael S. Tsirkin
2021-06-28  2:17             ` Yunsheng Lin
2021-06-25  6:39   ` Michael S. Tsirkin
2021-06-25  9:20     ` Yunsheng Lin
2021-06-27  6:07       ` Michael S. Tsirkin [this message]
2021-06-28  2:11         ` [Linuxarm] " Yunsheng Lin
2021-06-25  6:42 ` [PATCH net-next v2 0/2] add benchmark selftest and optimization for ptr_ring Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210627020440-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=brouer@redhat.com \
    --cc=davem@davemloft.net \
    --cc=jasowang@redhat.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linuxarm@openeuler.org \
    --cc=linyunsheng@huawei.com \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=shuah@kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.