All of lore.kernel.org
 help / color / mirror / Atom feed
From: Cristian Marussi <cristian.marussi@arm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, sudeep.holla@arm.com,
	james.quinlan@broadcom.com, Jonathan.Cameron@huawei.com,
	f.fainelli@gmail.com, etienne.carriere@linaro.org,
	vincent.guittot@linaro.org, souvik.chakravarty@arm.com,
	peter.hilber@opensynergy.com, igor.skalkin@opensynergy.com,
	virtualization@lists.linux-foundation.org
Subject: Re: [PATCH v2 3/9] [RFC] virtio_ring: Embed a wrap counter in opaque poll index value
Date: Mon, 7 Feb 2022 18:52:16 +0000	[thread overview]
Message-ID: <20220207185216.GD2535@e120937-lin> (raw)
In-Reply-To: <20220203063115-mutt-send-email-mst@kernel.org>

On Thu, Feb 03, 2022 at 06:32:29AM -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 03, 2022 at 10:51:19AM +0000, Cristian Marussi wrote:
> > On Tue, Feb 01, 2022 at 01:27:38PM -0500, Michael S. Tsirkin wrote:
> > > Looks correct, thanks. Some minor comments below:
> > > 
> > 
> > Hi Michael,
> > 
> > thanks for the feedback.
> > 

Hi Michael,

 
[snip]

> > > > @@ -806,6 +821,8 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> > > >  	ret = vq->split.desc_state[i].data;
> > > >  	detach_buf_split(vq, i, ctx);
> > > >  	vq->last_used_idx++;
> > > > +	if (unlikely(!vq->last_used_idx))
> > > > +		vq->wraps++;
> > > >  	/* If we expect an interrupt for the next entry, tell host
> > > >  	 * by writing event index and flush out the write before
> > > >  	 * the read in the next get_buf call. */
> > > 
> > > So most drivers don't call virtqueue_poll.
> > > Concerned about the overhead here: another option is
> > > with a flag that will have to be set whenever a driver
> > > wants to use virtqueue_poll.
> > 
> > Do you mean a compile time flag/Kconfig to just remove the possible
> > overhead instructions as a whole when not needed by the driver ?
> > 
> > Or do you mean at runtime since checking the flag evry time should be
> > less costly than checking the wrpas each time AND counting when it
> > happens ?
> 
> The later.
> 
> > > Could you pls do a quick perf test e.g. using tools/virtio/
> > > to see what's faster?
> > 
> > Yes I'll do, thanks for the hint, I have some compilation issues in
> > tools/virtio due to my additions (missing mirrored hehaders) or to some
> > recently added stuff (missing drv_to_virtio & friends for
> > suppressed_used_validation thing)...anyway I fixed those now and I'll
> > post related tools/virtio patches with next iteration.
> > 
> > Anyway, do you mean perf data about vringh_test and virtio_test/vhost
> > right ? (ringtest/ excluded 'cause does not use any API is just
> > prototyping)
> 
> can be either or both, virtio_test/vhost is a bit easier to use.
> 

After a number of round tests with tools/virtio/virtio_test, below you can find
the most reliable results I had.

Using the flag as you suggested as in:

if (unlikely(vq->use_wrap_counter))
	vq->wraps += !vq->last_used_idx;


seems definitely better as per the result in virtio_test_flag_V1.

The last run with virtio_test_flag_V1 --wrap-counters is the case in which
the test code request to set the use_wrap_counter flag to true at start.

Since such flag is nothig spec related, I added a new EXPORT API

virtqueue_use_wrap_counter()

to allow a driver willing to use polling to ask for wrap counting at
probe time.

Since all of this required a few build-fixes in tool/virtio both before
and after my additions, I'm going to post the proposed change in a new series
independent from this SCMI virtio series (and add later the call to
virtqueue_use_wrap_counter() to the SCMI virtio driver.

How does this sound ?

Thanks,
Cristian

---
+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_unpatched
 Performance counter stats for 'nice -n -20 /root/virtio_test_unpatched' (25 runs):

           6100.81 msec task-clock                #    1.002 CPUs utilized            ( +-  0.16% )
                19      context-switches          #    3.126 /sec                     ( +-  2.08% )
                 0      cpu-migrations            #    0.000 /sec                   
               134      page-faults               #   22.049 /sec                     ( +-  0.03% )
       18249525657      cycles                    #    3.003 GHz                      ( +-  0.07% )
       45583397473      instructions              #    2.52  insn per cycle           ( +-  0.09% )
       14009712668      branches                  #    2.305 G/sec                    ( +-  0.09% )
          10075872      branch-misses             #    0.07% of all branches          ( +-  0.83% )

            6.0908 +- 0.0107 seconds time elapsed  ( +-  0.18% )


+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_wraps_noflag
 Performance counter stats for 'nice -n -20 /root/virtio_test_wraps_noflag' (25 runs):

           7982.99 msec task-clock                #    0.996 CPUs utilized            ( +-  0.14% )
                16      context-switches          #    1.999 /sec                     ( +-  2.56% )
                 0      cpu-migrations            #    0.000 /sec                   
               134      page-faults               #   16.744 /sec                     ( +-  0.03% )
       23691074946      cycles                    #    2.960 GHz                      ( +-  0.06% )
       68176350359      instructions              #    2.88  insn per cycle           ( +-  0.09% )
       21037768642      branches                  #    2.629 G/sec                    ( +-  0.09% )
           9083084      branch-misses             #    0.04% of all branches          ( +-  0.74% )

            8.0125 +- 0.0114 seconds time elapsed  ( +-  0.14% )


+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_flag_V1
 Performance counter stats for 'nice -n -20 /root/virtio_test_flag_V1' (25 runs):

           6182.21 msec task-clock                #    1.007 CPUs utilized            ( +-  0.25% )
                19      context-switches          #    3.104 /sec                     ( +-  1.68% )
                 0      cpu-migrations            #    0.000 /sec                   
               134      page-faults               #   21.889 /sec                     ( +-  0.03% )
       18142274957      cycles                    #    2.963 GHz                      ( +-  0.13% )
       48973010013      instructions              #    2.71  insn per cycle           ( +-  0.18% )
       15064825126      branches                  #    2.461 G/sec                    ( +-  0.18% )
           8697800      branch-misses             #    0.06% of all branches          ( +-  0.89% )

            6.1382 +- 0.0172 seconds time elapsed  ( +-  0.28% )

+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_flag_V1 --wrap-counters
 Performance counter stats for 'nice -n -20 /root/virtio_test_flag_V1 --wrap-counters' (25 runs):

           6051.58 msec task-clock                #    0.984 CPUs utilized            ( +-  0.22% )
                21      context-switches          #    3.424 /sec                     ( +-  1.25% )
                 0      cpu-migrations            #    0.000 /sec                   
               134      page-faults               #   21.846 /sec                     ( +-  0.03% )
       17928356478      cycles                    #    2.923 GHz                      ( +-  0.11% )
       48147192304      instructions              #    2.67  insn per cycle           ( +-  0.14% )
       14808798588      branches                  #    2.414 G/sec                    ( +-  0.15% )
           9108899      branch-misses             #    0.06% of all branches          ( +-  1.22% )

            6.1525 +- 0.0155 seconds time elapsed  ( +-  0.25% )



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Cristian Marussi <cristian.marussi@arm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, sudeep.holla@arm.com,
	james.quinlan@broadcom.com, Jonathan.Cameron@huawei.com,
	f.fainelli@gmail.com, etienne.carriere@linaro.org,
	vincent.guittot@linaro.org, souvik.chakravarty@arm.com,
	peter.hilber@opensynergy.com, igor.skalkin@opensynergy.com,
	virtualization@lists.linux-foundation.org
Subject: Re: [PATCH v2 3/9] [RFC] virtio_ring: Embed a wrap counter in opaque poll index value
Date: Mon, 7 Feb 2022 18:52:16 +0000	[thread overview]
Message-ID: <20220207185216.GD2535@e120937-lin> (raw)
In-Reply-To: <20220203063115-mutt-send-email-mst@kernel.org>

On Thu, Feb 03, 2022 at 06:32:29AM -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 03, 2022 at 10:51:19AM +0000, Cristian Marussi wrote:
> > On Tue, Feb 01, 2022 at 01:27:38PM -0500, Michael S. Tsirkin wrote:
> > > Looks correct, thanks. Some minor comments below:
> > > 
> > 
> > Hi Michael,
> > 
> > thanks for the feedback.
> > 

Hi Michael,

 
[snip]

> > > > @@ -806,6 +821,8 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> > > >  	ret = vq->split.desc_state[i].data;
> > > >  	detach_buf_split(vq, i, ctx);
> > > >  	vq->last_used_idx++;
> > > > +	if (unlikely(!vq->last_used_idx))
> > > > +		vq->wraps++;
> > > >  	/* If we expect an interrupt for the next entry, tell host
> > > >  	 * by writing event index and flush out the write before
> > > >  	 * the read in the next get_buf call. */
> > > 
> > > So most drivers don't call virtqueue_poll.
> > > Concerned about the overhead here: another option is
> > > with a flag that will have to be set whenever a driver
> > > wants to use virtqueue_poll.
> > 
> > Do you mean a compile time flag/Kconfig to just remove the possible
> > overhead instructions as a whole when not needed by the driver ?
> > 
> > Or do you mean at runtime since checking the flag evry time should be
> > less costly than checking the wrpas each time AND counting when it
> > happens ?
> 
> The later.
> 
> > > Could you pls do a quick perf test e.g. using tools/virtio/
> > > to see what's faster?
> > 
> > Yes I'll do, thanks for the hint, I have some compilation issues in
> > tools/virtio due to my additions (missing mirrored hehaders) or to some
> > recently added stuff (missing drv_to_virtio & friends for
> > suppressed_used_validation thing)...anyway I fixed those now and I'll
> > post related tools/virtio patches with next iteration.
> > 
> > Anyway, do you mean perf data about vringh_test and virtio_test/vhost
> > right ? (ringtest/ excluded 'cause does not use any API is just
> > prototyping)
> 
> can be either or both, virtio_test/vhost is a bit easier to use.
> 

After a number of round tests with tools/virtio/virtio_test, below you can find
the most reliable results I had.

Using the flag as you suggested as in:

if (unlikely(vq->use_wrap_counter))
	vq->wraps += !vq->last_used_idx;


seems definitely better as per the result in virtio_test_flag_V1.

The last run with virtio_test_flag_V1 --wrap-counters is the case in which
the test code request to set the use_wrap_counter flag to true at start.

Since such flag is nothig spec related, I added a new EXPORT API

virtqueue_use_wrap_counter()

to allow a driver willing to use polling to ask for wrap counting at
probe time.

Since all of this required a few build-fixes in tool/virtio both before
and after my additions, I'm going to post the proposed change in a new series
independent from this SCMI virtio series (and add later the call to
virtqueue_use_wrap_counter() to the SCMI virtio driver.

How does this sound ?

Thanks,
Cristian

---
+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_unpatched
 Performance counter stats for 'nice -n -20 /root/virtio_test_unpatched' (25 runs):

           6100.81 msec task-clock                #    1.002 CPUs utilized            ( +-  0.16% )
                19      context-switches          #    3.126 /sec                     ( +-  2.08% )
                 0      cpu-migrations            #    0.000 /sec                   
               134      page-faults               #   22.049 /sec                     ( +-  0.03% )
       18249525657      cycles                    #    3.003 GHz                      ( +-  0.07% )
       45583397473      instructions              #    2.52  insn per cycle           ( +-  0.09% )
       14009712668      branches                  #    2.305 G/sec                    ( +-  0.09% )
          10075872      branch-misses             #    0.07% of all branches          ( +-  0.83% )

            6.0908 +- 0.0107 seconds time elapsed  ( +-  0.18% )


+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_wraps_noflag
 Performance counter stats for 'nice -n -20 /root/virtio_test_wraps_noflag' (25 runs):

           7982.99 msec task-clock                #    0.996 CPUs utilized            ( +-  0.14% )
                16      context-switches          #    1.999 /sec                     ( +-  2.56% )
                 0      cpu-migrations            #    0.000 /sec                   
               134      page-faults               #   16.744 /sec                     ( +-  0.03% )
       23691074946      cycles                    #    2.960 GHz                      ( +-  0.06% )
       68176350359      instructions              #    2.88  insn per cycle           ( +-  0.09% )
       21037768642      branches                  #    2.629 G/sec                    ( +-  0.09% )
           9083084      branch-misses             #    0.04% of all branches          ( +-  0.74% )

            8.0125 +- 0.0114 seconds time elapsed  ( +-  0.14% )


+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_flag_V1
 Performance counter stats for 'nice -n -20 /root/virtio_test_flag_V1' (25 runs):

           6182.21 msec task-clock                #    1.007 CPUs utilized            ( +-  0.25% )
                19      context-switches          #    3.104 /sec                     ( +-  1.68% )
                 0      cpu-migrations            #    0.000 /sec                   
               134      page-faults               #   21.889 /sec                     ( +-  0.03% )
       18142274957      cycles                    #    2.963 GHz                      ( +-  0.13% )
       48973010013      instructions              #    2.71  insn per cycle           ( +-  0.18% )
       15064825126      branches                  #    2.461 G/sec                    ( +-  0.18% )
           8697800      branch-misses             #    0.06% of all branches          ( +-  0.89% )

            6.1382 +- 0.0172 seconds time elapsed  ( +-  0.28% )

+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_flag_V1 --wrap-counters
 Performance counter stats for 'nice -n -20 /root/virtio_test_flag_V1 --wrap-counters' (25 runs):

           6051.58 msec task-clock                #    0.984 CPUs utilized            ( +-  0.22% )
                21      context-switches          #    3.424 /sec                     ( +-  1.25% )
                 0      cpu-migrations            #    0.000 /sec                   
               134      page-faults               #   21.846 /sec                     ( +-  0.03% )
       17928356478      cycles                    #    2.923 GHz                      ( +-  0.11% )
       48147192304      instructions              #    2.67  insn per cycle           ( +-  0.14% )
       14808798588      branches                  #    2.414 G/sec                    ( +-  0.15% )
           9108899      branch-misses             #    0.06% of all branches          ( +-  1.22% )

            6.1525 +- 0.0155 seconds time elapsed  ( +-  0.25% )



  reply	other threads:[~2022-02-07 18:53 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-01 17:15 [PATCH 0/9] Add SCMI Virtio & Clock atomic support Cristian Marussi
2022-02-01 17:15 ` Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 1/9] firmware: arm_scmi: Add a virtio channel refcount Cristian Marussi
2022-02-01 17:15   ` Cristian Marussi
2022-02-01 22:05   ` Michael S. Tsirkin
2022-02-01 22:05     ` Michael S. Tsirkin
2022-02-03 10:08     ` Cristian Marussi
2022-02-03 10:08       ` Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 2/9] firmware: arm_scmi: Review virtio free_list handling Cristian Marussi
2022-02-01 17:15   ` Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 3/9] [RFC] virtio_ring: Embed a wrap counter in opaque poll index value Cristian Marussi
2022-02-01 17:15   ` Cristian Marussi
2022-02-01 18:27   ` Michael S. Tsirkin
2022-02-01 18:27     ` Michael S. Tsirkin
2022-02-01 18:27     ` Michael S. Tsirkin
2022-02-03 10:51     ` Cristian Marussi
2022-02-03 10:51       ` Cristian Marussi
2022-02-03 11:32       ` Michael S. Tsirkin
2022-02-03 11:32         ` Michael S. Tsirkin
2022-02-03 11:32         ` Michael S. Tsirkin
2022-02-07 18:52         ` Cristian Marussi [this message]
2022-02-07 18:52           ` Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 4/9] firmware: arm_scmi: Add atomic mode support to virtio transport Cristian Marussi
2022-02-01 17:15   ` Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 5/9] dt-bindings: firmware: arm, scmi: Add atomic_threshold optional property Cristian Marussi
2022-02-01 17:15   ` [PATCH v2 5/9] dt-bindings: firmware: arm,scmi: " Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 6/9] firmware: arm_scmi: Support optional system wide atomic_threshold Cristian Marussi
2022-02-01 17:15   ` Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 7/9] firmware: arm_scmi: Add atomic support to clock protocol Cristian Marussi
2022-02-01 17:15   ` Cristian Marussi
2022-02-01 17:16 ` [PATCH v2 8/9] [RFC] firmware: arm_scmi: Add support for clock_enable_latency Cristian Marussi
2022-02-01 17:16   ` Cristian Marussi
2022-02-01 17:16 ` [PATCH v2 9/9] [RFC] clk: scmi: Support atomic clock enable/disable API Cristian Marussi
2022-02-01 17:16   ` Cristian Marussi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220207185216.GD2535@e120937-lin \
    --to=cristian.marussi@arm.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=etienne.carriere@linaro.org \
    --cc=f.fainelli@gmail.com \
    --cc=igor.skalkin@opensynergy.com \
    --cc=james.quinlan@broadcom.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=peter.hilber@opensynergy.com \
    --cc=souvik.chakravarty@arm.com \
    --cc=sudeep.holla@arm.com \
    --cc=vincent.guittot@linaro.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.