From: Cristian Marussi <cristian.marussi@arm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, sudeep.holla@arm.com,
james.quinlan@broadcom.com, Jonathan.Cameron@huawei.com,
f.fainelli@gmail.com, etienne.carriere@linaro.org,
vincent.guittot@linaro.org, souvik.chakravarty@arm.com,
peter.hilber@opensynergy.com, igor.skalkin@opensynergy.com,
virtualization@lists.linux-foundation.org
Subject: Re: [PATCH v2 3/9] [RFC] virtio_ring: Embed a wrap counter in opaque poll index value
Date: Mon, 7 Feb 2022 18:52:16 +0000 [thread overview]
Message-ID: <20220207185216.GD2535@e120937-lin> (raw)
In-Reply-To: <20220203063115-mutt-send-email-mst@kernel.org>
On Thu, Feb 03, 2022 at 06:32:29AM -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 03, 2022 at 10:51:19AM +0000, Cristian Marussi wrote:
> > On Tue, Feb 01, 2022 at 01:27:38PM -0500, Michael S. Tsirkin wrote:
> > > Looks correct, thanks. Some minor comments below:
> > >
> >
> > Hi Michael,
> >
> > thanks for the feedback.
> >
Hi Michael,
[snip]
> > > > @@ -806,6 +821,8 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> > > > ret = vq->split.desc_state[i].data;
> > > > detach_buf_split(vq, i, ctx);
> > > > vq->last_used_idx++;
> > > > + if (unlikely(!vq->last_used_idx))
> > > > + vq->wraps++;
> > > > /* If we expect an interrupt for the next entry, tell host
> > > > * by writing event index and flush out the write before
> > > > * the read in the next get_buf call. */
> > >
> > > So most drivers don't call virtqueue_poll.
> > > Concerned about the overhead here: another option is
> > > with a flag that will have to be set whenever a driver
> > > wants to use virtqueue_poll.
> >
> > Do you mean a compile time flag/Kconfig to just remove the possible
> > overhead instructions as a whole when not needed by the driver ?
> >
> > Or do you mean at runtime since checking the flag evry time should be
> > less costly than checking the wrpas each time AND counting when it
> > happens ?
>
> The later.
>
> > > Could you pls do a quick perf test e.g. using tools/virtio/
> > > to see what's faster?
> >
> > Yes I'll do, thanks for the hint, I have some compilation issues in
> > tools/virtio due to my additions (missing mirrored hehaders) or to some
> > recently added stuff (missing drv_to_virtio & friends for
> > suppressed_used_validation thing)...anyway I fixed those now and I'll
> > post related tools/virtio patches with next iteration.
> >
> > Anyway, do you mean perf data about vringh_test and virtio_test/vhost
> > right ? (ringtest/ excluded 'cause does not use any API is just
> > prototyping)
>
> can be either or both, virtio_test/vhost is a bit easier to use.
>
After a number of round tests with tools/virtio/virtio_test, below you can find
the most reliable results I had.
Using the flag as you suggested as in:
if (unlikely(vq->use_wrap_counter))
vq->wraps += !vq->last_used_idx;
seems definitely better as per the result in virtio_test_flag_V1.
The last run with virtio_test_flag_V1 --wrap-counters is the case in which
the test code request to set the use_wrap_counter flag to true at start.
Since such flag is nothig spec related, I added a new EXPORT API
virtqueue_use_wrap_counter()
to allow a driver willing to use polling to ask for wrap counting at
probe time.
Since all of this required a few build-fixes in tool/virtio both before
and after my additions, I'm going to post the proposed change in a new series
independent from this SCMI virtio series (and add later the call to
virtqueue_use_wrap_counter() to the SCMI virtio driver.
How does this sound ?
Thanks,
Cristian
---
+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_unpatched
Performance counter stats for 'nice -n -20 /root/virtio_test_unpatched' (25 runs):
6100.81 msec task-clock # 1.002 CPUs utilized ( +- 0.16% )
19 context-switches # 3.126 /sec ( +- 2.08% )
0 cpu-migrations # 0.000 /sec
134 page-faults # 22.049 /sec ( +- 0.03% )
18249525657 cycles # 3.003 GHz ( +- 0.07% )
45583397473 instructions # 2.52 insn per cycle ( +- 0.09% )
14009712668 branches # 2.305 G/sec ( +- 0.09% )
10075872 branch-misses # 0.07% of all branches ( +- 0.83% )
6.0908 +- 0.0107 seconds time elapsed ( +- 0.18% )
+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_wraps_noflag
Performance counter stats for 'nice -n -20 /root/virtio_test_wraps_noflag' (25 runs):
7982.99 msec task-clock # 0.996 CPUs utilized ( +- 0.14% )
16 context-switches # 1.999 /sec ( +- 2.56% )
0 cpu-migrations # 0.000 /sec
134 page-faults # 16.744 /sec ( +- 0.03% )
23691074946 cycles # 2.960 GHz ( +- 0.06% )
68176350359 instructions # 2.88 insn per cycle ( +- 0.09% )
21037768642 branches # 2.629 G/sec ( +- 0.09% )
9083084 branch-misses # 0.04% of all branches ( +- 0.74% )
8.0125 +- 0.0114 seconds time elapsed ( +- 0.14% )
+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_flag_V1
Performance counter stats for 'nice -n -20 /root/virtio_test_flag_V1' (25 runs):
6182.21 msec task-clock # 1.007 CPUs utilized ( +- 0.25% )
19 context-switches # 3.104 /sec ( +- 1.68% )
0 cpu-migrations # 0.000 /sec
134 page-faults # 21.889 /sec ( +- 0.03% )
18142274957 cycles # 2.963 GHz ( +- 0.13% )
48973010013 instructions # 2.71 insn per cycle ( +- 0.18% )
15064825126 branches # 2.461 G/sec ( +- 0.18% )
8697800 branch-misses # 0.06% of all branches ( +- 0.89% )
6.1382 +- 0.0172 seconds time elapsed ( +- 0.28% )
+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_flag_V1 --wrap-counters
Performance counter stats for 'nice -n -20 /root/virtio_test_flag_V1 --wrap-counters' (25 runs):
6051.58 msec task-clock # 0.984 CPUs utilized ( +- 0.22% )
21 context-switches # 3.424 /sec ( +- 1.25% )
0 cpu-migrations # 0.000 /sec
134 page-faults # 21.846 /sec ( +- 0.03% )
17928356478 cycles # 2.923 GHz ( +- 0.11% )
48147192304 instructions # 2.67 insn per cycle ( +- 0.14% )
14808798588 branches # 2.414 G/sec ( +- 0.15% )
9108899 branch-misses # 0.06% of all branches ( +- 1.22% )
6.1525 +- 0.0155 seconds time elapsed ( +- 0.25% )
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Cristian Marussi <cristian.marussi@arm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, sudeep.holla@arm.com,
james.quinlan@broadcom.com, Jonathan.Cameron@huawei.com,
f.fainelli@gmail.com, etienne.carriere@linaro.org,
vincent.guittot@linaro.org, souvik.chakravarty@arm.com,
peter.hilber@opensynergy.com, igor.skalkin@opensynergy.com,
virtualization@lists.linux-foundation.org
Subject: Re: [PATCH v2 3/9] [RFC] virtio_ring: Embed a wrap counter in opaque poll index value
Date: Mon, 7 Feb 2022 18:52:16 +0000 [thread overview]
Message-ID: <20220207185216.GD2535@e120937-lin> (raw)
In-Reply-To: <20220203063115-mutt-send-email-mst@kernel.org>
On Thu, Feb 03, 2022 at 06:32:29AM -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 03, 2022 at 10:51:19AM +0000, Cristian Marussi wrote:
> > On Tue, Feb 01, 2022 at 01:27:38PM -0500, Michael S. Tsirkin wrote:
> > > Looks correct, thanks. Some minor comments below:
> > >
> >
> > Hi Michael,
> >
> > thanks for the feedback.
> >
Hi Michael,
[snip]
> > > > @@ -806,6 +821,8 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> > > > ret = vq->split.desc_state[i].data;
> > > > detach_buf_split(vq, i, ctx);
> > > > vq->last_used_idx++;
> > > > + if (unlikely(!vq->last_used_idx))
> > > > + vq->wraps++;
> > > > /* If we expect an interrupt for the next entry, tell host
> > > > * by writing event index and flush out the write before
> > > > * the read in the next get_buf call. */
> > >
> > > So most drivers don't call virtqueue_poll.
> > > Concerned about the overhead here: another option is
> > > with a flag that will have to be set whenever a driver
> > > wants to use virtqueue_poll.
> >
> > Do you mean a compile time flag/Kconfig to just remove the possible
> > overhead instructions as a whole when not needed by the driver ?
> >
> > Or do you mean at runtime since checking the flag evry time should be
> > less costly than checking the wrpas each time AND counting when it
> > happens ?
>
> The later.
>
> > > Could you pls do a quick perf test e.g. using tools/virtio/
> > > to see what's faster?
> >
> > Yes I'll do, thanks for the hint, I have some compilation issues in
> > tools/virtio due to my additions (missing mirrored hehaders) or to some
> > recently added stuff (missing drv_to_virtio & friends for
> > suppressed_used_validation thing)...anyway I fixed those now and I'll
> > post related tools/virtio patches with next iteration.
> >
> > Anyway, do you mean perf data about vringh_test and virtio_test/vhost
> > right ? (ringtest/ excluded 'cause does not use any API is just
> > prototyping)
>
> can be either or both, virtio_test/vhost is a bit easier to use.
>
After a number of round tests with tools/virtio/virtio_test, below you can find
the most reliable results I had.
Using the flag as you suggested as in:
if (unlikely(vq->use_wrap_counter))
vq->wraps += !vq->last_used_idx;
seems definitely better as per the result in virtio_test_flag_V1.
The last run with virtio_test_flag_V1 --wrap-counters is the case in which
the test code request to set the use_wrap_counter flag to true at start.
Since such flag is nothig spec related, I added a new EXPORT API
virtqueue_use_wrap_counter()
to allow a driver willing to use polling to ask for wrap counting at
probe time.
Since all of this required a few build-fixes in tool/virtio both before
and after my additions, I'm going to post the proposed change in a new series
independent from this SCMI virtio series (and add later the call to
virtqueue_use_wrap_counter() to the SCMI virtio driver.
How does this sound ?
Thanks,
Cristian
---
+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_unpatched
Performance counter stats for 'nice -n -20 /root/virtio_test_unpatched' (25 runs):
6100.81 msec task-clock # 1.002 CPUs utilized ( +- 0.16% )
19 context-switches # 3.126 /sec ( +- 2.08% )
0 cpu-migrations # 0.000 /sec
134 page-faults # 22.049 /sec ( +- 0.03% )
18249525657 cycles # 3.003 GHz ( +- 0.07% )
45583397473 instructions # 2.52 insn per cycle ( +- 0.09% )
14009712668 branches # 2.305 G/sec ( +- 0.09% )
10075872 branch-misses # 0.07% of all branches ( +- 0.83% )
6.0908 +- 0.0107 seconds time elapsed ( +- 0.18% )
+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_wraps_noflag
Performance counter stats for 'nice -n -20 /root/virtio_test_wraps_noflag' (25 runs):
7982.99 msec task-clock # 0.996 CPUs utilized ( +- 0.14% )
16 context-switches # 1.999 /sec ( +- 2.56% )
0 cpu-migrations # 0.000 /sec
134 page-faults # 16.744 /sec ( +- 0.03% )
23691074946 cycles # 2.960 GHz ( +- 0.06% )
68176350359 instructions # 2.88 insn per cycle ( +- 0.09% )
21037768642 branches # 2.629 G/sec ( +- 0.09% )
9083084 branch-misses # 0.04% of all branches ( +- 0.74% )
8.0125 +- 0.0114 seconds time elapsed ( +- 0.14% )
+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_flag_V1
Performance counter stats for 'nice -n -20 /root/virtio_test_flag_V1' (25 runs):
6182.21 msec task-clock # 1.007 CPUs utilized ( +- 0.25% )
19 context-switches # 3.104 /sec ( +- 1.68% )
0 cpu-migrations # 0.000 /sec
134 page-faults # 21.889 /sec ( +- 0.03% )
18142274957 cycles # 2.963 GHz ( +- 0.13% )
48973010013 instructions # 2.71 insn per cycle ( +- 0.18% )
15064825126 branches # 2.461 G/sec ( +- 0.18% )
8697800 branch-misses # 0.06% of all branches ( +- 0.89% )
6.1382 +- 0.0172 seconds time elapsed ( +- 0.28% )
+ cset shield --exec -- perf stat --repeat 25 -- nice -n -20 /root/virtio_test_flag_V1 --wrap-counters
Performance counter stats for 'nice -n -20 /root/virtio_test_flag_V1 --wrap-counters' (25 runs):
6051.58 msec task-clock # 0.984 CPUs utilized ( +- 0.22% )
21 context-switches # 3.424 /sec ( +- 1.25% )
0 cpu-migrations # 0.000 /sec
134 page-faults # 21.846 /sec ( +- 0.03% )
17928356478 cycles # 2.923 GHz ( +- 0.11% )
48147192304 instructions # 2.67 insn per cycle ( +- 0.14% )
14808798588 branches # 2.414 G/sec ( +- 0.15% )
9108899 branch-misses # 0.06% of all branches ( +- 1.22% )
6.1525 +- 0.0155 seconds time elapsed ( +- 0.25% )
next prev parent reply other threads:[~2022-02-07 18:53 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-01 17:15 [PATCH 0/9] Add SCMI Virtio & Clock atomic support Cristian Marussi
2022-02-01 17:15 ` Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 1/9] firmware: arm_scmi: Add a virtio channel refcount Cristian Marussi
2022-02-01 17:15 ` Cristian Marussi
2022-02-01 22:05 ` Michael S. Tsirkin
2022-02-01 22:05 ` Michael S. Tsirkin
2022-02-03 10:08 ` Cristian Marussi
2022-02-03 10:08 ` Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 2/9] firmware: arm_scmi: Review virtio free_list handling Cristian Marussi
2022-02-01 17:15 ` Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 3/9] [RFC] virtio_ring: Embed a wrap counter in opaque poll index value Cristian Marussi
2022-02-01 17:15 ` Cristian Marussi
2022-02-01 18:27 ` Michael S. Tsirkin
2022-02-01 18:27 ` Michael S. Tsirkin
2022-02-01 18:27 ` Michael S. Tsirkin
2022-02-03 10:51 ` Cristian Marussi
2022-02-03 10:51 ` Cristian Marussi
2022-02-03 11:32 ` Michael S. Tsirkin
2022-02-03 11:32 ` Michael S. Tsirkin
2022-02-03 11:32 ` Michael S. Tsirkin
2022-02-07 18:52 ` Cristian Marussi [this message]
2022-02-07 18:52 ` Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 4/9] firmware: arm_scmi: Add atomic mode support to virtio transport Cristian Marussi
2022-02-01 17:15 ` Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 5/9] dt-bindings: firmware: arm, scmi: Add atomic_threshold optional property Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 5/9] dt-bindings: firmware: arm,scmi: " Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 6/9] firmware: arm_scmi: Support optional system wide atomic_threshold Cristian Marussi
2022-02-01 17:15 ` Cristian Marussi
2022-02-01 17:15 ` [PATCH v2 7/9] firmware: arm_scmi: Add atomic support to clock protocol Cristian Marussi
2022-02-01 17:15 ` Cristian Marussi
2022-02-01 17:16 ` [PATCH v2 8/9] [RFC] firmware: arm_scmi: Add support for clock_enable_latency Cristian Marussi
2022-02-01 17:16 ` Cristian Marussi
2022-02-01 17:16 ` [PATCH v2 9/9] [RFC] clk: scmi: Support atomic clock enable/disable API Cristian Marussi
2022-02-01 17:16 ` Cristian Marussi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220207185216.GD2535@e120937-lin \
--to=cristian.marussi@arm.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=etienne.carriere@linaro.org \
--cc=f.fainelli@gmail.com \
--cc=igor.skalkin@opensynergy.com \
--cc=james.quinlan@broadcom.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=peter.hilber@opensynergy.com \
--cc=souvik.chakravarty@arm.com \
--cc=sudeep.holla@arm.com \
--cc=vincent.guittot@linaro.org \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.