From: "Michael S. Tsirkin" <mst@redhat.com>
To: Krishna Kumar2 <krkumar2@in.ibm.com>
Cc: anthony@codemonkey.ws, arnd@arndb.de, avi@redhat.com,
davem@davemloft.net, kvm@vger.kernel.org, netdev@vger.kernel.org,
rusty@rustcorp.com.au
Subject: Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
Date: Tue, 5 Oct 2010 20:23:23 +0200
Message-ID: <20101005182323.GA25852@redhat.com>
In-Reply-To: <OF13594229.1A55A20C-ON652577B3.00393C8D-652577B3.003A54C9@in.ibm.com>
On Tue, Oct 05, 2010 at 04:10:00PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 09/19/2010 06:14:43 PM:
>
> > Could you document exactly how you measure multistream bandwidth:
> > netperf flags, etc.?
>
> All results were without any netperf flags or system tuning:
> for i in $list
> do
>     netperf -c -C -l 60 -H 192.168.122.1 > /tmp/netperf.$$.$i &
> done
> wait
> Another script processes the result files. It also displays the
> start time/end time of each iteration to make sure skew due to
> parallel netperfs is minimal.
>
> I changed the vhost functionality once more to try to get the
> best model, the new model being:
> 1. #numtxqs=1 -> #vhosts=1, this thread handles both RX/TX.
> 2. #numtxqs>1 -> vhost[0] handles RX and vhost[1-MAX] handles
> TX[0-n], where MAX is 4. Beyond numtxqs=4, the remaining TX
> queues are handled by vhost threads in round-robin fashion.
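A minimal sketch of the thread assignment described above, to make the model concrete. The function names and the modulo-based round-robin order are assumptions for illustration; the patch may distribute the extra TX queues differently.

```shell
# Hypothetical helpers, not taken from the patch.
MAX_TX_VHOSTS=4   # "MAX is 4" in the description above

# vhost_for_txq NUMTXQS TXQ: index of the vhost thread serving TX queue TXQ
vhost_for_txq() {
    local numtxqs=$1 txq=$2
    if [ "$numtxqs" -eq 1 ]; then
        echo 0                                  # one thread does both RX and TX
    else
        echo $(( 1 + txq % MAX_TX_VHOSTS ))     # vhost[0] is reserved for RX
    fi
}

# num_vhosts NUMTXQS: total number of vhost threads for a device
num_vhosts() {
    local numtxqs=$1
    if [ "$numtxqs" -eq 1 ]; then
        echo 1
    elif [ "$numtxqs" -lt "$MAX_TX_VHOSTS" ]; then
        echo $(( 1 + numtxqs ))
    else
        echo $(( 1 + MAX_TX_VHOSTS ))
    fi
}
```

With this mapping, numtxqs=2 uses 3 vhost threads and numtxqs=4, 8 or 16 use 5, which matches the #vhost counts in the summaries later in this mail.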
>
> Results from here on are with these changes, and only "tuning" is
> to set each vhost's affinity to CPUs[0-3] ("taskset -p f <vhost-pids>").
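The "f" given to taskset is a hex bitmask with one bit per CPU, so bits 0-3 select CPUs 0-3. A small helper that builds such a mask from a list of CPU numbers (the name cpu_mask is made up for this sketch):

```shell
# cpu_mask CPU...: print the hex affinity mask with one bit set per listed CPU
cpu_mask() {
    local mask=0 cpu
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# Example, using the placeholder from the mail for the thread ids:
#   taskset -p "$(cpu_mask 0 1 2 3)" <vhost-pid>
```

Pinning all vhost threads to a single CPU, as in the later "same CPU" test, would presumably correspond to a mask with one bit set, e.g. `cpu_mask 0`.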
>
> > Any idea where this comes from?
> > Do you see more TX interrupts? RX interrupts? Exits?
> > Do interrupts bounce more between guest CPUs?
> > 4. Identify reasons for single netperf BW regression.
>
> After testing various combinations of #txqs, #vhosts, #netperf
> sessions, I think the drop for 1 stream is due to TX and RX for
> a flow being processed on different cpus.
Right. Can we fix it?
> I did two more tests:
> 1. Pin vhosts to same CPU:
>    - BW drop is much lower for 1 stream case (-5 to -8% range)
>    - But performance is not so high for more sessions.
> 2. Changed vhost to be single threaded:
>    - No degradation for 1 session, and improvement for up to
>      8, sometimes 16 streams (5-12%).
>    - BW degrades after that, all the way till 128 netperf sessions.
>    - But overall CPU utilization improves.
> Summary of the entire run (for 1-128 sessions):
> txq=4: BW: (-2.3) CPU: (-16.5) RCPU: (-5.3)
> txq=16: BW: (-1.9) CPU: (-24.9) RCPU: (-9.6)
>
> I don't see any reasons mentioned above. However, for higher
> number of netperf sessions, I see a big increase in retransmissions:
Hmm, ok, and do you see any errors?
> _______________________________________
> #netperf      ORG              NEW
>            BW (#retr)       BW (#retr)
> _______________________________________
> 1          70244 (0)        64102 (0)
> 4          21421 (0)        36570 (416)
> 8          21746 (0)        38604 (148)
> 16         21783 (0)        40632 (464)
> 32         22677 (0)        37163 (1053)
> 64         23648 (4)        36449 (2197)
> 128        23251 (2)        31676 (3185)
> _______________________________________
>
> Single netperf case didn't have any retransmissions so that is not
> the cause for drop. I tested ixgbe (MQ):
> ___________________________________________________________
> #netperf      ixgbe            ixgbe (pin intrs to cpu#0 on
>                                       both server/client)
>            BW (#retr)       BW (#retr)
> ___________________________________________________________
> 1          3567 (117)       6000 (251)
> 2          4406 (477)       6298 (725)
> 4          6119 (1085)      7208 (3387)
> 8          6595 (4276)      7381 (15296)
> 16         6651 (11651)     6856 (30394)
Interesting.
You are saying we get many more retransmissions with a physical NIC as
well?
> ___________________________________________________________
>
> > 5. Test perf in more scenarios:
> > small packets
>
> 512 byte packets - BW drops for up to 8 (sometimes 16) netperf sessions,
> but increases with #sessions:
> _______________________________________________________________________________
> #    BW1    BW2    (%)      CPU1   CPU2   (%)     RCPU1  RCPU2  (%)
> _______________________________________________________________________________
> 1    4043   3800   (-6.0)   50     50     (0)     86     98     (13.9)
> 2    8358   7485   (-10.4)  153    178    (16.3)  230    264    (14.7)
> 4    20664  13567  (-34.3)  448    490    (9.3)   530    624    (17.7)
> 8    25198  17590  (-30.1)  967    1021   (5.5)   1085   1257   (15.8)
> 16   23791  24057  (1.1)    1904   2220   (16.5)  2156   2578   (19.5)
> 24   23055  26378  (14.4)   2807   3378   (20.3)  3225   3901   (20.9)
> 32   22873  27116  (18.5)   3748   4525   (20.7)  4307   5239   (21.6)
> 40   22876  29106  (27.2)   4705   5717   (21.5)  5388   6591   (22.3)
> 48   23099  31352  (35.7)   5642   6986   (23.8)  6475   8085   (24.8)
> 64   22645  30563  (34.9)   7527   9027   (19.9)  8619   10656  (23.6)
> 80   22497  31922  (41.8)   9375   11390  (21.4)  10736  13485  (25.6)
> 96   22509  32718  (45.3)   11271  13710  (21.6)  12927  16269  (25.8)
> 128  22255  32397  (45.5)   15036  18093  (20.3)  17144  21608  (26.0)
> _______________________________________________________________________________
> SUM: BW: (16.7) CPU: (20.6) RCPU: (24.3)
> _______________________________________________________________________________
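The mail does not say how the (%) columns and the SUM row are derived. Recomputing from the table suggests each (%) is the relative change from the first value to the second, and SUM is the same change taken over the column totals. A rough sketch for checking that, with made-up function names; the rounding in the original script is unknown, and the published figures look truncated to one decimal rather than rounded:

```shell
# pct_change OLD NEW: relative change from OLD to NEW, in percent
pct_change() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (new - old) * 100 / old }'
}

# sum_pct_change: read "OLD NEW" pairs on stdin, one per line, and print
# the relative change between the two column totals, in percent
sum_pct_change() {
    awk '{ old += $1; new += $2 }
         END { printf "%.2f\n", (new - old) * 100 / old }'
}

# Example: first row of the 512 byte table (BW1=4043, BW2=3800)
#   pct_change 4043 3800        # prints -6.01
```

Feeding all thirteen BW1/BW2 pairs of the 512 byte table to sum_pct_change gives 16.75, in line with the 16.7 reported in the SUM row.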
>
> > host -> guest
> _______________________________________________________________________________
> #    BW1    BW2    (%)      CPU1   CPU2   (%)     RCPU1  RCPU2  (%)
> _______________________________________________________________________________
> *1   70706  90398  (27.8)   300    327    (9.0)   140    175    (25.0)
> 2    20951  21937  (4.7)    188    196    (4.2)   93     103    (10.7)
> 4    19952  25281  (26.7)   397    496    (24.9)  210    304    (44.7)
> 8    18559  24992  (34.6)   802    1010   (25.9)  439    659    (50.1)
> 16   18882  25608  (35.6)   1642   2082   (26.7)  953    1454   (52.5)
> 24   19012  26955  (41.7)   2465   3153   (27.9)  1452   2254   (55.2)
> 32   19846  26894  (35.5)   3278   4238   (29.2)  1914   3081   (60.9)
> 40   19704  27034  (37.2)   4104   5303   (29.2)  2409   3866   (60.4)
> 48   19721  26832  (36.0)   4924   6418   (30.3)  2898   4701   (62.2)
> 64   19650  26849  (36.6)   6595   8611   (30.5)  3975   6433   (61.8)
> 80   19432  26823  (38.0)   8244   10817  (31.2)  4985   8165   (63.7)
> 96   20347  27886  (37.0)   9913   13017  (31.3)  5982   9860   (64.8)
> 128  19108  27715  (45.0)   13254  17546  (32.3)  8153   13589  (66.6)
> _______________________________________________________________________________
> SUM: BW: (32.4) CPU: (30.4) RCPU: (62.6)
> _______________________________________________________________________________
> *: Sum over 7 iterations, remaining test cases are sum over 2 iterations
>
> > guest <-> external
>
> I haven't done this right now since I don't have a setup. I guess
> it would be limited by wire speed and gains may not be there. I
> will try to do this later when I get the setup.
OK, but we at least need to check that it does not hurt things.
> > in last case:
> > find some other way to measure host CPU utilization,
> > try multiqueue and single queue devices
> > 6. Use above to figure out what is a sane default for numtxqs
>
> A. Summary for default I/O (16K):
> #txqs=2 (#vhost=3): BW: (37.6) CPU: (69.2) RCPU: (40.8)
> #txqs=4 (#vhost=5): BW: (36.9) CPU: (60.9) RCPU: (25.2)
> #txqs=8 (#vhost=5): BW: (41.8) CPU: (50.0) RCPU: (15.2)
> #txqs=16 (#vhost=5): BW: (40.4) CPU: (49.9) RCPU: (10.0)
>
> B. Summary for 512 byte I/O:
> #txqs=2 (#vhost=3): BW: (31.6) CPU: (35.7) RCPU: (28.6)
> #txqs=4 (#vhost=5): BW: (5.7) CPU: (27.2) RCPU: (22.7)
> #txqs=8 (#vhost=5): BW: (-.6) CPU: (25.1) RCPU: (22.5)
> #txqs=16 (#vhost=5): BW: (-6.6) CPU: (24.7) RCPU: (21.7)
>
> Summary:
>
> 1. Average BW increase for regular I/O is best for #txq=16 with the
> least CPU utilization increase.
> 2. The average BW for 512 byte I/O is best for lower #txq=2. For higher
> #txqs, BW increased only after a particular #netperf sessions - in
> my testing that limit was 32 netperf sessions.
> 3. Multiple txq for guest by itself doesn't seem to have any issues.
> Guest CPU% increase is slightly higher than BW improvement. I
> think it is true for all mq drivers since more paths run in parallel
> up to the device instead of sleeping and allowing one thread to send
> all packets via qdisc_restart.
> 4. Having high number of txqs gives better gains and reduces cpu util
> on the guest and the host.
> 5. MQ is intended for server loads. MQ should probably not be explicitly
> specified for client systems.
> 6. No regression with numtxqs=1 (or if mq option is not used) in any
> testing scenario.
Of course txq=1 can be considered a kind of fix, but if we know the
issue is TX/RX flows getting bounced between CPUs, can we fix this?
Workload-specific optimizations can only get us so far.
>
> I will send the v3 patch within a day after some more testing.
>
> Thanks,
>
> - KK