dev.dpdk.org archive mirror
* outw() in virtio_ring_doorbell() in DPDK+virtio consume 40% of the CPU in oprofile
@ 2013-12-13 22:04 James Yu
       [not found] ` <CAFMB=kCmVfXNJdCBoH_g51_M0QxaQU6tevu-qDmNW-okhR_rRw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: James Yu @ 2013-12-13 22:04 UTC (permalink / raw)
  To: dev-VfR2kkLFssw

Resending it due to missing [dpdk-dev] in the subject line.

I am using a Spirent tester to send 2 Gbps of traffic to a 10G port. The
traffic is looped back by l2fwd+DPDK+virtio in a 32-bit CentOS guest, and
only 700 Mbps is received on the other port. The 32-bit CentOS guest runs on
a Fedora 18 KVM host. The virtual interfaces are configured as virtio ports,
not e1000. qemu-kvm automatically used vhost-net because the guest uses
virtio ports.

The questions are:
A. Why can it only reach 2 Gbps?
B. Why does outw() account for 40% of the entire measurement when it only
tries to write 2 bytes to the I/O port using the assembly outw instruction?
Is it a blocking call? Or is the time spent mapping the guest's I/O address
to the physical address of the I/O port on the host?
C. Is there any way to improve it?
D. The vmxnet PMD code uses memory-mapped I/O addresses, not port I/O
addresses. Would it be faster to use memory-mapped I/O?

Any pointers or feedback will help.
Thanks

James

---
While the traffic is running, I run oprofile and opreport using the following
scripts in a separate xterm window.
1. ./oprofile_start.sh
2. wait for 10 seconds
3. ./oprofile_stop.sh
::::::::::::::
oprofile_start.sh
::::::::::::::
#!/bin/bash
opcontrol --reset
opcontrol --deinit
modprobe oprofile timer=1
opcontrol --no-vmlinux --separate=cpu,thread --callgraph=10 --separate=kernel
opcontrol --session-dir=/root
opcontrol --start

::::::::::::::
oprofile_stop.sh
::::::::::::::
opcontrol --dump
opcontrol --stop
opcontrol --shutdown
opreport --session-dir=/root --details --merge tgid --symbols /root/dpdk/dpdk-1.3.1r2/examples/l2fwd/build/l2fwd

Profiling through timer interrupt
vma      samples  %        image name               symbol name
00000d36 5445     40.1105  librte_pmd_virtio.so     outw
  00000d54 5442     99.9449
  00000d55 3         0.0551
00003032 3513     25.8785  librte_pmd_virtio.so     virtio_recv_buf

---
/* Write a 16-bit value to an x86 I/O port (local copy of outw()). */
static void outw_jyu1(unsigned short int value, unsigned short int __port)
{
  __asm__ __volatile__ ("outw %w0,%w1": :"a" (value), "Nd" (__port));
}
---
This link
http://www.cs.nthu.edu.tw/~ychung/slides/Virtualization/VM-Lecture-2-3-IO%20Virtualization.pptx
(pages 17 – 22) describes how I/O ports can be accessed.


Thread overview: 6+ messages
2013-12-13 22:04 outw() in virtio_ring_doorbell() in DPDK+virtio consume 40% of the CPU in oprofile James Yu
     [not found] ` <CAFMB=kCmVfXNJdCBoH_g51_M0QxaQU6tevu-qDmNW-okhR_rRw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-12-13 23:01   ` Stephen Hemminger
     [not found]     ` <20131213150121.3d70a0d2-We1ePj4FEcvRI77zikRAJc56i+j3xesD0e7PPNI6Mm0@public.gmane.org>
2013-12-16 23:35       ` James Yu
     [not found]         ` <CAFMB=kBTzbYWvEG9qsdhU7u2Jzh_wZid4vcrenK_XX8A-eqckA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-12-16 23:58           ` Stephen Hemminger
  -- strict thread matches above, loose matches on Subject: below --
2013-12-13 22:02 James Yu
2013-12-13 21:59 James Yu
