DPDK & QPI performance issue in Romley platform.

* DPDK & QPI performance issue in Romley platform.
From: Zachary @ 2013-09-02  3:22 UTC
  To: dev-VfR2kkLFssw
  Cc: "Yannic.Chou (周哲正) : 6808",
	"Alan Yu (俞亦偉) : 6632"

Hi~

I have a question about a DPDK & QPI performance issue on the Romley platform.
Recently, I used the DPDK example l2fwd to test DPDK's performance on my Romley platform.
When I run the test across CPUs (the NIC in a slot attached to one CPU, the l2fwd cores on the other), performance decreases dramatically.
Is this expected? Or is there any way to confirm the phenomenon?

In my opinion, this kind of issue should not occur, because QPI has enough bandwidth to handle this kind of case.
So I am very surprised by our results and cannot explain them.
Could someone help me solve this problem?

Thanks a lot!

My test environment is described below:

Platform:      Romley
CPU:           E5-2643 * 2
RAM:           Transcend 8GB PC3-1600 DDR3 * 8
OS:            Fedora Core 14
DPDK:          v1.3.1r2, example/l2fwd
Slot setting:
               Slot A is controlled directly by CPU1.
               Slot B is controlled directly by CPU0.

DPDK pre-settings:
a. BIOS setting:
    HT=disable
b. Kernel parameters:
    isolcpus=2,3,6,7
    default_hugepagesz=1024M
    hugepagesz=1024M
    hugepages=16
c. OS settings:
    service avahi-daemon stop
    service NetworkManager stop
    service iptables stop
    service acpid stop
    SELinux disabled
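Since DPDK's EAL tracks hugepage memory per NUMA socket, it may be worth checking how the 16 x 1 GB pages configured above were split between the two nodes, so that each socket has local memory for its packet buffers. The minimal sketch below reads the standard Linux sysfs counter `/sys/devices/system/node/node<N>/hugepages/hugepages-1048576kB/nr_hugepages`; the helper names `hugepage_path` and `read_node_hugepages` are hypothetical, and error handling is kept to a minimum.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build the sysfs path that reports how many hugepages
 * of size_kb kilobytes are reserved on NUMA node `node`.
 * Returns the path length, as snprintf does. */
int hugepage_path(char *buf, size_t len, int node, unsigned long size_kb)
{
    assert(node >= 0);
    return snprintf(buf, len,
                    "/sys/devices/system/node/node%d/hugepages/"
                    "hugepages-%lukB/nr_hugepages",
                    node, size_kb);
}

/* Hypothetical helper: read the per-node hugepage count.
 * Returns -1 if the file is missing or unreadable (for example, the node
 * does not exist or the kernel has no pages of that size). */
long read_node_hugepages(int node, unsigned long size_kb)
{
    char path[128];
    long count = -1;
    int n = hugepage_path(path, sizeof path, node, size_kb);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                       /* path would be truncated */
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &count) != 1)
        count = -1;
    fclose(f);
    return count;
}
```

For the 1 GB pages used here, `read_node_hugepages(0, 1048576)` plus `read_node_hugepages(1, 1048576)` should add up to the 16 pages requested on the kernel command line, assuming all of them were actually allocated at boot.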

Example program commands:
a. Slot B (CPU0) -> CPU1
    #>./l2fwd -c 0xc -n 4 -- -q 1 -p 0xc

b. Slot A (CPU1) -> CPU0
    #>./l2fwd -c 0xc0 -n 4 -- -q 1 -p 0xc0
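As a reading aid for the two commands: the EAL `-c` option takes a hexadecimal coremask and l2fwd's `-p` option takes a hexadecimal portmask, where bit n selects lcore n or port n. The minimal sketch below decodes such a mask; `mask_to_ids` is a hypothetical name, not part of DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a DPDK API): decode a hexadecimal bitmask such as
 * the EAL "-c" coremask or the l2fwd "-p" portmask into a list of IDs.
 * Bit n set means lcore n (or port n) is selected. Writes at most max_ids
 * entries to out and returns how many were written. */
unsigned mask_to_ids(uint64_t mask, unsigned *out, unsigned max_ids)
{
    assert(out != NULL || max_ids == 0);
    unsigned count = 0;
    for (unsigned bit = 0; bit < 64 && count < max_ids; bit++) {
        if (mask & (UINT64_C(1) << bit))
            out[count++] = bit;
    }
    return count;
}
```

Decoded this way, command a runs on lcores 2 and 3 and forwards between ports 2 and 3, while command b uses lcores 6 and 7 and ports 6 and 7; both lcore sets fall inside `isolcpus=2,3,6,7`. Which socket each lcore and port belongs to depends on how the platform numbers its cores and wires its slots.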

Results (frame size: 128 bytes):

CPU affinity   Slot A (CPU1)   Slot B (CPU0)
CPU0           15.9%           96.49%
CPU1           90.88%          24.78%

本信件可能包含瑞祺電通機密資訊，非指定之收件者，請勿使用或揭露本信件內容，並請銷毀此信件。 This email may contain confidential information. Please do not use or disclose it in any way and delete it if you are not the intended recipient.

* Re: DPDK & QPI performance issue in Romley platform.
From: Stephen Hemminger @ 2013-09-02 16:10 UTC
  To: Zachary
  Cc: dev-VfR2kkLFssw, Yannic.Chou (周哲正) : 6808,
	Alan Yu (俞亦偉) : 6632

Many DPDK APIs take a NUMA socket as one of their parameters. To get good
performance, it is up to the application to be NUMA-aware and use socket-local
resources.

One thing we do is keep a packet mbuf pool per socket and assign each
device to the pool on its own socket. You may also want to choose which lcores
to assign to which function based on socket locality. For example, threads that
poll a receive queue should be on the same socket as that NIC.
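The per-socket pool and lcore-locality advice can be sketched in plain C. This is a minimal sketch, not DPDK code: the topology records, the `struct pool` stand-in and the helper names `pool_for_port` and `pick_local_lcore` are hypothetical. In a real DPDK application the socket IDs would come from `rte_eth_dev_socket_id()` and `rte_lcore_to_socket_id()`, and each socket's pool would be created with that socket ID (for example via `rte_pktmbuf_pool_create()` in later DPDK releases) and passed to `rte_eth_rx_queue_setup()` for the ports on that socket.

```c
#include <assert.h>

#define MAX_SOCKETS 2
#define NO_LCORE (-1)

/* Hypothetical topology records. In a DPDK application, socket_id would be
 * obtained with rte_eth_dev_socket_id(port) and rte_lcore_to_socket_id(lcore)
 * rather than filled in by hand. */
struct port_info {
    int port_id;
    int socket_id;   /* NUMA node the NIC's PCIe slot is attached to */
};

struct lcore_info {
    int lcore_id;
    int socket_id;   /* NUMA node the core belongs to */
    int in_use;      /* non-zero once assigned to a port */
};

/* Stand-in for an mbuf pool: real code would create one pool per socket,
 * passing that socket ID when the pool is created. */
struct pool {
    int socket_id;
};

/* pools[] is indexed by socket ID; return the pool on the port's socket. */
const struct pool *pool_for_port(const struct pool *pools,
                                 const struct port_info *port)
{
    assert(port->socket_id >= 0 && port->socket_id < MAX_SOCKETS);
    return &pools[port->socket_id];
}

/* Claim the first unused lcore on `socket_id` to poll a port there;
 * returns NO_LCORE if that socket has no free core. */
int pick_local_lcore(struct lcore_info *lcores, int n_lcores, int socket_id)
{
    for (int i = 0; i < n_lcores; i++) {
        if (!lcores[i].in_use && lcores[i].socket_id == socket_id) {
            lcores[i].in_use = 1;
            return lcores[i].lcore_id;
        }
    }
    return NO_LCORE;
}
```

For example, with a hypothetical layout where lcores 2 and 3 sit on socket 0 and lcores 6 and 7 on socket 1, a NIC on socket 1 would get the socket-1 pool and lcore 6, keeping the receive buffers and the polling core on the same node as the NIC.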

Remember that the example applications are demo toys and don't do all the things
a real application would need to do.

* Re: DPDK & QPI performance issue in Romley platform.
From: Bob Chen @ 2013-09-03 16:19 UTC
  To: Zachary, dev

QPI bandwidth is definitely large enough, but it seems that QPI is only responsible for the communication between the separate CPU packages. What your test actually does is access memory attached to the other socket, which probably does not even come close to the bandwidth limit. The slowdown is more likely latency, which can be caused by many factors during a remote (NUMA) memory access.
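Bob's latency-versus-bandwidth point can be checked with a pointer-chasing microbenchmark: each load depends on the result of the previous one, so the time per access approximates memory latency rather than bandwidth. The minimal sketch below (function names are hypothetical) builds one random cycle with Sattolo's algorithm and times a dependent walk over it. Assuming the `numactl` tool is installed, running a driver for it once under `numactl --cpunodebind=0 --membind=0` and once under `numactl --cpunodebind=0 --membind=1` would compare local and remote access latency.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Fill next[] with one random cycle over 0..n-1 (Sattolo's algorithm).
 * The random order defeats the hardware prefetcher, so each step of the
 * walk tends to pay the full memory latency. */
void build_cycle(size_t *next, size_t n, unsigned seed)
{
    assert(n > 1);
    srand(seed);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j < i: Sattolo, not Fisher-Yates */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` dependent loads. Each load needs the result
 * of the previous one, so loads cannot overlap and time measures latency. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t idx = start;
    for (size_t s = 0; s < steps; s++)
        idx = next[idx];
    return idx;
}

/* Average nanoseconds per dependent load; the final index is stored in
 * *sink so the compiler cannot discard the walk. */
double ns_per_access(const size_t *next, size_t steps, size_t *sink)
{
    assert(steps > 0);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sink = chase(next, 0, steps);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

A driver would allocate an array much larger than the last-level cache, call `build_cycle()`, and print `ns_per_access()`. The gap between the local and remote runs, rather than either absolute number, is what would illustrate the remote-access penalty described above.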

/Bob

* Re: DPDK & QPI performance issue in Romley platform.
From: Zachary @ 2013-09-06  7:31 UTC
  To: dev-VfR2kkLFssw

Hi~ Bob,
Thanks for your response!
So you think it is a memory-access problem rather than a QPI issue?
Does that mean that if I fix the memory-access issue, the performance may rise to what I expected?

BTW, has anyone ever used DPDK on a NUMA system with cross-CPU forwarding like in my case?
If yes, could you tell me how you solved this problem?
If no, I would like to know whether DPDK lets users handle this kind of case in their applications.
If it does, I will need to change the way I use DPDK.

There are a lot of questions above; I hope someone can help me answer them.

--
Best Regards,
Zachary Jen

Software RD
CAS-WELL Inc.
8th Floor, No. 242, Bo-Ai St., Shu-Lin City, Taipei County 238, Taiwan
Tel: +886-2-7731-8888#6305
Fax: +886-2-7731-9988
