* rps testing questions
@ 2011-01-17  9:43 mi wake
From: mi wake
To: netdev

I did RPS (Receive Packet Steering) testing on CentOS 5.5 with kernel 2.6.37.
CPU: 8-core Intel
Ethernet adapter: bnx2x

Problem statement:

I enabled RPS with:

  echo "ff" > /sys/class/net/eth2/queues/rx-0/rps_cpus

and ran one instance of netperf TCP_RR (netperf -t TCP_RR -H 192.168.0.1 -c -C):

  without RPS: 9963.48 transactions/sec
  with RPS:    9387.59 transactions/sec

In ab and tbench tests I also see lower tps with RPS enabled, yet CPU usage
is higher. With RPS enabled, the softirqs are balanced across the CPUs.

Is there something wrong with my test?
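For reference, a minimal sketch of the toggle above together with one way to
watch where the NET_RX softirq work lands (eth2 and the "ff" mask come from
this report; the watch invocation is just one convenient check):

  # Spread eth2's rx-0 queue across CPUs 0-7 ("ff" is a hex CPU bitmask);
  # writing "00" disables RPS for the queue again.
  echo ff > /sys/class/net/eth2/queues/rx-0/rps_cpus

  # Watch the per-CPU NET_RX softirq counters grow during a netperf run.
  watch -n 1 'grep -E "CPU|NET_RX" /proc/softirqs'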
* Re: rps testing questions
  2011-01-17  9:53 ` Eric Dumazet
From: Eric Dumazet
To: mi wake; +Cc: netdev

On Monday, 17 January 2011 at 17:43 +0800, mi wake wrote:
> I did RPS (Receive Packet Steering) testing on CentOS 5.5 with kernel 2.6.37.
> CPU: 8-core Intel
> Ethernet adapter: bnx2x
>
> Problem statement:
> I enabled RPS with:
>
>   echo "ff" > /sys/class/net/eth2/queues/rx-0/rps_cpus

bnx2x with one queue only?

> and ran one instance of netperf TCP_RR (netperf -t TCP_RR -H 192.168.0.1 -c -C):
>
>   without RPS: 9963.48 transactions/sec
>   with RPS:    9387.59 transactions/sec
>
> In ab and tbench tests I also see lower tps with RPS enabled, yet CPU usage
> is higher. With RPS enabled, the softirqs are balanced across the CPUs.

Really? That seems unlikely with your one-flow test, unless you _also_
have hardware IRQs hitting all your CPUs. (That would be very bad.)

> Is there something wrong with my test?

If you test with one flow, RPS brings nothing at all. It is better to
handle the packet directly on the CPU handling the hardware IRQ (and NAPI).

You had better make sure the hardware IRQs are on one CPU, instead of
spread across many CPUs.
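A hedged sketch of how one might check and pin the hardware IRQ as Eric
suggests (the IRQ number 52 is hypothetical; take the real one from
/proc/interrupts, and stop irqbalance first so it cannot rewrite the mask):

  # Stop irqbalance so it does not override the affinity set below.
  service irqbalance stop

  # Find eth2's IRQ line(s).
  grep eth2 /proc/interrupts

  # Suppose the rx queue's IRQ turned out to be 52 (hypothetical):
  # pin it to CPU 0 only; the value is a hex CPU bitmask.
  echo 1 > /proc/irq/52/smp_affinity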
* Re: rps testing questions
  2011-01-18  8:34 ` mi wake
From: mi wake
To: Eric Dumazet; +Cc: netdev

2011/1/17 Eric Dumazet <eric.dumazet@gmail.com>:
[...]
> If you test with one flow, RPS brings nothing at all. It is better to
> handle the packet directly on the CPU handling the hardware IRQ (and NAPI).
>
> You had better make sure the hardware IRQs are on one CPU, instead of
> spread across many CPUs.

I have checked: bnx2x has one queue only, and the hardware IRQ is on one
CPU. I ran the test again with more flows, using source IPs in the range
192.x.x.1 to 192.x.x.200 to send SYN packets. The server can handle:

  without rps + rfs: 18M/s
  with rps + rfs:    21M/s

Maybe the previous tests had too few flows. I will continue testing.
Thank you!
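This follow-up measures "rps + rfs", but the thread never shows how RFS was
enabled; on a 2.6.37 kernel with a single-queue NIC, a typical configuration
looks roughly like the sketch below (the table sizes are illustrative, not
taken from the thread):

  # Size the global socket-flow table shared by all rx queues.
  echo 32768 > /proc/sys/net/core/rps_sock_flow_entries

  # With only one rx queue, give that queue the whole per-queue budget.
  echo 32768 > /sys/class/net/eth2/queues/rx-0/rps_flow_cnt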
* Re: rps testing questions
  2011-01-17 13:08 ` Ben Hutchings
From: Ben Hutchings
To: mi wake; +Cc: netdev

On Mon, 2011-01-17 at 17:43 +0800, mi wake wrote:
[...]
> Is there something wrong with my test?

In addition to what Eric said, check the interrupt moderation settings
(ethtool -c/-C options). One-way latency for a single request/response
test will be at least the interrupt moderation value.

I haven't tested RPS by itself (Solarflare NICs have plenty of hardware
queues), so I don't know whether it can improve latency. However, RFS
certainly does when there are many flows.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
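A small sketch of the check Ben suggests (eth2 comes from the report;
whether bnx2x accepts an rx-usecs of 0 is driver-dependent):

  # Show the current coalescing (interrupt moderation) settings.
  ethtool -c eth2

  # For a latency-sensitive request/response test, try turning rx
  # moderation down or off (driver permitting), then re-run TCP_RR.
  ethtool -C eth2 rx-usecs 0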
* Re: rps testing questions
  2011-01-18 18:23 ` Rick Jones
From: Rick Jones
To: Ben Hutchings; +Cc: mi wake, netdev

Ben Hutchings wrote:
> On Mon, 2011-01-17 at 17:43 +0800, mi wake wrote:
>> and ran one instance of netperf TCP_RR (netperf -t TCP_RR -H 192.168.0.1 -c -C):
>>
>>   without RPS: 9963.48 transactions/sec
>>   with RPS:    9387.59 transactions/sec

Presumably there was an increase in service demand corresponding with the
drop in transactions per second.

Also, an unsolicited benchmarking style tip or two. When I am looking to
compare two settings, I find it helpful either to do several discrete runs
or to use the confidence intervals (global -i and -I options) with the
TCP_RR tests. I find a bit more "variability" in the _RR tests than in the
_STREAM tests.

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#index-g_t_002dI_002c-Global-26

Pinning netperf/netserver is also something I tend to do, but combining
that with confidence intervals under RPS is kind of difficult: the
successive data connections made across the confidence-interval iterations
will have different port numbers and so hash differently. RPS will then
put the connections on different cores in turn, which, with netperf and
netserver pinned to a core, changes the relationship between where the
inbound processing runs and where netserver runs. That will likely cause
processor-cache-to-cache transfers, which will definitely raise the
service demand and lower the single-stream transactions per second.

In theory :) with RFS that should not be an issue, since where
netperf/netserver are pinned controls where the inbound processing takes
place. We are in a maze of twisty heuristics... :)

>> In ab and tbench tests I also see lower tps with RPS enabled, yet CPU usage
>> is higher. With RPS enabled, the softirqs are balanced across the CPUs.
>>
>> Is there something wrong with my test?
>
> In addition to what Eric said, check the interrupt moderation settings
> (ethtool -c/-C options). One-way latency for a single request/response
> test will be at least the interrupt moderation value.
>
> I haven't tested RPS by itself (Solarflare NICs have plenty of hardware
> queues), so I don't know whether it can improve latency. However, RFS
> certainly does when there are many flows.

Is there actually an expectation that either RPS or RFS would improve
*latency*? Multiple-stream throughput, certainly, but with the additional
work done to spread things around, I wouldn't expect either to improve
latency.

happy benchmarking,

rick jones
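For concreteness, one way to apply these tips to the original test (the
-i/-I/-T flags are standard netperf global options; the CPU numbers are
arbitrary):

  # Iterate between 10 and 30 times until a 99%-confidence interval of
  # width 5% (+/-2.5%) is reached; bind netperf to local CPU 2 and
  # netserver to remote CPU 2 (-T local,remote).
  netperf -t TCP_RR -H 192.168.0.1 -c -C -i 30,10 -I 99,5 -T 2,2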
* Re: rps testing questions
  2011-01-18 18:34 ` Ben Hutchings
From: Ben Hutchings
To: Rick Jones; +Cc: mi wake, netdev

On Tue, 2011-01-18 at 10:23 -0800, Rick Jones wrote:
[...]
> Is there actually an expectation that either RPS or RFS would improve
> *latency*? Multiple-stream throughput, certainly, but with the additional
> work done to spread things around, I wouldn't expect either to improve
> latency.

Yes, it seems to make a big improvement to latency when many flows are
active. Tom told me that one of his benchmarks was 200 * netperf TCP_RR in
parallel, and I've seen over a 40% reduction in latency for that. That
said, allocating more RX queues might also help (sfc currently defaults to
one per processor package rather than one per processor thread, due to
concerns about CPU efficiency).

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
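A rough sketch of the many-flows case described here (the instance count
matches the thread, but the run length, target host, and option choices
are guesses):

  # Launch 200 concurrent TCP_RR instances for 60 seconds each; -P 0
  # suppresses banners so each prints one result line to aggregate.
  for i in $(seq 1 200); do
      netperf -t TCP_RR -H 192.168.0.1 -l 60 -P 0 &
  done
  wait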
* Re: rps testing questions
  2011-01-18 19:10 ` Rick Jones
From: Rick Jones
To: Ben Hutchings; +Cc: mi wake, netdev

Ben Hutchings wrote:
> On Tue, 2011-01-18 at 10:23 -0800, Rick Jones wrote:
>> Is there actually an expectation that either RPS or RFS would improve
>> *latency*? Multiple-stream throughput, certainly, but with the
>> additional work done to spread things around, I wouldn't expect either
>> to improve latency.
>
> Yes, it seems to make a big improvement to latency when many flows are
> active.

OK, you and I were using different definitions. I was speaking to
single-stream latency, but didn't say that explicitly (I may have
subconsciously thought it was implicit, given the OP used a single
instance of netperf :).

happy benchmarking,

rick jones