* [RFC] implement QUEUED spinlocks on powerpc
From: Eric Dumazet @ 2017-02-01 17:05 UTC
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, Kevin Hao, Torsten Duwe, Eric Dumazet

Hi all,

Is anybody working on adding QUEUED spinlocks to 64-bit powerpc?

I've seen past attempts with ticket spinlocks
(https://patchwork.ozlabs.org/patch/449381/ and other related links),
but it looks like ticket spinlocks are a thing of the past.

Thanks.

* Re: [RFC] implement QUEUED spinlocks on powerpc
From: Benjamin Herrenschmidt @ 2017-02-01 20:37 UTC
To: Eric Dumazet, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, Kevin Hao, Torsten Duwe, Eric Dumazet, Pan Xinhui

On Wed, 2017-02-01 at 09:05 -0800, Eric Dumazet wrote:
> Hi all,
>
> Is anybody working on adding QUEUED spinlocks to 64-bit powerpc?
>
> I've seen past attempts with ticket spinlocks
> (https://patchwork.ozlabs.org/patch/449381/ and other related links),
> but it looks like ticket spinlocks are a thing of the past.

Yes, we have a tentative implementation of qspinlock and pv variants:

https://patchwork.ozlabs.org/patch/703139/
https://patchwork.ozlabs.org/patch/703140/
https://patchwork.ozlabs.org/patch/703141/
https://patchwork.ozlabs.org/patch/703142/
https://patchwork.ozlabs.org/patch/703143/
https://patchwork.ozlabs.org/patch/703144/

Michael, what's the status with getting that merged?

Cheers,
Ben.

* Re: [RFC] implement QUEUED spinlocks on powerpc
From: Michael Ellerman @ 2017-02-02 4:04 UTC
To: Benjamin Herrenschmidt, Eric Dumazet, Paul Mackerras
Cc: linuxppc-dev, Kevin Hao, Torsten Duwe, Eric Dumazet, Pan Xinhui

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> On Wed, 2017-02-01 at 09:05 -0800, Eric Dumazet wrote:
>> Hi all,
>>
>> Is anybody working on adding QUEUED spinlocks to 64-bit powerpc?
>>
>> I've seen past attempts with ticket spinlocks
>> (https://patchwork.ozlabs.org/patch/449381/ and other related links),
>> but it looks like ticket spinlocks are a thing of the past.
>
> Yes, we have a tentative implementation of qspinlock and pv variants:
>
> https://patchwork.ozlabs.org/patch/703139/
> https://patchwork.ozlabs.org/patch/703140/
> https://patchwork.ozlabs.org/patch/703141/
> https://patchwork.ozlabs.org/patch/703142/
> https://patchwork.ozlabs.org/patch/703143/
> https://patchwork.ozlabs.org/patch/703144/
>
> Michael, what's the status with getting that merged?

Needs a good review, and the benchmark results were not all that
compelling - though perhaps they were just the wrong benchmarks.

cheers

* Re: [RFC] implement QUEUED spinlocks on powerpc
From: Eric Dumazet @ 2017-02-02 4:40 UTC
To: Michael Ellerman
Cc: Benjamin Herrenschmidt, Eric Dumazet, Paul Mackerras, linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui

On Wed, Feb 1, 2017 at 8:04 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>
>> On Wed, 2017-02-01 at 09:05 -0800, Eric Dumazet wrote:
>>> Hi all,
>>>
>>> Is anybody working on adding QUEUED spinlocks to 64-bit powerpc?
>>>
>>> I've seen past attempts with ticket spinlocks
>>> (https://patchwork.ozlabs.org/patch/449381/ and other related links),
>>> but it looks like ticket spinlocks are a thing of the past.
>>
>> Yes, we have a tentative implementation of qspinlock and pv variants:
>>
>> https://patchwork.ozlabs.org/patch/703139/
>> https://patchwork.ozlabs.org/patch/703140/
>> https://patchwork.ozlabs.org/patch/703141/
>> https://patchwork.ozlabs.org/patch/703142/
>> https://patchwork.ozlabs.org/patch/703143/
>> https://patchwork.ozlabs.org/patch/703144/
>>
>> Michael, what's the status with getting that merged?
>
> Needs a good review, and the benchmark results were not all that
> compelling - though perhaps they were just the wrong benchmarks.

A typical benchmark would be to run 200 concurrent netperf -t TCP_RR
instances through a single qdisc (protected by a spinlock).

Plain (non-ticket, non-queued) spinlocks behave quite badly in this scenario.

I can try this next week if you want.

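As a purely illustrative sketch of the kind of run Eric describes (not a script from this thread), many concurrent TCP_RR flows can be forced through one single-queue qdisc so every sender contends on the same qdisc spinlock. The interface name, server address, run length and request size below are assumptions, not values given by anyone here:

    #!/bin/sh
    # Sketch only: drive ~200 concurrent netperf TCP_RR flows through a single
    # pfifo qdisc so all transmitters contend on the same qdisc spinlock.
    # Assumes netperf is installed locally, netserver is already running on
    # $SERVER, and eth0 is a placeholder for the transmit interface.
    SERVER=${1:-192.168.1.2}
    RUNNERS=${2:-200}
    DURATION=${3:-60}

    # Replace any multiqueue default with one pfifo queue, so there is a
    # single qdisc lock for all CPUs to fight over.
    tc qdisc replace dev eth0 root pfifo limit 10000

    for i in $(seq "$RUNNERS"); do
        # -P 0 suppresses banners; -r 1,1 uses 1-byte request/response pairs.
        netperf -H "$SERVER" -t TCP_RR -l "$DURATION" -P 0 -- -r 1,1 &
    done
    wait

The interesting figure is the aggregate transaction rate across all runners and how it holds up as contention on the qdisc lock grows.
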
* Re: [RFC] implement QUEUED spinlocks on powerpc
From: Benjamin Herrenschmidt @ 2017-02-02 4:42 UTC
To: Eric Dumazet, Michael Ellerman
Cc: Eric Dumazet, Paul Mackerras, linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui

On Wed, 2017-02-01 at 20:40 -0800, Eric Dumazet wrote:
> A typical benchmark would be to run 200 concurrent netperf -t TCP_RR
> instances through a single qdisc (protected by a spinlock).
>
> Plain (non-ticket, non-queued) spinlocks behave quite badly in this scenario.
>
> I can try this next week if you want.

That would be great!

Cheers,
Ben.

* Re: [RFC] implement QUEUED spinlocks on powerpc
From: panxinhui @ 2017-02-07 6:21 UTC
To: Eric Dumazet, Michael Ellerman
Cc: Benjamin Herrenschmidt, Eric Dumazet, Paul Mackerras, linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui

On 2017/2/2 at 12:40 PM, Eric Dumazet wrote:
> On Wed, Feb 1, 2017 at 8:04 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
>> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>>
>>> On Wed, 2017-02-01 at 09:05 -0800, Eric Dumazet wrote:
>>>> Hi all,
>>>>
>>>> Is anybody working on adding QUEUED spinlocks to 64-bit powerpc?
>>>>
>>>> I've seen past attempts with ticket spinlocks
>>>> (https://patchwork.ozlabs.org/patch/449381/ and other related links),
>>>> but it looks like ticket spinlocks are a thing of the past.
>>>
>>> Yes, we have a tentative implementation of qspinlock and pv variants:
>>>
>>> https://patchwork.ozlabs.org/patch/703139/
>>> https://patchwork.ozlabs.org/patch/703140/
>>> https://patchwork.ozlabs.org/patch/703141/
>>> https://patchwork.ozlabs.org/patch/703142/
>>> https://patchwork.ozlabs.org/patch/703143/
>>> https://patchwork.ozlabs.org/patch/703144/
>>>
>>> Michael, what's the status with getting that merged?
>>
>> Needs a good review, and the benchmark results were not all that
>> compelling - though perhaps they were just the wrong benchmarks.
>
> A typical benchmark would be to run 200 concurrent netperf -t TCP_RR
> instances through a single qdisc (protected by a spinlock).

Hi all,

I ran some netperf tests and got some benchmark results.
I also attach my test script and the netperf results (Excel).

There are two machines: one runs netserver and the other runs the netperf
benchmark. They are connected by a 1000 Mbps network.

# ip link information
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ba:68:9c:14:32:02 brd ff:ff:ff:ff:ff:ff

According to the results, there is not much performance gap between them.
As we are only testing throughput, pvqspinlock shows the overhead of its
pv stuff, but qspinlock shows a little improvement over spinlock. My simple
summary for this test case is: qspinlock > spinlock > pvqspinlock.

When running 200 concurrent netperf instances, the total throughput is:

lock type   | concurrent runners | total throughput | variance
---------------------------------------------------------------
spinlock    | 199                | 66882.8          | 89.93
qspinlock   | 199                | 66350.4          | 72.0239
pvqspinlock | 199                | 64740.5          | 85.7837

You can see more data in the attached netperf-restult.xlsx.

thanks
xinhui

> Plain (non-ticket, non-queued) spinlocks behave quite badly in this scenario.
>
> I can try this next week if you want.

[-- Attachment #2: np.sh --]
[-- Type: application/x-sh, Size: 843 bytes --]

[-- Attachment #3: netperf-restult.xlsx --]
[-- Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, Size: 129045 bytes --]

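The contents of the attached np.sh are not reproduced in the archive. As an illustrative sketch only (not the attached script), the per-runner results could be collected and reduced to the total throughput and variance reported above roughly as follows; it assumes a netperf build where "-P 0 -v 0" prints only the figure of merit, and that the variance is taken over per-runner transaction rates:

    #!/bin/sh
    # Sketch: run N concurrent netperf TCP_RR instances, capture one
    # transaction rate per runner, then report count, total and variance.
    SERVER=${1:-192.168.1.1}
    RUNNERS=${2:-200}
    DIR=$(mktemp -d)

    for i in $(seq "$RUNNERS"); do
        # -P 0 -v 0 makes netperf print only the result value (trans/sec).
        netperf -H "$SERVER" -t TCP_RR -l 60 -P 0 -v 0 > "$DIR/$i" &
    done
    wait

    cat "$DIR"/* | awk '{ n++; sum += $1; sumsq += $1 * $1 }
        END { mean = sum / n;
              printf "runners=%d total=%.1f variance=%.4f\n",
                     n, sum, sumsq / n - mean * mean }'
    rm -r "$DIR"
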
* Re: [RFC] implement QUEUED spinlocks on powerpc
From: Eric Dumazet @ 2017-02-07 6:46 UTC
To: panxinhui
Cc: Michael Ellerman, Benjamin Herrenschmidt, Eric Dumazet, Paul Mackerras, linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui

On Mon, Feb 6, 2017 at 10:21 PM, panxinhui <xinhui@linux.vnet.ibm.com> wrote:

> Hi all,
>
> I ran some netperf tests and got some benchmark results.
> I also attach my test script and the netperf results (Excel).
>
> There are two machines: one runs netserver and the other runs the netperf
> benchmark. They are connected by a 1000 Mbps network.
>
> # ip link information
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
> UNKNOWN mode DEFAULT group default qlen 1000
>     link/ether ba:68:9c:14:32:02 brd ff:ff:ff:ff:ff:ff
>
> According to the results, there is not much performance gap between them.
> As we are only testing throughput, pvqspinlock shows the overhead of its
> pv stuff, but qspinlock shows a little improvement over spinlock. My simple
> summary for this test case is: qspinlock > spinlock > pvqspinlock.
>
> When running 200 concurrent netperf instances, the total throughput is:
>
> lock type   | concurrent runners | total throughput | variance
> ---------------------------------------------------------------
> spinlock    | 199                | 66882.8          | 89.93
> qspinlock   | 199                | 66350.4          | 72.0239
> pvqspinlock | 199                | 64740.5          | 85.7837
>
> You can see more data in the attached netperf-restult.xlsx.
>
> thanks
> xinhui

Hi xinhui,

A 1Gbit NIC is too slow for this use case. I would try a 10Gbit NIC at least...

Alternatively, you could use the loopback interface (netperf -H 127.0.0.1):

tc qd add dev lo root pfifo limit 10000

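Putting Eric's two suggestions together, a minimal loopback variant might look like the sketch below (illustrative, assuming netperf and netserver are installed on the test machine). The tc step matters because lo normally uses the "noqueue" qdisc, so a pfifo queue has to be attached explicitly for the qdisc spinlock to come into play:

    # Sketch: loopback variant of the same test.
    tc qdisc add dev lo root pfifo limit 10000   # give lo a real qdisc (and lock)
    netserver                                    # local netserver, if not running
    netperf -H 127.0.0.1 -t TCP_RR -l 60 -P 0    # run ~200 of these concurrently,
                                                 # as in the earlier sketch
    tc qdisc del dev lo root                     # restore the default afterwards
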
* Re: [RFC] implement QUEUED spinlocks on powerpc
From: panxinhui @ 2017-02-07 7:22 UTC
To: Eric Dumazet
Cc: Michael Ellerman, Benjamin Herrenschmidt, Eric Dumazet, Paul Mackerras, linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui

On 2017/2/7 at 2:46 PM, Eric Dumazet wrote:
> On Mon, Feb 6, 2017 at 10:21 PM, panxinhui <xinhui@linux.vnet.ibm.com> wrote:
>
>> Hi all,
>>
>> I ran some netperf tests and got some benchmark results.
>> I also attach my test script and the netperf results (Excel).
>>
>> There are two machines: one runs netserver and the other runs the netperf
>> benchmark. They are connected by a 1000 Mbps network.
>>
>> # ip link information
>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
>> UNKNOWN mode DEFAULT group default qlen 1000
>>     link/ether ba:68:9c:14:32:02 brd ff:ff:ff:ff:ff:ff
>>
>> According to the results, there is not much performance gap between them.
>> As we are only testing throughput, pvqspinlock shows the overhead of its
>> pv stuff, but qspinlock shows a little improvement over spinlock. My simple
>> summary for this test case is: qspinlock > spinlock > pvqspinlock.
>>
>> When running 200 concurrent netperf instances, the total throughput is:
>>
>> lock type   | concurrent runners | total throughput | variance
>> ---------------------------------------------------------------
>> spinlock    | 199                | 66882.8          | 89.93
>> qspinlock   | 199                | 66350.4          | 72.0239
>> pvqspinlock | 199                | 64740.5          | 85.7837
>>
>> You can see more data in the attached netperf-restult.xlsx.
>>
>> thanks
>> xinhui
>
> Hi xinhui,
>
> A 1Gbit NIC is too slow for this use case. I would try a 10Gbit NIC at least...
>
> Alternatively, you could use the loopback interface (netperf -H 127.0.0.1):
>
> tc qd add dev lo root pfifo limit 10000

Great, thanks!
xinhui

* Re: [RFC] implement QUEUED spinlocks on powerpc
From: panxinhui @ 2017-02-13 9:08 UTC
To: Eric Dumazet
Cc: Michael Ellerman, Benjamin Herrenschmidt, Eric Dumazet, Paul Mackerras, linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui

On 2017/2/7 at 2:46 PM, Eric Dumazet wrote:
> On Mon, Feb 6, 2017 at 10:21 PM, panxinhui <xinhui@linux.vnet.ibm.com> wrote:
>
>> Hi all,
>>
>> I ran some netperf tests and got some benchmark results.
>> I also attach my test script and the netperf results (Excel).

Hi all,

I used the loopback interface to run the netperf tests:

# tc qd add dev lo root pfifo limit 10000
# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc pfifo state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

The results are in the attached netperf.xlsx (Excel).

It is a 32-vCPU P8 machine with 32 GiB of memory.

This time spinlock is the best one, with qspinlock > pvqspinlock behind it. So sad.

thanks
xinhui

>> There are two machines: one runs netserver and the other runs the netperf
>> benchmark. They are connected by a 1000 Mbps network.
>>
>> # ip link information
>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
>> UNKNOWN mode DEFAULT group default qlen 1000
>>     link/ether ba:68:9c:14:32:02 brd ff:ff:ff:ff:ff:ff
>>
>> According to the results, there is not much performance gap between them.
>> As we are only testing throughput, pvqspinlock shows the overhead of its
>> pv stuff, but qspinlock shows a little improvement over spinlock. My simple
>> summary for this test case is: qspinlock > spinlock > pvqspinlock.
>>
>> When running 200 concurrent netperf instances, the total throughput is:
>>
>> lock type   | concurrent runners | total throughput | variance
>> ---------------------------------------------------------------
>> spinlock    | 199                | 66882.8          | 89.93
>> qspinlock   | 199                | 66350.4          | 72.0239
>> pvqspinlock | 199                | 64740.5          | 85.7837
>>
>> You can see more data in the attached netperf-restult.xlsx.
>>
>> thanks
>> xinhui
>
> Hi xinhui,
>
> A 1Gbit NIC is too slow for this use case. I would try a 10Gbit NIC at least...
>
> Alternatively, you could use the loopback interface (netperf -H 127.0.0.1):
>
> tc qd add dev lo root pfifo limit 10000

[-- Attachment #2: netperf.xlsx --]
[-- Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, Size: 61378 bytes --]

* Re: [RFC] implement QUEUED spinlocks on powerpc
From: panxinhui @ 2017-02-15 10:17 UTC
To: Eric Dumazet
Cc: Michael Ellerman, Benjamin Herrenschmidt, Eric Dumazet, Paul Mackerras, linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui

On 2017/2/13 at 5:08 PM, panxinhui wrote:
> On 2017/2/7 at 2:46 PM, Eric Dumazet wrote:
>> On Mon, Feb 6, 2017 at 10:21 PM, panxinhui <xinhui@linux.vnet.ibm.com> wrote:
>>
>>> Hi all,
>>>
>>> I ran some netperf tests and got some benchmark results.
>>> I also attach my test script and the netperf results (Excel).
>
> Hi all,
>
> I used the loopback interface to run the netperf tests:
>
> # tc qd add dev lo root pfifo limit 10000
> # ip link
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc pfifo state UNKNOWN mode DEFAULT group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>
> The results are in the attached netperf.xlsx (Excel).
>
> It is a 32-vCPU P8 machine with 32 GiB of memory.
>
> This time spinlock is the best one, with qspinlock > pvqspinlock behind it. So sad.

This time I have applied some optimising patches to pvqspinlock. Under high
contention the performance improves nicely and is now very similar to
spinlock. The results are attached in netperf.xlsx.

thanks
xinhui

> thanks
> xinhui
>
>>> There are two machines: one runs netserver and the other runs the netperf
>>> benchmark. They are connected by a 1000 Mbps network.
>>>
>>> # ip link information
>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
>>> UNKNOWN mode DEFAULT group default qlen 1000
>>>     link/ether ba:68:9c:14:32:02 brd ff:ff:ff:ff:ff:ff
>>>
>>> According to the results, there is not much performance gap between them.
>>> As we are only testing throughput, pvqspinlock shows the overhead of its
>>> pv stuff, but qspinlock shows a little improvement over spinlock. My simple
>>> summary for this test case is: qspinlock > spinlock > pvqspinlock.
>>>
>>> When running 200 concurrent netperf instances, the total throughput is:
>>>
>>> lock type   | concurrent runners | total throughput | variance
>>> ---------------------------------------------------------------
>>> spinlock    | 199                | 66882.8          | 89.93
>>> qspinlock   | 199                | 66350.4          | 72.0239
>>> pvqspinlock | 199                | 64740.5          | 85.7837
>>>
>>> You can see more data in the attached netperf-restult.xlsx.
>>>
>>> thanks
>>> xinhui
>>
>> Hi xinhui,
>>
>> A 1Gbit NIC is too slow for this use case. I would try a 10Gbit NIC at least...
>>
>> Alternatively, you could use the loopback interface (netperf -H 127.0.0.1):
>>
>> tc qd add dev lo root pfifo limit 10000

[-- Attachment #2: netperf.xlsx --]
[-- Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, Size: 72195 bytes --]

Thread overview: 10+ messages

2017-02-01 17:05 [RFC] implement QUEUED spinlocks on powerpc  Eric Dumazet
2017-02-01 20:37 ` Benjamin Herrenschmidt
2017-02-02  4:04   ` Michael Ellerman
2017-02-02  4:40     ` Eric Dumazet
2017-02-02  4:42       ` Benjamin Herrenschmidt
2017-02-07  6:21       ` panxinhui
2017-02-07  6:46         ` Eric Dumazet
2017-02-07  7:22           ` panxinhui
2017-02-13  9:08           ` panxinhui
2017-02-15 10:17             ` panxinhui