* [RFC] implement QUEUED spinlocks on powerpc
@ 2017-02-01 17:05 Eric Dumazet
  2017-02-01 20:37 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2017-02-01 17:05 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, Kevin Hao, Torsten Duwe, Eric Dumazet

Hi all

Is anybody working on adding QUEUED spinlocks to 64-bit powerpc?

I've seen past attempts with ticket spinlocks
( https://patchwork.ozlabs.org/patch/449381/ and other related links )

But it looks like ticket spinlocks are a thing of the past.

Thanks.


* Re: [RFC] implement QUEUED spinlocks on powerpc
  2017-02-01 17:05 [RFC] implement QUEUED spinlocks on powerpc Eric Dumazet
@ 2017-02-01 20:37 ` Benjamin Herrenschmidt
  2017-02-02  4:04   ` Michael Ellerman
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2017-02-01 20:37 UTC (permalink / raw)
  To: Eric Dumazet, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, Kevin Hao, Torsten Duwe, Eric Dumazet, Pan Xinhui

On Wed, 2017-02-01 at 09:05 -0800, Eric Dumazet wrote:
> Hi all
> 
> Is anybody working on adding QUEUED spinlocks to 64-bit powerpc?
> 
> I've seen past attempts with ticket spinlocks
> ( https://patchwork.ozlabs.org/patch/449381/ and other related links
> )
> 
> But it looks like ticket spinlocks are a thing of the past.

Yes, we have a tentative implementation of qspinlock and pv variants:

https://patchwork.ozlabs.org/patch/703139/
https://patchwork.ozlabs.org/patch/703140/
https://patchwork.ozlabs.org/patch/703141/
https://patchwork.ozlabs.org/patch/703142/
https://patchwork.ozlabs.org/patch/703143/
https://patchwork.ozlabs.org/patch/703144/

Michael, what's the status of getting that merged?

Cheers,
Ben.


* Re: [RFC] implement QUEUED spinlocks on powerpc
  2017-02-01 20:37 ` Benjamin Herrenschmidt
@ 2017-02-02  4:04   ` Michael Ellerman
  2017-02-02  4:40     ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Ellerman @ 2017-02-02  4:04 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Eric Dumazet, Paul Mackerras
  Cc: linuxppc-dev, Kevin Hao, Torsten Duwe, Eric Dumazet, Pan Xinhui

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Wed, 2017-02-01 at 09:05 -0800, Eric Dumazet wrote:
>> Hi all
>> 
>> Is anybody working on adding QUEUED spinlocks to 64-bit powerpc?
>> 
>> I've seen past attempts with ticket spinlocks
>> ( https://patchwork.ozlabs.org/patch/449381/ and other related links
>> )
>> 
>> But it looks like ticket spinlocks are a thing of the past.
>
> Yes, we have a tentative implementation of qspinlock and pv variants:
>
> https://patchwork.ozlabs.org/patch/703139/
> https://patchwork.ozlabs.org/patch/703140/
> https://patchwork.ozlabs.org/patch/703141/
> https://patchwork.ozlabs.org/patch/703142/
> https://patchwork.ozlabs.org/patch/703143/
> https://patchwork.ozlabs.org/patch/703144/
>
> Michael, what's the status of getting that merged?

Needs a good review, and the benchmark results were not all that
compelling - though perhaps they were just the wrong benchmarks.

cheers
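
One way to check whether a given benchmark actually contends on the
spinlocks being changed is the kernel's lock statistics. A minimal
sketch, not from the thread, assuming a kernel built with
CONFIG_LOCK_STAT=y:

# Enable lock statistics collection (requires CONFIG_LOCK_STAT=y).
echo 0 > /proc/lock_stat              # clear any stale counters
echo 1 > /proc/sys/kernel/lock_stat

# Run the workload under test, e.g. a short netperf burst.
netperf -H 127.0.0.1 -t TCP_RR -l 10

# Stop collection and inspect the most contended locks.
echo 0 > /proc/sys/kernel/lock_stat
head -n 40 /proc/lock_stat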


* Re: [RFC] implement QUEUED spinlocks on powerpc
  2017-02-02  4:04   ` Michael Ellerman
@ 2017-02-02  4:40     ` Eric Dumazet
  2017-02-02  4:42       ` Benjamin Herrenschmidt
  2017-02-07  6:21       ` panxinhui
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2017-02-02  4:40 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Benjamin Herrenschmidt, Eric Dumazet, Paul Mackerras,
	linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui

On Wed, Feb 1, 2017 at 8:04 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>
>> On Wed, 2017-02-01 at 09:05 -0800, Eric Dumazet wrote:
>>> Hi all
>>>
>>> Is anybody working on adding QUEUED spinlocks to 64-bit powerpc?
>>>
>>> I've seen past attempts with ticket spinlocks
>>> ( https://patchwork.ozlabs.org/patch/449381/ and other related links
>>> )
>>>
>>> But it looks like ticket spinlocks are a thing of the past.
>>
>> Yes, we have a tentative implementation of qspinlock and pv variants:
>>
>> https://patchwork.ozlabs.org/patch/703139/
>> https://patchwork.ozlabs.org/patch/703140/
>> https://patchwork.ozlabs.org/patch/703141/
>> https://patchwork.ozlabs.org/patch/703142/
>> https://patchwork.ozlabs.org/patch/703143/
>> https://patchwork.ozlabs.org/patch/703144/
>>
>> Michael, what's the status of getting that merged?
>
> Needs a good review, and the benchmark results were not all that
> compelling - though perhaps they were just the wrong benchmarks.

A typical benchmark would be to use 200 concurrent netperf -t TCP_RR
runs through a single qdisc (protected by a spinlock).

Plain (non-ticket, non-queued) spinlocks behave quite badly in this scenario.

I can try this next week if you want.
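
A minimal sketch of such a run (an illustration, not the np.sh script
attached later in this thread; it assumes netserver is already running
on the target host, and that each TCP_RR summary line carries the
transaction rate in field 6, which holds for netperf with -P 0):

#!/bin/sh
# Drive N concurrent TCP_RR streams at HOST and total the rates.
HOST=${1:-127.0.0.1}
N=${2:-200}
for i in $(seq "$N"); do
    # -P 0 suppresses banners so each run prints one summary line.
    netperf -H "$HOST" -t TCP_RR -l 30 -P 0 -- -r 1,1 &
done | awk '{ sum += $6 } END { printf "total transactions/s: %.1f\n", sum }'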


* Re: [RFC] implement QUEUED spinlocks on powerpc
  2017-02-02  4:40     ` Eric Dumazet
@ 2017-02-02  4:42       ` Benjamin Herrenschmidt
  2017-02-07  6:21       ` panxinhui
  1 sibling, 0 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2017-02-02  4:42 UTC (permalink / raw)
  To: Eric Dumazet, Michael Ellerman
  Cc: Eric Dumazet, Paul Mackerras, linuxppc-dev, Kevin Hao,
	Torsten Duwe, Pan Xinhui

On Wed, 2017-02-01 at 20:40 -0800, Eric Dumazet wrote:
> A typical benchmark would be to use 200 concurrent netperf -t TCP_RR
> runs through a single qdisc (protected by a spinlock).
> 
> Plain (non-ticket, non-queued) spinlocks behave quite badly in this scenario.
> 
> I can try this next week if you want.

That would be great!

Cheers,
Ben.


* Re: [RFC] implement QUEUED spinlocks on powerpc
  2017-02-02  4:40     ` Eric Dumazet
  2017-02-02  4:42       ` Benjamin Herrenschmidt
@ 2017-02-07  6:21       ` panxinhui
  2017-02-07  6:46         ` Eric Dumazet
  1 sibling, 1 reply; 10+ messages in thread
From: panxinhui @ 2017-02-07  6:21 UTC (permalink / raw)
  To: Eric Dumazet, Michael Ellerman
  Cc: Benjamin Herrenschmidt, Eric Dumazet, Paul Mackerras,
	linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui

[-- Attachment #1: Type: text/plain, Size: 2504 bytes --]


On 2017/2/2 12:40 PM, Eric Dumazet wrote:
> On Wed, Feb 1, 2017 at 8:04 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
>> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>>
>>> On Wed, 2017-02-01 at 09:05 -0800, Eric Dumazet wrote:
>>>> Hi all
>>>>
>>>> Is anybody working on adding QUEUED spinlocks to 64-bit powerpc?
>>>>
>>>> I've seen past attempts with ticket spinlocks
>>>> ( https://patchwork.ozlabs.org/patch/449381/ and other related links
>>>> )
>>>>
>>>> But it looks like ticket spinlocks are a thing of the past.
>>>
>>> Yes, we have a tentative implementation of qspinlock and pv variants:
>>>
>>> https://patchwork.ozlabs.org/patch/703139/
>>> https://patchwork.ozlabs.org/patch/703140/
>>> https://patchwork.ozlabs.org/patch/703141/
>>> https://patchwork.ozlabs.org/patch/703142/
>>> https://patchwork.ozlabs.org/patch/703143/
>>> https://patchwork.ozlabs.org/patch/703144/
>>>
>>> Michael, what's the status of getting that merged?
>>
>> Needs a good review, and the benchmark results were not all that
>> compelling - though perhaps they were just the wrong benchmarks.
>
> A typical benchmark would be to use 200 concurrent netperf -t TCP_RR
> runs through a single qdisc (protected by a spinlock).
>
Hi all,
	I ran some netperf tests and got some benchmark results.
I also attach my test script and the netperf results (Excel).

There are two machines: one runs netserver and the other runs the
netperf benchmark. They are connected by a 1000Mbps network.

#ip link information
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UNKNOWN mode DEFAULT group default qlen 1000
      link/ether ba:68:9c:14:32:02 brd ff:ff:ff:ff:ff:ff

According to the results, there is not much of a performance gap between
them. And as we are only testing throughput, pvqspinlock shows the
overhead of its pv support, while qspinlock shows a small improvement
over spinlock. My simple summary for this test case is
qspinlock > spinlock > pvqspinlock.

When running 200 concurrent netperf instances, the total throughput is:

lock type   | concurrent runners | total throughput | variance
------------|--------------------|------------------|---------
spinlock    | 199                | 66882.8          | 89.93
qspinlock   | 199                | 66350.4          | 72.0239
pvqspinlock | 199                | 64740.5          | 85.7837

You can see more data in the attached netperf-result.xlsx.

thanks
xinhui

> Plain (non-ticket, non-queued) spinlocks behave quite badly in this scenario.
>
> I can try this next week if you want.
>


[-- Attachment #2: np.sh --]
[-- Type: application/x-sh, Size: 843 bytes --]

[-- Attachment #3: netperf-result.xlsx --]
[-- Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, Size: 129045 bytes --]
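
Neither attachment is reproduced in the archive. For illustration only
(a sketch, not the actual np.sh), the mean and variance columns in the
table above could be computed from repeated total-throughput samples,
one number per line in a hypothetical results.txt:

# Population mean and variance of the totals, one sample per line.
awk '{ n++; sum += $1; sq += $1 * $1 }
     END { m = sum / n; print "mean:", m, "variance:", sq / n - m * m }' results.txt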


* Re: [RFC] implement QUEUED spinlocks on powerpc
  2017-02-07  6:21       ` panxinhui
@ 2017-02-07  6:46         ` Eric Dumazet
  2017-02-07  7:22           ` panxinhui
  2017-02-13  9:08           ` panxinhui
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2017-02-07  6:46 UTC (permalink / raw)
  To: panxinhui
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Eric Dumazet,
	Paul Mackerras, linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui

On Mon, Feb 6, 2017 at 10:21 PM, panxinhui <xinhui@linux.vnet.ibm.com> wrote:

> Hi all,
>         I ran some netperf tests and got some benchmark results.
> I also attach my test script and the netperf results (Excel).
>
> There are two machines: one runs netserver and the other runs the
> netperf benchmark. They are connected by a 1000Mbps network.
>
> #ip link information
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
> state UNKNOWN mode DEFAULT group default qlen 1000
>      link/ether ba:68:9c:14:32:02 brd ff:ff:ff:ff:ff:ff
>
> According to the results, there is not much of a performance gap between
> them. And as we are only testing throughput, pvqspinlock shows the
> overhead of its pv support, while qspinlock shows a small improvement
> over spinlock. My simple summary for this test case is
> qspinlock > spinlock > pvqspinlock.
>
> When running 200 concurrent netperf instances, the total throughput is:
>
> lock type   | concurrent runners | total throughput | variance
> ------------|--------------------|------------------|---------
> spinlock    | 199                | 66882.8          | 89.93
> qspinlock   | 199                | 66350.4          | 72.0239
> pvqspinlock | 199                | 64740.5          | 85.7837
>
> You can see more data in the attached netperf-result.xlsx.
>
> thanks
> xinhui


Hi xinhui

A 1Gbit NIC is too slow for this use case; I would try a 10Gbit NIC at least.

Alternatively, you could use the loopback interface (netperf -H 127.0.0.1):

tc qd add dev lo root pfifo limit 10000
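
Putting the two suggestions together, a minimal loopback setup might
look as follows (a sketch, not from the thread; lo normally defaults to
the noqueue discipline, which bypasses the qdisc lock entirely, so a
real qdisc has to be attached for the lock to be exercised):

# Route all loopback traffic through a single pfifo qdisc so that
# every concurrent flow contends on the same qdisc spinlock.
tc qdisc add dev lo root pfifo limit 10000

# Start the server side locally, then drive it over loopback.
netserver
netperf -H 127.0.0.1 -t TCP_RR -l 30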


* Re: [RFC] implement QUEUED spinlocks on powerpc
  2017-02-07  6:46         ` Eric Dumazet
@ 2017-02-07  7:22           ` panxinhui
  2017-02-13  9:08           ` panxinhui
  1 sibling, 0 replies; 10+ messages in thread
From: panxinhui @ 2017-02-07  7:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Eric Dumazet,
	Paul Mackerras, linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui



On 2017/2/7 2:46 PM, Eric Dumazet wrote:
> On Mon, Feb 6, 2017 at 10:21 PM, panxinhui <xinhui@linux.vnet.ibm.com> wrote:
>
>> Hi all,
>>         I ran some netperf tests and got some benchmark results.
>> I also attach my test script and the netperf results (Excel).
>>
>> There are two machines: one runs netserver and the other runs the
>> netperf benchmark. They are connected by a 1000Mbps network.
>>
>> #ip link information
>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
>> state UNKNOWN mode DEFAULT group default qlen 1000
>>      link/ether ba:68:9c:14:32:02 brd ff:ff:ff:ff:ff:ff
>>
>> According to the results, there is not much of a performance gap between
>> them. And as we are only testing throughput, pvqspinlock shows the
>> overhead of its pv support, while qspinlock shows a small improvement
>> over spinlock. My simple summary for this test case is
>> qspinlock > spinlock > pvqspinlock.
>>
>> When running 200 concurrent netperf instances, the total throughput is:
>>
>> lock type   | concurrent runners | total throughput | variance
>> ------------|--------------------|------------------|---------
>> spinlock    | 199                | 66882.8          | 89.93
>> qspinlock   | 199                | 66350.4          | 72.0239
>> pvqspinlock | 199                | 64740.5          | 85.7837
>>
>> You can see more data in the attached netperf-result.xlsx.
>>
>> thanks
>> xinhui
>
>
> Hi xinhui
>
> A 1Gbit NIC is too slow for this use case; I would try a 10Gbit NIC at least.
>
> Alternatively, you could use the loopback interface (netperf -H 127.0.0.1):
>
> tc qd add dev lo root pfifo limit 10000
>

great, thanks
xinhui


* Re: [RFC] implement QUEUED spinlocks on powerpc
  2017-02-07  6:46         ` Eric Dumazet
  2017-02-07  7:22           ` panxinhui
@ 2017-02-13  9:08           ` panxinhui
  2017-02-15 10:17             ` panxinhui
  1 sibling, 1 reply; 10+ messages in thread
From: panxinhui @ 2017-02-13  9:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Eric Dumazet,
	Paul Mackerras, linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui

[-- Attachment #1: Type: text/plain, Size: 2076 bytes --]



On 2017/2/7 2:46 PM, Eric Dumazet wrote:
> On Mon, Feb 6, 2017 at 10:21 PM, panxinhui <xinhui@linux.vnet.ibm.com> wrote:
> 
>> Hi all,
>>         I ran some netperf tests and got some benchmark results.
>> I also attach my test script and the netperf results (Excel).
>>
Hi all,
I used the loopback interface to run the netperf tests:
#tc qd add dev lo root pfifo limit 10000
#ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc pfifo state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

and put the results in the attached netperf.xlsx (Excel).

It is a 32-vCPU P8 machine with 32GiB of memory.

This time plain spinlock is the best one: spinlock > qspinlock > pvqspinlock. So sad.

thanks
xinhui
>> There are two machines: one runs netserver and the other runs the
>> netperf benchmark. They are connected by a 1000Mbps network.
>>
>> #ip link information
>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
>> state UNKNOWN mode DEFAULT group default qlen 1000
>>      link/ether ba:68:9c:14:32:02 brd ff:ff:ff:ff:ff:ff
>>
>> According to the results, there is not much of a performance gap between
>> them. And as we are only testing throughput, pvqspinlock shows the
>> overhead of its pv support, while qspinlock shows a small improvement
>> over spinlock. My simple summary for this test case is
>> qspinlock > spinlock > pvqspinlock.
>>
>> When running 200 concurrent netperf instances, the total throughput is:
>>
>> lock type   | concurrent runners | total throughput | variance
>> ------------|--------------------|------------------|---------
>> spinlock    | 199                | 66882.8          | 89.93
>> qspinlock   | 199                | 66350.4          | 72.0239
>> pvqspinlock | 199                | 64740.5          | 85.7837
>>
>> You can see more data in the attached netperf-result.xlsx.
>>
>> thanks
>> xinhui
> 
> 
> Hi xinhui
> 
> A 1Gbit NIC is too slow for this use case; I would try a 10Gbit NIC at least.
> 
> Alternatively, you could use the loopback interface (netperf -H 127.0.0.1):
> 
> tc qd add dev lo root pfifo limit 10000
> 

[-- Attachment #2: netperf.xlsx --]
[-- Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, Size: 61378 bytes --]


* Re: [RFC] implement QUEUED spinlocks on powerpc
  2017-02-13  9:08           ` panxinhui
@ 2017-02-15 10:17             ` panxinhui
  0 siblings, 0 replies; 10+ messages in thread
From: panxinhui @ 2017-02-15 10:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Eric Dumazet,
	Paul Mackerras, linuxppc-dev, Kevin Hao, Torsten Duwe, Pan Xinhui

[-- Attachment #1: Type: text/plain, Size: 2418 bytes --]



On 2017/2/13 5:08 PM, panxinhui wrote:
> 
> 
> On 2017/2/7 2:46 PM, Eric Dumazet wrote:
>> On Mon, Feb 6, 2017 at 10:21 PM, panxinhui <xinhui@linux.vnet.ibm.com> wrote:
>>
>>> Hi all,
>>>         I ran some netperf tests and got some benchmark results.
>>> I also attach my test script and the netperf results (Excel).
>>>
> Hi all,
> I used the loopback interface to run the netperf tests:
> #tc qd add dev lo root pfifo limit 10000
> #ip link
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc pfifo state UNKNOWN mode DEFAULT group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 
> and put the results in the attached netperf.xlsx (Excel).
> 
> It is a 32-vCPU P8 machine with 32GiB of memory.
> 
> This time plain spinlock is the best one: spinlock > qspinlock > pvqspinlock. So sad.
> 
This time, I have applied some optimising patches to pvqspinlock.
Under high contention, the performance improves noticeably and is very
close to spinlock.

Results are attached in netperf.xlsx.

thanks
xinhui

> thanks
> xinhui
>>> There are two machines: one runs netserver and the other runs the
>>> netperf benchmark. They are connected by a 1000Mbps network.
>>>
>>> #ip link information
>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
>>> state UNKNOWN mode DEFAULT group default qlen 1000
>>>      link/ether ba:68:9c:14:32:02 brd ff:ff:ff:ff:ff:ff
>>>
>>> According to the results, there is not much of a performance gap between
>>> them. And as we are only testing throughput, pvqspinlock shows the
>>> overhead of its pv support, while qspinlock shows a small improvement
>>> over spinlock. My simple summary for this test case is
>>> qspinlock > spinlock > pvqspinlock.
>>>
>>> When running 200 concurrent netperf instances, the total throughput is:
>>>
>>> lock type   | concurrent runners | total throughput | variance
>>> ------------|--------------------|------------------|---------
>>> spinlock    | 199                | 66882.8          | 89.93
>>> qspinlock   | 199                | 66350.4          | 72.0239
>>> pvqspinlock | 199                | 64740.5          | 85.7837
>>>
>>> You can see more data in the attached netperf-result.xlsx.
>>>
>>> thanks
>>> xinhui
>>
>>
>> Hi xinhui
>>
>> A 1Gbit NIC is too slow for this use case; I would try a 10Gbit NIC at least.
>>
>> Alternatively, you could use the loopback interface (netperf -H 127.0.0.1):
>>
>> tc qd add dev lo root pfifo limit 10000
>>

[-- Attachment #2: netperf.xlsx --]
[-- Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, Size: 72195 bytes --]


Thread overview (10 messages):
2017-02-01 17:05 [RFC] implement QUEUED spinlocks on powerpc Eric Dumazet
2017-02-01 20:37 ` Benjamin Herrenschmidt
2017-02-02  4:04   ` Michael Ellerman
2017-02-02  4:40     ` Eric Dumazet
2017-02-02  4:42       ` Benjamin Herrenschmidt
2017-02-07  6:21       ` panxinhui
2017-02-07  6:46         ` Eric Dumazet
2017-02-07  7:22           ` panxinhui
2017-02-13  9:08           ` panxinhui
2017-02-15 10:17             ` panxinhui
