From: Shaokun Zhang <zhangshaokun@hisilicon.com>
To: Will Deacon <will@kernel.org>
Cc: Yuqi Jin <jinyuqi@huawei.com>,
Andrew Murray <amurray@thegoodpenguin.co.uk>,
linux-arm-kernel@lists.infradead.org,
Catalin Marinas <catalin.marinas@arm.com>
Subject: Re: [PATCH] arm64: atomics: Fix the issue on xchg when switch to atomic instruction
Date: Thu, 7 May 2020 15:54:32 +0800 [thread overview]
Message-ID: <a44c1318-bace-36ec-457e-c2856ad3e191@hisilicon.com> (raw)
In-Reply-To: <295b5b2c-ae8d-3db2-3621-5d5ba15481fd@hisilicon.com>
Hi Will,
On 2020/5/6 19:30, Shaokun Zhang wrote:
> Hi Will,
>
> On 2020/5/6 18:44, Will Deacon wrote:
>> On Wed, May 06, 2020 at 06:39:16PM +0800, Shaokun Zhang wrote:
>>> Apologies for my noise, you are right and it's my mistake.
>>
>> No need to apologise, but thanks for letting me know.
>>
>>> On 2020/5/6 15:53, Will Deacon wrote:
>>>> On Wed, May 06, 2020 at 03:00:39PM +0800, Shaokun Zhang wrote:
>>>>> On 2020/5/5 17:15, Will Deacon wrote:
>>>>>> On Tue, May 05, 2020 at 05:02:35PM +0800, Shaokun Zhang wrote:
>>>>>>> From: Yuqi Jin <jinyuqi@huawei.com>
>>>>>>>
>>>>>>> Since commit addfc38672c7 ("arm64: atomics: avoid out-of-line ll/sc atomics"),
>>>>>>> it has provided inline implementations of both LSE and ll/sc and used a static
>>>>>>> key to select between them, which allows the compiler to generate better
>>>>>>> atomics code.
>>>>>>> However, xchg still uses the original method which would fail to switch to
>>>>>>> the atomic instruction correctly, Let's fix this issue.
>>>>>>
>>>>>> Please can you elaborate on the failure mode? The current code looks alright
>>>>>
>>>>> When enable CONFIG_ARM64_LSE_ATOMICS, xchg is failed to switch to swp instruction
>>>>> or dynamic replacement instructions are not seen.
>>>>>
>>>>> We do some tests on the copy of xchg_tail,:
>>>>> u32 xchg_tail_my(struct qspinlock *lock, u32 tail)
>>>>> {
>>>>> return (u32)xchg_relaxed(&lock->tail,
>>>>> tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
>>>>> }
>>>>> and the asm code is as follows:
>>>>>
>>>>> ffff80001015b050 <xchg_tail_my>:
>>>>> ffff80001015b050: a9be7bfd stp x29, x30, [sp, #-32]!
>>>>> ffff80001015b054: 910003fd mov x29, sp
>>>>> ffff80001015b058: a90153f3 stp x19, x20, [sp, #16]
>>>>> ffff80001015b05c: 2a0103f3 mov w19, w1
>>>>> ffff80001015b060: aa0003f4 mov x20, x0
>>>>> ffff80001015b064: aa1e03e0 mov x0, x30
>>>>> ffff80001015b068: 97fd07ee bl ffff80001009d020 <_mcount>
>>>>> ffff80001015b06c: 53107e61 lsr w1, w19, #16
>>>>> ffff80001015b070: 91000a83 add x3, x20, #0x2
>>>>> ffff80001015b074: f9800071 prfm pstl1strm, [x3]
>>>>> ffff80001015b078: 485f7c60 ldxrh w0, [x3]
>>>>> ffff80001015b07c: 48027c61 stxrh w2, w1, [x3]
>>>>> ffff80001015b080: 35ffffc2 cbnz w2, ffff80001015b078 <xchg_tail_my+0x28>
>>>>> ffff80001015b084: 53103c00 lsl w0, w0, #16
>>>>> ffff80001015b088: a94153f3 ldp x19, x20, [sp, #16]
>>>>> ffff80001015b08c: a8c27bfd ldp x29, x30, [sp], #32
>>>>> ffff80001015b090: d65f03c0 ret
>>>>
>>>> This should get patched at runtime, but you're saying that's not happening?
>>>>
>>>
>>> My mistake, I didn't check the runtime carefully.
>>
>> Good to hear there's not a bug, but if you see a performance benefit from
>> using the static-key for xchg() then I'd obviously be open to changing it
>
> Thanks your reply, if I follow the two methods correctly, static-key will
> not consume '__nops(3)', others are the same.
>
> I will run some tests to check the performance ;-)
>
We compare the two methods on Huawei Kunpeng920 and the throughput per second
as follows:
one core |without delay| 200ns delay|
--------------------------------------
static-key| 55294942 | 3937156 |
--------------------------------------
runtime | 54706282 | 3918188 |
--------------------------------------
If we run this test using 32-cores, the result is almost the same.
Test code is followed:
if(delay_o) {
while (get_cycles() <= (time_temp + t_cnt)) {
(void)atomic64_xchg(&wk_in->num, 1);
myndelay(delay_o);
(void)atomic64_xchg(&wk_in->num, 2);
myndelay(delay_o);
w_cnt+=2;
}
} else {
while (get_cycles() <= (time_temp + t_cnt)){
(void)atomic64_xchg(&wk_in->num, 1);
(void)atomic64_xchg(&wk_in->num, 2);
w_cnt+=2;
}
}
Thanks,
Shaokun
> Thanks,
> Shaokun
>
>
>> over as well.
>>
>> Thanks,
>>
>> Will
>>
>> .
>>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2020-05-07 7:54 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-05 9:02 [PATCH] arm64: atomics: Fix the issue on xchg when switch to atomic instruction Shaokun Zhang
2020-05-05 9:15 ` Will Deacon
2020-05-06 7:00 ` Shaokun Zhang
2020-05-06 7:53 ` Will Deacon
2020-05-06 10:39 ` Shaokun Zhang
2020-05-06 10:44 ` Will Deacon
2020-05-06 11:30 ` Shaokun Zhang
2020-05-07 7:54 ` Shaokun Zhang [this message]
2020-05-25 9:27 ` Shaokun Zhang
2020-05-26 19:55 ` Will Deacon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a44c1318-bace-36ec-457e-c2856ad3e191@hisilicon.com \
--to=zhangshaokun@hisilicon.com \
--cc=amurray@thegoodpenguin.co.uk \
--cc=catalin.marinas@arm.com \
--cc=jinyuqi@huawei.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox