* Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) @ 2015-12-10 19:43 David Daney [not found] ` <SN1PR07MB21577C72379C8440A208D6BC9EEA0@SN1PR07MB2157.namprd07.prod.outlook.com> 2015-12-11 9:59 ` Will Deacon 0 siblings, 2 replies; 20+ messages in thread From: David Daney @ 2015-12-10 19:43 UTC (permalink / raw) To: linux-arm-kernel Hi, We are getting soft lockup OOPs on Cavium CN88XX (A.K.A. ThunderX), which is an arm64 implementation. A typical failure shows multiple threads stuck in mutex operations like this: . . . [ 68.909873] Task dump for CPU 18: [ 68.909876] systemd-udevd R running task 0 537 534 0x00000002 [ 68.909877] Call trace: [ 68.909880] [<fffffe0000088858>] dump_backtrace+0x0/0x17c [ 68.909883] [<fffffe00000889f8>] show_stack+0x24/0x2c [ 68.909885] [<fffffe00000c4210>] sched_show_task+0xb0/0x104 [ 68.909888] [<fffffe00000c682c>] dump_cpu_task+0x48/0x54 [ 68.909890] [<fffffe00000ee5e0>] rcu_dump_cpu_stacks+0x9c/0xec [ 68.909893] [<fffffe00000f2c9c>] rcu_check_callbacks+0x524/0xa18 [ 68.909896] [<fffffe00000f83a0>] update_process_times+0x44/0x74 [ 68.909899] [<fffffe00001078d4>] tick_sched_timer+0x78/0x1ac [ 68.909901] [<fffffe00000f8b74>] __hrtimer_run_queues+0x148/0x2d4 [ 68.909903] [<fffffe00000f9464>] hrtimer_interrupt+0xb0/0x1f4 [ 68.909906] [<fffffe000056e6e8>] arch_timer_handler_phys+0x3c/0x48 [ 68.909909] [<fffffe00000e7fd4>] handle_percpu_devid_irq+0xb0/0x1b0 [ 68.909912] [<fffffe00000e33c4>] generic_handle_irq+0x34/0x4c [ 68.909914] [<fffffe00000e3738>] __handle_domain_irq+0x90/0xfc [ 68.909916] [<fffffe0000081d80>] gic_handle_irq+0x90/0x18c [ 68.909918] Exception stack(0xfffffe03f14e3920 to 0xfffffe03f14e3a40) [ 68.909921] 3920: fffffe03fd5c5800 fffffe0000c55800 fffffe03f14e3a80 fffffe00000dabd8 [ 68.909924] 3940: 00000000a0000145 0000000000000015 fffffe03e9602400 fffffe00002fddb0 [ 68.909927] 3960: 0000000000000000 0000000000000000 fffffe03fd5c5810 fffffe03f14e0000 [ 68.909929] 3980: 0000000000000001 ffffffffff000000 fffffe03db307e38 0000000000000000 [ 68.909932] 39a0: 0000000000737973 00000000ffffffff 0000000000000000 000000003b364d50 [ 68.909935] 39c0: 0000000000000018 ffffffffa99641af 0016fd71b6000000 003b9aca00000000 [ 68.909937] 39e0: fffffe00001f1508 000003ff9b9fd028 000003ffed7a0a10 fffffe03fd5c5800 [ 68.909940] 3a00: fffffe0000c55800 fffffe0000cea1c8 fffffe03fd5a5800 fffffe0000ca2eb0 [ 68.909943] 3a20: 0000000000000015 fffffe03e9602400 fffffe0000cea1c8 fffffe0000712000 [ 68.909945] [<fffffe0000084ce8>] el1_irq+0x68/0xd8 [ 68.909948] [<fffffe00000da03c>] mutex_optimistic_spin+0x9c/0x1d0 [ 68.909951] [<fffffe00006fe4b8>] __mutex_lock_slowpath+0x44/0x158 [ 68.909953] [<fffffe00006fe620>] mutex_lock+0x54/0x58 [ 68.909956] [<fffffe0000265efc>] kernfs_iop_permission+0x38/0x70 [ 68.909959] [<fffffe00001fbf50>] __inode_permission+0x88/0xd8 [ 68.909961] [<fffffe00001fbfd0>] inode_permission+0x30/0x6c [ 68.909964] [<fffffe00001fe26c>] link_path_walk+0x68/0x4d4 [ 68.909966] [<fffffe00001ffa14>] path_openat+0xb4/0x2bc [ 68.909968] [<fffffe000020123c>] do_filp_open+0x74/0xd0 [ 68.909971] [<fffffe00001f13e4>] do_sys_open+0x14c/0x228 [ 68.909973] [<fffffe00001f1544>] SyS_openat+0x3c/0x48 [ 68.909976] [<fffffe00000851f0>] el0_svc_naked+0x24/0x28 . . . Reverting 81a43adae3b9 (locking/mutex: Use acquire/release semantics) Makes the problem go away. At this point it is unknown if this patch is incorrect, or if the underlying ARM64 atomic_*_{acquire,release} primitives are defective, or if the problem lies elsewhere. I am not requesting any specific action with this e-mail, but wanted to draw attention to the issue. Undoubtedly we will be able to provide more detailed information about the issue in the coming days. Thanks, David Daney ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <SN1PR07MB21577C72379C8440A208D6BC9EEA0@SN1PR07MB2157.namprd07.prod.outlook.com>]
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) [not found] ` <SN1PR07MB21577C72379C8440A208D6BC9EEA0@SN1PR07MB2157.namprd07.prod.outlook.com> @ 2015-12-11 3:29 ` Andrew Pinski 2015-12-11 4:51 ` Andrew Pinski 2015-12-11 7:33 ` Peter Zijlstra 0 siblings, 2 replies; 20+ messages in thread From: Andrew Pinski @ 2015-12-11 3:29 UTC (permalink / raw) To: linux-arm-kernel On Thu, Dec 10, 2015 at 11:44 AM, David Danny wrote: > > Hi, > > We are getting soft lockup OOPs on Cavium CN88XX (A.K.A. ThunderX), which is an arm64 implementation. I get a slightly different OOPs and reverting c55a6ffa6285e29f874ed403979472631ec70bff I was able to boot. What I saw with osq_lock.c was that osq_wait_next is called for both lock and unlock case so it might need both barriers. The other question comes does atomic_cmpxchg_release have release semantics when the compare fails? Right now it does not. Thanks, Andrew > > A typical failure shows multiple threads stuck in mutex operations like > this: > > . > . > . > [ 68.909873] Task dump for CPU 18: > [ 68.909876] systemd-udevd R running task 0 537 534 > 0x00000002 > [ 68.909877] Call trace: > [ 68.909880] [<fffffe0000088858>] dump_backtrace+0x0/0x17c > [ 68.909883] [<fffffe00000889f8>] show_stack+0x24/0x2c > [ 68.909885] [<fffffe00000c4210>] sched_show_task+0xb0/0x104 > [ 68.909888] [<fffffe00000c682c>] dump_cpu_task+0x48/0x54 > [ 68.909890] [<fffffe00000ee5e0>] rcu_dump_cpu_stacks+0x9c/0xec > [ 68.909893] [<fffffe00000f2c9c>] rcu_check_callbacks+0x524/0xa18 > [ 68.909896] [<fffffe00000f83a0>] update_process_times+0x44/0x74 > [ 68.909899] [<fffffe00001078d4>] tick_sched_timer+0x78/0x1ac > [ 68.909901] [<fffffe00000f8b74>] __hrtimer_run_queues+0x148/0x2d4 > [ 68.909903] [<fffffe00000f9464>] hrtimer_interrupt+0xb0/0x1f4 > [ 68.909906] [<fffffe000056e6e8>] arch_timer_handler_phys+0x3c/0x48 > [ 68.909909] [<fffffe00000e7fd4>] handle_percpu_devid_irq+0xb0/0x1b0 > [ 68.909912] [<fffffe00000e33c4>] generic_handle_irq+0x34/0x4c > [ 68.909914] [<fffffe00000e3738>] __handle_domain_irq+0x90/0xfc > [ 68.909916] [<fffffe0000081d80>] gic_handle_irq+0x90/0x18c > [ 68.909918] Exception stack(0xfffffe03f14e3920 to 0xfffffe03f14e3a40) > [ 68.909921] 3920: fffffe03fd5c5800 fffffe0000c55800 fffffe03f14e3a80 > fffffe00000dabd8 > [ 68.909924] 3940: 00000000a0000145 0000000000000015 fffffe03e9602400 > fffffe00002fddb0 > [ 68.909927] 3960: 0000000000000000 0000000000000000 fffffe03fd5c5810 > fffffe03f14e0000 > [ 68.909929] 3980: 0000000000000001 ffffffffff000000 fffffe03db307e38 > 0000000000000000 > [ 68.909932] 39a0: 0000000000737973 00000000ffffffff 0000000000000000 > 000000003b364d50 > [ 68.909935] 39c0: 0000000000000018 ffffffffa99641af 0016fd71b6000000 > 003b9aca00000000 > [ 68.909937] 39e0: fffffe00001f1508 000003ff9b9fd028 000003ffed7a0a10 > fffffe03fd5c5800 > [ 68.909940] 3a00: fffffe0000c55800 fffffe0000cea1c8 fffffe03fd5a5800 > fffffe0000ca2eb0 > [ 68.909943] 3a20: 0000000000000015 fffffe03e9602400 fffffe0000cea1c8 > fffffe0000712000 > [ 68.909945] [<fffffe0000084ce8>] el1_irq+0x68/0xd8 > [ 68.909948] [<fffffe00000da03c>] mutex_optimistic_spin+0x9c/0x1d0 > [ 68.909951] [<fffffe00006fe4b8>] __mutex_lock_slowpath+0x44/0x158 > [ 68.909953] [<fffffe00006fe620>] mutex_lock+0x54/0x58 > [ 68.909956] [<fffffe0000265efc>] kernfs_iop_permission+0x38/0x70 > [ 68.909959] [<fffffe00001fbf50>] __inode_permission+0x88/0xd8 > [ 68.909961] [<fffffe00001fbfd0>] inode_permission+0x30/0x6c > [ 68.909964] [<fffffe00001fe26c>] link_path_walk+0x68/0x4d4 > [ 68.909966] [<fffffe00001ffa14>] path_openat+0xb4/0x2bc > [ 68.909968] [<fffffe000020123c>] do_filp_open+0x74/0xd0 > [ 68.909971] [<fffffe00001f13e4>] do_sys_open+0x14c/0x228 > [ 68.909973] [<fffffe00001f1544>] SyS_openat+0x3c/0x48 > [ 68.909976] [<fffffe00000851f0>] el0_svc_naked+0x24/0x28 > . > . > . > > Reverting 81a43adae3b9 (locking/mutex: Use acquire/release semantics) Makes the problem go away. > > At this point it is unknown if this patch is incorrect, or if the underlying ARM64 atomic_*_{acquire,release} primitives are defective, or if the problem lies elsewhere. > > I am not requesting any specific action with this e-mail, but wanted to draw attention to the issue. Undoubtedly we will be able to provide more detailed information about the issue in the coming days. > > Thanks, > David Daney > ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 3:29 ` FW: " Andrew Pinski @ 2015-12-11 4:51 ` Andrew Pinski 2015-12-11 8:41 ` Peter Zijlstra 2015-12-11 7:33 ` Peter Zijlstra 1 sibling, 1 reply; 20+ messages in thread From: Andrew Pinski @ 2015-12-11 4:51 UTC (permalink / raw) To: linux-arm-kernel On Thu, Dec 10, 2015 at 7:29 PM, Andrew Pinski <pinskia@gmail.com> wrote: > On Thu, Dec 10, 2015 at 11:44 AM, David Danny wrote: >> >> Hi, >> >> We are getting soft lockup OOPs on Cavium CN88XX (A.K.A. ThunderX), which is an arm64 implementation. > > I get a slightly different OOPs and reverting > c55a6ffa6285e29f874ed403979472631ec70bff I was able to boot. > What I saw with osq_lock.c was that osq_wait_next is called for both > lock and unlock case so it might need both barriers. > The other question comes does atomic_cmpxchg_release have release > semantics when the compare fails? Right now it does not. So looking further I think I understand what is going wrong and why c55a6ffa6285e29f874ed403979472631ec70bff is incorrect. The compare and swap inside osq_lock needs to be both release and acquire semantics memory barriers because the stores (to node) need to be visible to the other cores before the setting of lock->tail happens. Because if node->next is is up to date, we might end up in osq_wait_next and waiting in an infinite loop while waiting on ourselves. I think we should revert c55a6ffa6285e29f874ed403979472631ec70bff fully as mentioned for the reasons above. Thanks, Andrew Pinski > > Thanks, > Andrew > > >> >> A typical failure shows multiple threads stuck in mutex operations like >> this: >> >> . >> . >> . >> [ 68.909873] Task dump for CPU 18: >> [ 68.909876] systemd-udevd R running task 0 537 534 >> 0x00000002 >> [ 68.909877] Call trace: >> [ 68.909880] [<fffffe0000088858>] dump_backtrace+0x0/0x17c >> [ 68.909883] [<fffffe00000889f8>] show_stack+0x24/0x2c >> [ 68.909885] [<fffffe00000c4210>] sched_show_task+0xb0/0x104 >> [ 68.909888] [<fffffe00000c682c>] dump_cpu_task+0x48/0x54 >> [ 68.909890] [<fffffe00000ee5e0>] rcu_dump_cpu_stacks+0x9c/0xec >> [ 68.909893] [<fffffe00000f2c9c>] rcu_check_callbacks+0x524/0xa18 >> [ 68.909896] [<fffffe00000f83a0>] update_process_times+0x44/0x74 >> [ 68.909899] [<fffffe00001078d4>] tick_sched_timer+0x78/0x1ac >> [ 68.909901] [<fffffe00000f8b74>] __hrtimer_run_queues+0x148/0x2d4 >> [ 68.909903] [<fffffe00000f9464>] hrtimer_interrupt+0xb0/0x1f4 >> [ 68.909906] [<fffffe000056e6e8>] arch_timer_handler_phys+0x3c/0x48 >> [ 68.909909] [<fffffe00000e7fd4>] handle_percpu_devid_irq+0xb0/0x1b0 >> [ 68.909912] [<fffffe00000e33c4>] generic_handle_irq+0x34/0x4c >> [ 68.909914] [<fffffe00000e3738>] __handle_domain_irq+0x90/0xfc >> [ 68.909916] [<fffffe0000081d80>] gic_handle_irq+0x90/0x18c >> [ 68.909918] Exception stack(0xfffffe03f14e3920 to 0xfffffe03f14e3a40) >> [ 68.909921] 3920: fffffe03fd5c5800 fffffe0000c55800 fffffe03f14e3a80 >> fffffe00000dabd8 >> [ 68.909924] 3940: 00000000a0000145 0000000000000015 fffffe03e9602400 >> fffffe00002fddb0 >> [ 68.909927] 3960: 0000000000000000 0000000000000000 fffffe03fd5c5810 >> fffffe03f14e0000 >> [ 68.909929] 3980: 0000000000000001 ffffffffff000000 fffffe03db307e38 >> 0000000000000000 >> [ 68.909932] 39a0: 0000000000737973 00000000ffffffff 0000000000000000 >> 000000003b364d50 >> [ 68.909935] 39c0: 0000000000000018 ffffffffa99641af 0016fd71b6000000 >> 003b9aca00000000 >> [ 68.909937] 39e0: fffffe00001f1508 000003ff9b9fd028 000003ffed7a0a10 >> fffffe03fd5c5800 >> [ 68.909940] 3a00: fffffe0000c55800 fffffe0000cea1c8 fffffe03fd5a5800 >> fffffe0000ca2eb0 >> [ 68.909943] 3a20: 0000000000000015 fffffe03e9602400 fffffe0000cea1c8 >> fffffe0000712000 >> [ 68.909945] [<fffffe0000084ce8>] el1_irq+0x68/0xd8 >> [ 68.909948] [<fffffe00000da03c>] mutex_optimistic_spin+0x9c/0x1d0 >> [ 68.909951] [<fffffe00006fe4b8>] __mutex_lock_slowpath+0x44/0x158 >> [ 68.909953] [<fffffe00006fe620>] mutex_lock+0x54/0x58 >> [ 68.909956] [<fffffe0000265efc>] kernfs_iop_permission+0x38/0x70 >> [ 68.909959] [<fffffe00001fbf50>] __inode_permission+0x88/0xd8 >> [ 68.909961] [<fffffe00001fbfd0>] inode_permission+0x30/0x6c >> [ 68.909964] [<fffffe00001fe26c>] link_path_walk+0x68/0x4d4 >> [ 68.909966] [<fffffe00001ffa14>] path_openat+0xb4/0x2bc >> [ 68.909968] [<fffffe000020123c>] do_filp_open+0x74/0xd0 >> [ 68.909971] [<fffffe00001f13e4>] do_sys_open+0x14c/0x228 >> [ 68.909973] [<fffffe00001f1544>] SyS_openat+0x3c/0x48 >> [ 68.909976] [<fffffe00000851f0>] el0_svc_naked+0x24/0x28 >> . >> . >> . >> >> Reverting 81a43adae3b9 (locking/mutex: Use acquire/release semantics) Makes the problem go away. >> >> At this point it is unknown if this patch is incorrect, or if the underlying ARM64 atomic_*_{acquire,release} primitives are defective, or if the problem lies elsewhere. >> >> I am not requesting any specific action with this e-mail, but wanted to draw attention to the issue. Undoubtedly we will be able to provide more detailed information about the issue in the coming days. >> >> Thanks, >> David Daney >> ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 4:51 ` Andrew Pinski @ 2015-12-11 8:41 ` Peter Zijlstra 2015-12-11 12:04 ` Will Deacon 0 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2015-12-11 8:41 UTC (permalink / raw) To: linux-arm-kernel On Thu, Dec 10, 2015 at 08:51:34PM -0800, Andrew Pinski wrote: > So looking further I think I understand what is going wrong and why > c55a6ffa6285e29f874ed403979472631ec70bff is incorrect. The osq_wait_next() call in osq_lock() is when we fail the lock. This is effectively trylock() semantics and like for cmpxchg a failed trylock has no implied barrier semantics. So from that POV osq_wait_next() does not need to provide ACQUIRE semantics. In osq_unlock() there's an xchg() in front, which implies full barriers and thereby provides RELEASE semantics for that part of osq_unlock(), so again, from this POV osq_wait_next() does not need to provide RELEASE semantics. > The compare and swap inside osq_lock needs to be both release and > acquire semantics memory barriers because the stores (to node) need to > be visible to the other cores before the setting of lock->tail > happens. I'm a wee bit confused on what exactly you mean. Both stores to @node: 1) osq_wait_next(): next = xchg(&node->next, NULL) 2) osq_unlock(): next = xchg(&node->next, NULL) are xchg() calls which imply full ordering (sequential consistency). Similarly the store before osq_wait_next() in osq_lock(), namely: cmpxchg(&prev->node, node, NULL) is fully ordered. So I cannot see any store being delayed past the atomic_cmpxchg_acquire(). Now you mention 'compare and swap inside osq_lock' which I take to be the latter; and it _is_ fully ordered. ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 8:41 ` Peter Zijlstra @ 2015-12-11 12:04 ` Will Deacon 2015-12-11 12:13 ` Peter Zijlstra ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Will Deacon @ 2015-12-11 12:04 UTC (permalink / raw) To: linux-arm-kernel Hi all, On Fri, Dec 11, 2015 at 09:41:33AM +0100, Peter Zijlstra wrote: > On Thu, Dec 10, 2015 at 08:51:34PM -0800, Andrew Pinski wrote: > > > So looking further I think I understand what is going wrong and why > > c55a6ffa6285e29f874ed403979472631ec70bff is incorrect. > > The osq_wait_next() call in osq_lock() is when we fail the lock. This is > effectively trylock() semantics and like for cmpxchg a failed trylock > has no implied barrier semantics. So from that POV osq_wait_next() does > not need to provide ACQUIRE semantics. > > In osq_unlock() there's an xchg() in front, which implies full barriers > and thereby provides RELEASE semantics for that part of osq_unlock(), so > again, from this POV osq_wait_next() does not need to provide RELEASE > semantics. > > > The compare and swap inside osq_lock needs to be both release and > > acquire semantics memory barriers because the stores (to node) need to > > be visible to the other cores before the setting of lock->tail > > happens. > > I'm a wee bit confused on what exactly you mean. Both stores to @node: > > 1) osq_wait_next(): next = xchg(&node->next, NULL) > 2) osq_unlock(): next = xchg(&node->next, NULL) > > are xchg() calls which imply full ordering (sequential consistency). I think Andrew meant the atomic_xchg_acquire at the start of osq_lock, as opposed to "compare and swap". In which case, it does look like there's a bug here because there is nothing to order the initialisation of the node fields with publishing of the node, whether that's indirectly as a result of setting the tail to the current CPU or directly as a result of the WRITE_ONCE. Andrew, David: does making that atomic_xchg_acquire and atomic_xchg fix things for you? I don't fully grok what 81a43adae3b9 has to do with any of this, so maybe there's another bug too. Will --->8 diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c index d092a0c9c2d4..05a37857ab55 100644 --- a/kernel/locking/osq_lock.c +++ b/kernel/locking/osq_lock.c @@ -93,10 +93,12 @@ bool osq_lock(struct optimistic_spin_queue *lock) node->cpu = curr; /* - * ACQUIRE semantics, pairs with corresponding RELEASE - * in unlock() uncontended, or fastpath. + * We need both ACQUIRE (pairs with corresponding RELEASE in + * unlock() uncontended, or fastpath) and RELEASE (to publish + * the node fields we just initialised) semantics when updating + * the lock tail. */ - old = atomic_xchg_acquire(&lock->tail, curr); + old = atomic_xchg(&lock->tail, curr); if (old == OSQ_UNLOCKED_VAL) return true; ^ permalink raw reply related [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 12:04 ` Will Deacon @ 2015-12-11 12:13 ` Peter Zijlstra 2015-12-11 12:18 ` Will Deacon 2015-12-11 14:17 ` Davidlohr Bueso 2015-12-17 21:52 ` Jeremy Linton 2 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2015-12-11 12:13 UTC (permalink / raw) To: linux-arm-kernel On Fri, Dec 11, 2015 at 12:04:19PM +0000, Will Deacon wrote: > I think Andrew meant the atomic_xchg_acquire at the start of osq_lock, > as opposed to "compare and swap". In which case, it does look like > there's a bug here because there is nothing to order the initialisation > of the node fields with publishing of the node, whether that's > indirectly as a result of setting the tail to the current CPU or > directly as a result of the WRITE_ONCE. Agreed, this does indeed look like a bug. If confirmed please write a shiny changelog and I'll queue asap. > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c > index d092a0c9c2d4..05a37857ab55 100644 > --- a/kernel/locking/osq_lock.c > +++ b/kernel/locking/osq_lock.c > @@ -93,10 +93,12 @@ bool osq_lock(struct optimistic_spin_queue *lock) > node->cpu = curr; > > /* > - * ACQUIRE semantics, pairs with corresponding RELEASE > - * in unlock() uncontended, or fastpath. > + * We need both ACQUIRE (pairs with corresponding RELEASE in > + * unlock() uncontended, or fastpath) and RELEASE (to publish > + * the node fields we just initialised) semantics when updating > + * the lock tail. > */ > - old = atomic_xchg_acquire(&lock->tail, curr); > + old = atomic_xchg(&lock->tail, curr); > if (old == OSQ_UNLOCKED_VAL) > return true; > ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 12:13 ` Peter Zijlstra @ 2015-12-11 12:18 ` Will Deacon 2015-12-11 12:26 ` Peter Zijlstra 0 siblings, 1 reply; 20+ messages in thread From: Will Deacon @ 2015-12-11 12:18 UTC (permalink / raw) To: linux-arm-kernel On Fri, Dec 11, 2015 at 01:13:19PM +0100, Peter Zijlstra wrote: > On Fri, Dec 11, 2015 at 12:04:19PM +0000, Will Deacon wrote: > > I think Andrew meant the atomic_xchg_acquire at the start of osq_lock, > > as opposed to "compare and swap". In which case, it does look like > > there's a bug here because there is nothing to order the initialisation > > of the node fields with publishing of the node, whether that's > > indirectly as a result of setting the tail to the current CPU or > > directly as a result of the WRITE_ONCE. > > Agreed, this does indeed look like a bug. If confirmed please write a > shiny changelog and I'll queue asap. Yup. I've failed to reproduce the issue locally, so we'll need to wait for Andrew and/or David to get back to us first. Will > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c > > index d092a0c9c2d4..05a37857ab55 100644 > > --- a/kernel/locking/osq_lock.c > > +++ b/kernel/locking/osq_lock.c > > @@ -93,10 +93,12 @@ bool osq_lock(struct optimistic_spin_queue *lock) > > node->cpu = curr; > > > > /* > > - * ACQUIRE semantics, pairs with corresponding RELEASE > > - * in unlock() uncontended, or fastpath. > > + * We need both ACQUIRE (pairs with corresponding RELEASE in > > + * unlock() uncontended, or fastpath) and RELEASE (to publish > > + * the node fields we just initialised) semantics when updating > > + * the lock tail. > > */ > > - old = atomic_xchg_acquire(&lock->tail, curr); > > + old = atomic_xchg(&lock->tail, curr); > > if (old == OSQ_UNLOCKED_VAL) > > return true; > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 12:18 ` Will Deacon @ 2015-12-11 12:26 ` Peter Zijlstra 2015-12-11 13:33 ` Will Deacon 0 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2015-12-11 12:26 UTC (permalink / raw) To: linux-arm-kernel On Fri, Dec 11, 2015 at 12:18:00PM +0000, Will Deacon wrote: > On Fri, Dec 11, 2015 at 01:13:19PM +0100, Peter Zijlstra wrote: > > On Fri, Dec 11, 2015 at 12:04:19PM +0000, Will Deacon wrote: > > > I think Andrew meant the atomic_xchg_acquire at the start of osq_lock, > > > as opposed to "compare and swap". In which case, it does look like > > > there's a bug here because there is nothing to order the initialisation > > > of the node fields with publishing of the node, whether that's > > > indirectly as a result of setting the tail to the current CPU or > > > directly as a result of the WRITE_ONCE. > > > > Agreed, this does indeed look like a bug. If confirmed please write a > > shiny changelog and I'll queue asap. > > Yup. I've failed to reproduce the issue locally, so we'll need to wait > for Andrew and/or David to get back to us first. While we're there, the acquire in osq_wait_next() seems somewhat ill documented too. I _think_ we need ACQUIRE semantics there because we want to strictly order the lock-unqueue A,B,C steps and we get that with: A: SC B: ACQ C: Relaxed Similarly for unlock we want the WRITE_ONCE to happen after osq_wait_next, but in that case we can even rely on the control dependency there. As noted in a previous email, the ACQUIRE for osq_wait_next() does not come from its use in lock since its on the fail path, and trylock failure doesn't imply any barriers. Not should it have RELEASE semantics for its use in unlock, since we already have that covered by the xchg() done prior. ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 12:26 ` Peter Zijlstra @ 2015-12-11 13:33 ` Will Deacon 2015-12-11 13:48 ` Peter Zijlstra 0 siblings, 1 reply; 20+ messages in thread From: Will Deacon @ 2015-12-11 13:33 UTC (permalink / raw) To: linux-arm-kernel On Fri, Dec 11, 2015 at 01:26:47PM +0100, Peter Zijlstra wrote: > On Fri, Dec 11, 2015 at 12:18:00PM +0000, Will Deacon wrote: > > On Fri, Dec 11, 2015 at 01:13:19PM +0100, Peter Zijlstra wrote: > > > On Fri, Dec 11, 2015 at 12:04:19PM +0000, Will Deacon wrote: > > > > I think Andrew meant the atomic_xchg_acquire at the start of osq_lock, > > > > as opposed to "compare and swap". In which case, it does look like > > > > there's a bug here because there is nothing to order the initialisation > > > > of the node fields with publishing of the node, whether that's > > > > indirectly as a result of setting the tail to the current CPU or > > > > directly as a result of the WRITE_ONCE. > > > > > > Agreed, this does indeed look like a bug. If confirmed please write a > > > shiny changelog and I'll queue asap. > > > > Yup. I've failed to reproduce the issue locally, so we'll need to wait > > for Andrew and/or David to get back to us first. > > While we're there, the acquire in osq_wait_next() seems somewhat ill > documented too. > > I _think_ we need ACQUIRE semantics there because we want to strictly > order the lock-unqueue A,B,C steps and we get that with: > > A: SC > B: ACQ > C: Relaxed > > Similarly for unlock we want the WRITE_ONCE to happen after > osq_wait_next, but in that case we can even rely on the control > dependency there. Even for the lock-unqueue case, isn't B->C ordered by a control dependency because C consists only of stores? Will ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 13:33 ` Will Deacon @ 2015-12-11 13:48 ` Peter Zijlstra 2015-12-11 14:06 ` Will Deacon 2015-12-11 22:35 ` Paul E. McKenney 0 siblings, 2 replies; 20+ messages in thread From: Peter Zijlstra @ 2015-12-11 13:48 UTC (permalink / raw) To: linux-arm-kernel On Fri, Dec 11, 2015 at 01:33:14PM +0000, Will Deacon wrote: > On Fri, Dec 11, 2015 at 01:26:47PM +0100, Peter Zijlstra wrote: > > While we're there, the acquire in osq_wait_next() seems somewhat ill > > documented too. > > > > I _think_ we need ACQUIRE semantics there because we want to strictly > > order the lock-unqueue A,B,C steps and we get that with: > > > > A: SC > > B: ACQ > > C: Relaxed > > > > Similarly for unlock we want the WRITE_ONCE to happen after > > osq_wait_next, but in that case we can even rely on the control > > dependency there. > > Even for the lock-unqueue case, isn't B->C ordered by a control dependency > because C consists only of stores? Hmm, indeed. So we could go fully relaxed on it I suppose, since the same is true for the unlock site. ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 13:48 ` Peter Zijlstra @ 2015-12-11 14:06 ` Will Deacon 2015-12-11 17:11 ` Peter Zijlstra 2015-12-11 22:35 ` Paul E. McKenney 1 sibling, 1 reply; 20+ messages in thread From: Will Deacon @ 2015-12-11 14:06 UTC (permalink / raw) To: linux-arm-kernel On Fri, Dec 11, 2015 at 02:48:03PM +0100, Peter Zijlstra wrote: > On Fri, Dec 11, 2015 at 01:33:14PM +0000, Will Deacon wrote: > > On Fri, Dec 11, 2015 at 01:26:47PM +0100, Peter Zijlstra wrote: > > > > While we're there, the acquire in osq_wait_next() seems somewhat ill > > > documented too. > > > > > > I _think_ we need ACQUIRE semantics there because we want to strictly > > > order the lock-unqueue A,B,C steps and we get that with: > > > > > > A: SC > > > B: ACQ > > > C: Relaxed > > > > > > Similarly for unlock we want the WRITE_ONCE to happen after > > > osq_wait_next, but in that case we can even rely on the control > > > dependency there. > > > > Even for the lock-unqueue case, isn't B->C ordered by a control dependency > > because C consists only of stores? > > Hmm, indeed. So we could go fully relaxed on it I suppose, since the > same is true for the unlock site. In which case, we should be able to relax the xchg in there (osq_wait_next) too, right? Will ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 14:06 ` Will Deacon @ 2015-12-11 17:11 ` Peter Zijlstra 2015-12-11 17:24 ` Will Deacon 0 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2015-12-11 17:11 UTC (permalink / raw) To: linux-arm-kernel On Fri, Dec 11, 2015 at 02:06:49PM +0000, Will Deacon wrote: > On Fri, Dec 11, 2015 at 02:48:03PM +0100, Peter Zijlstra wrote: > > On Fri, Dec 11, 2015 at 01:33:14PM +0000, Will Deacon wrote: > > > On Fri, Dec 11, 2015 at 01:26:47PM +0100, Peter Zijlstra wrote: > > > > > > While we're there, the acquire in osq_wait_next() seems somewhat ill > > > > documented too. > > > > > > > > I _think_ we need ACQUIRE semantics there because we want to strictly > > > > order the lock-unqueue A,B,C steps and we get that with: > > > > > > > > A: SC > > > > B: ACQ > > > > C: Relaxed > > > > > > > > Similarly for unlock we want the WRITE_ONCE to happen after > > > > osq_wait_next, but in that case we can even rely on the control > > > > dependency there. > > > > > > Even for the lock-unqueue case, isn't B->C ordered by a control dependency > > > because C consists only of stores? > > > > Hmm, indeed. So we could go fully relaxed on it I suppose, since the > > same is true for the unlock site. > > In which case, we should be able to relax the xchg in there (osq_wait_next) > too, right? Can I have second thoughts an confuse matters again? ;-) A RmW-acq is a load-acquire+store. That means the store is _after_ the load and thus not required for the completion of the control dependency. Therefore the store in question can reorder inside the conditional control block's stores. Hmm? Suggesting this acquire is in fact also wrong, since we need full order ops to guarantee full order both in lock and unlock. ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 17:11 ` Peter Zijlstra @ 2015-12-11 17:24 ` Will Deacon 0 siblings, 0 replies; 20+ messages in thread From: Will Deacon @ 2015-12-11 17:24 UTC (permalink / raw) To: linux-arm-kernel On Fri, Dec 11, 2015 at 06:11:28PM +0100, Peter Zijlstra wrote: > On Fri, Dec 11, 2015 at 02:06:49PM +0000, Will Deacon wrote: > > On Fri, Dec 11, 2015 at 02:48:03PM +0100, Peter Zijlstra wrote: > > > On Fri, Dec 11, 2015 at 01:33:14PM +0000, Will Deacon wrote: > > > > On Fri, Dec 11, 2015 at 01:26:47PM +0100, Peter Zijlstra wrote: > > > > > > > > While we're there, the acquire in osq_wait_next() seems somewhat ill > > > > > documented too. > > > > > > > > > > I _think_ we need ACQUIRE semantics there because we want to strictly > > > > > order the lock-unqueue A,B,C steps and we get that with: > > > > > > > > > > A: SC > > > > > B: ACQ > > > > > C: Relaxed > > > > > > > > > > Similarly for unlock we want the WRITE_ONCE to happen after > > > > > osq_wait_next, but in that case we can even rely on the control > > > > > dependency there. > > > > > > > > Even for the lock-unqueue case, isn't B->C ordered by a control dependency > > > > because C consists only of stores? > > > > > > Hmm, indeed. So we could go fully relaxed on it I suppose, since the > > > same is true for the unlock site. > > > > In which case, we should be able to relax the xchg in there (osq_wait_next) > > too, right? > > Can I have second thoughts an confuse matters again? ;-) > > A RmW-acq is a load-acquire+store. That means the store is _after_ the > load and thus not required for the completion of the control dependency. > > Therefore the store in question can reorder inside the conditional > control block's stores. > > Hmm? Ah yeah, it's the same thing we were discussing the other day! Whilst there is a form of control dependency from the SC part of the LL/SC sequence, it doesn't guarantee ordering in the same way that a load->store control dependency does. That is, it orders subsequent writes to be afterwards in the coherence order but it doesn't ensure multi-copy atomicity for readers. Now, in this case, &lock->tail is only ever accessed by other cmpxchg operations, so I think it does actually work using just the control dependency. Worst case, a concurrent osq_wait_next gets a stale value in the atomic_read, but that's not a correctness problem. Will ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 13:48 ` Peter Zijlstra 2015-12-11 14:06 ` Will Deacon @ 2015-12-11 22:35 ` Paul E. McKenney 2015-12-14 20:28 ` Peter Zijlstra 1 sibling, 1 reply; 20+ messages in thread From: Paul E. McKenney @ 2015-12-11 22:35 UTC (permalink / raw) To: linux-arm-kernel On Fri, Dec 11, 2015 at 02:48:03PM +0100, Peter Zijlstra wrote: > On Fri, Dec 11, 2015 at 01:33:14PM +0000, Will Deacon wrote: > > On Fri, Dec 11, 2015 at 01:26:47PM +0100, Peter Zijlstra wrote: > > > > While we're there, the acquire in osq_wait_next() seems somewhat ill > > > documented too. > > > > > > I _think_ we need ACQUIRE semantics there because we want to strictly > > > order the lock-unqueue A,B,C steps and we get that with: > > > > > > A: SC > > > B: ACQ > > > C: Relaxed > > > > > > Similarly for unlock we want the WRITE_ONCE to happen after > > > osq_wait_next, but in that case we can even rely on the control > > > dependency there. > > > > Even for the lock-unqueue case, isn't B->C ordered by a control dependency > > because C consists only of stores? > > Hmm, indeed. So we could go fully relaxed on it I suppose, since the > same is true for the unlock site. I am probably missing quite a bit on this thread, but don't x86 MMIO accesses to frame buffers need to interact with something more heavyweight than an x86 release store or acquire load in order to remain confined to the resulting critical section? Thanx, Paul ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 22:35 ` Paul E. McKenney @ 2015-12-14 20:28 ` Peter Zijlstra 2015-12-15 4:36 ` Paul E. McKenney 0 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2015-12-14 20:28 UTC (permalink / raw) To: linux-arm-kernel On Fri, Dec 11, 2015 at 02:35:40PM -0800, Paul E. McKenney wrote: > On Fri, Dec 11, 2015 at 02:48:03PM +0100, Peter Zijlstra wrote: > > On Fri, Dec 11, 2015 at 01:33:14PM +0000, Will Deacon wrote: > > > On Fri, Dec 11, 2015 at 01:26:47PM +0100, Peter Zijlstra wrote: > > > > > > While we're there, the acquire in osq_wait_next() seems somewhat ill > > > > documented too. > > > > > > > > I _think_ we need ACQUIRE semantics there because we want to strictly > > > > order the lock-unqueue A,B,C steps and we get that with: > > > > > > > > A: SC > > > > B: ACQ > > > > C: Relaxed > > > > > > > > Similarly for unlock we want the WRITE_ONCE to happen after > > > > osq_wait_next, but in that case we can even rely on the control > > > > dependency there. > > > > > > Even for the lock-unqueue case, isn't B->C ordered by a control dependency > > > because C consists only of stores? > > > > Hmm, indeed. So we could go fully relaxed on it I suppose, since the > > same is true for the unlock site. > > I am probably missing quite a bit on this thread, but don't x86 MMIO > accesses to frame buffers need to interact with something more heavyweight > than an x86 release store or acquire load in order to remain confined > to the resulting critical section? So on x86 there really isn't a problem because every atomic op (and there's plenty here) will be a full barrier. That is, even if you were to replace everything with _relaxed() ops, it would still work as 'expected' on x86. ppc/arm64 will crash and burn, but that's another story. But the important point here was that osq_wait_next() is never relied upon to provide either the ACQUIRE semantics for osq_lock() not the RELEASE semantics for osq_unlock(). Those are provided by other ops. ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-14 20:28 ` Peter Zijlstra @ 2015-12-15 4:36 ` Paul E. McKenney 0 siblings, 0 replies; 20+ messages in thread From: Paul E. McKenney @ 2015-12-15 4:36 UTC (permalink / raw) To: linux-arm-kernel On Mon, Dec 14, 2015 at 09:28:55PM +0100, Peter Zijlstra wrote: > On Fri, Dec 11, 2015 at 02:35:40PM -0800, Paul E. McKenney wrote: > > On Fri, Dec 11, 2015 at 02:48:03PM +0100, Peter Zijlstra wrote: > > > On Fri, Dec 11, 2015 at 01:33:14PM +0000, Will Deacon wrote: > > > > On Fri, Dec 11, 2015 at 01:26:47PM +0100, Peter Zijlstra wrote: > > > > > > > > While we're there, the acquire in osq_wait_next() seems somewhat ill > > > > > documented too. > > > > > > > > > > I _think_ we need ACQUIRE semantics there because we want to strictly > > > > > order the lock-unqueue A,B,C steps and we get that with: > > > > > > > > > > A: SC > > > > > B: ACQ > > > > > C: Relaxed > > > > > > > > > > Similarly for unlock we want the WRITE_ONCE to happen after > > > > > osq_wait_next, but in that case we can even rely on the control > > > > > dependency there. > > > > > > > > Even for the lock-unqueue case, isn't B->C ordered by a control dependency > > > > because C consists only of stores? > > > > > > Hmm, indeed. So we could go fully relaxed on it I suppose, since the > > > same is true for the unlock site. > > > > I am probably missing quite a bit on this thread, but don't x86 MMIO > > accesses to frame buffers need to interact with something more heavyweight > > than an x86 release store or acquire load in order to remain confined > > to the resulting critical section? > > So on x86 there really isn't a problem because every atomic op (and > there's plenty here) will be a full barrier. > > That is, even if you were to replace everything with _relaxed() ops, it > would still work as 'expected' on x86. > > ppc/arm64 will crash and burn, but that's another story. > > But the important point here was that osq_wait_next() is never relied > upon to provide either the ACQUIRE semantics for osq_lock() not the > RELEASE semantics for osq_unlock(). Those are provided by other ops. OK, good to know! Thanx, Paul ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 12:04 ` Will Deacon 2015-12-11 12:13 ` Peter Zijlstra @ 2015-12-11 14:17 ` Davidlohr Bueso 2015-12-17 21:52 ` Jeremy Linton 2 siblings, 0 replies; 20+ messages in thread From: Davidlohr Bueso @ 2015-12-11 14:17 UTC (permalink / raw) To: linux-arm-kernel On Fri, 11 Dec 2015, Will Deacon wrote: >I think Andrew meant the atomic_xchg_acquire at the start of osq_lock, >as opposed to "compare and swap". In which case, it does look like >there's a bug here because there is nothing to order the initialisation >of the node fields with publishing of the node, whether that's >indirectly as a result of setting the tail to the current CPU or >directly as a result of the WRITE_ONCE. Sorry I'm late to the party. Duh yes this is obviously bogus, and worse I recall triggering a similar tail initialization issue in osq_lock on some experimental work on x86, so this is very much a point of failure. Ack. > >Andrew, David: does making that atomic_xchg_acquire and atomic_xchg >fix things for you? > >I don't fully grok what 81a43adae3b9 has to do with any of this, so >maybe there's another bug too. I think this is mainly because mutex_optimistic_spin is where the stack shows the lockup, which really translates to c55a6ffa62. Thanks, Davidlohr ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 12:04 ` Will Deacon 2015-12-11 12:13 ` Peter Zijlstra 2015-12-11 14:17 ` Davidlohr Bueso @ 2015-12-17 21:52 ` Jeremy Linton 2 siblings, 0 replies; 20+ messages in thread From: Jeremy Linton @ 2015-12-17 21:52 UTC (permalink / raw) To: linux-arm-kernel On 12/11/2015 06:04 AM, Will Deacon wrote: > I think Andrew meant the atomic_xchg_acquire at the start of osq_lock, > as opposed to "compare and swap". In which case, it does look like > there's a bug here because there is nothing to order the initialisation > of the node fields with publishing of the node, whether that's > indirectly as a result of setting the tail to the current CPU or > directly as a result of the WRITE_ONCE. > > Andrew, David: does making that atomic_xchg_acquire and atomic_xchg > fix things for you? > > I don't fully grok what 81a43adae3b9 has to do with any of this, so > maybe there's another bug too. Will, I ran into this problem while getting a machine to boot with ACPI yesterday. Your suggested fix works on that machine. Thanks, so for that patch. Tested-by: Jeremy Linton <jeremy.linton@arm.com> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ^ permalink raw reply [flat|nested] 20+ messages in thread
* FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-11 3:29 ` FW: " Andrew Pinski 2015-12-11 4:51 ` Andrew Pinski @ 2015-12-11 7:33 ` Peter Zijlstra 1 sibling, 0 replies; 20+ messages in thread From: Peter Zijlstra @ 2015-12-11 7:33 UTC (permalink / raw) To: linux-arm-kernel On Thu, Dec 10, 2015 at 07:29:34PM -0800, Andrew Pinski wrote: > On Thu, Dec 10, 2015 at 11:44 AM, David Danny wrote: > > > > Hi, > > > > We are getting soft lockup OOPs on Cavium CN88XX (A.K.A. ThunderX), which is an arm64 implementation. > > I get a slightly different OOPs and reverting > c55a6ffa6285e29f874ed403979472631ec70bff I was able to boot. > What I saw with osq_lock.c was that osq_wait_next is called for both > lock and unlock case so it might need both barriers. > The other question comes does atomic_cmpxchg_release have release > semantics when the compare fails? Right now it does not. > Out cmpxchg primites imply no barrier on failure, this is documented somewhere.. /me searches.. --- commit ed2de9f74ecbbf3063d29b2334e7b455d7f35189 Author: Will Deacon <will.deacon@arm.com> Date: Thu Jul 16 16:10:06 2015 +0100 locking/Documentation: Clarify failed cmpxchg() memory ordering semantics A failed cmpxchg does not provide any memory ordering guarantees, a property that is used to optimise the cmpxchg implementations on Alpha, PowerPC and arm64. This patch updates atomic_ops.txt and memory-barriers.txt to reflect this. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hp.com> Link: http://lkml.kernel.org/r/20150716151006.GH26390 at arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt index dab6da3382d9..b19fc34efdb1 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/atomic_ops.txt @@ -266,7 +266,9 @@ with the given old and new values. Like all atomic_xxx operations, atomic_cmpxchg will only satisfy its atomicity semantics as long as all other accesses of *v are performed through atomic_xxx operations. -atomic_cmpxchg must provide explicit memory barriers around the operation. +atomic_cmpxchg must provide explicit memory barriers around the operation, +although if the comparison fails then no memory ordering guarantees are +required. The semantics for atomic_cmpxchg are the same as those defined for 'cas' below. diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 13feb697271f..18fc860df1be 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -2383,9 +2383,7 @@ about the state (old or new) implies an SMP-conditional general memory barrier explicit lock operations, described later). These include: xchg(); - cmpxchg(); atomic_xchg(); atomic_long_xchg(); - atomic_cmpxchg(); atomic_long_cmpxchg(); atomic_inc_return(); atomic_long_inc_return(); atomic_dec_return(); atomic_long_dec_return(); atomic_add_return(); atomic_long_add_return(); @@ -2398,7 +2396,9 @@ about the state (old or new) implies an SMP-conditional general memory barrier test_and_clear_bit(); test_and_change_bit(); - /* when succeeds (returns 1) */ + /* when succeeds */ + cmpxchg(); + atomic_cmpxchg(); atomic_long_cmpxchg(); atomic_add_unless(); atomic_long_add_unless(); These are used for such things as implementing ACQUIRE-class and RELEASE-class ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) 2015-12-10 19:43 Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) David Daney [not found] ` <SN1PR07MB21577C72379C8440A208D6BC9EEA0@SN1PR07MB2157.namprd07.prod.outlook.com> @ 2015-12-11 9:59 ` Will Deacon 1 sibling, 0 replies; 20+ messages in thread From: Will Deacon @ 2015-12-11 9:59 UTC (permalink / raw) To: linux-arm-kernel On Thu, Dec 10, 2015 at 11:43:46AM -0800, David Daney wrote: > We are getting soft lockup OOPs on Cavium CN88XX (A.K.A. ThunderX), which is > an arm64 implementation. [...] > At this point it is unknown if this patch is incorrect, or if the underlying > ARM64 atomic_*_{acquire,release} primitives are defective, or if the problem > lies elsewhere. Are you using the ll/sc or lse versions of the atomics? In the case of the former, are they inline or out-of-line (this depends on whether or not you've selected CONFIG_ARM64_LSE_ATOMICS and whether or not you have toolchain support)? Will ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2015-12-17 21:52 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-12-10 19:43 Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX) David Daney [not found] ` <SN1PR07MB21577C72379C8440A208D6BC9EEA0@SN1PR07MB2157.namprd07.prod.outlook.com> 2015-12-11 3:29 ` FW: " Andrew Pinski 2015-12-11 4:51 ` Andrew Pinski 2015-12-11 8:41 ` Peter Zijlstra 2015-12-11 12:04 ` Will Deacon 2015-12-11 12:13 ` Peter Zijlstra 2015-12-11 12:18 ` Will Deacon 2015-12-11 12:26 ` Peter Zijlstra 2015-12-11 13:33 ` Will Deacon 2015-12-11 13:48 ` Peter Zijlstra 2015-12-11 14:06 ` Will Deacon 2015-12-11 17:11 ` Peter Zijlstra 2015-12-11 17:24 ` Will Deacon 2015-12-11 22:35 ` Paul E. McKenney 2015-12-14 20:28 ` Peter Zijlstra 2015-12-15 4:36 ` Paul E. McKenney 2015-12-11 14:17 ` Davidlohr Bueso 2015-12-17 21:52 ` Jeremy Linton 2015-12-11 7:33 ` Peter Zijlstra 2015-12-11 9:59 ` Will Deacon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).