* [RFC PATCHv2] 64bit LWS CAS
From: Guy Martin @ 2014-07-29 19:13 UTC (permalink / raw)
To: linux-parisc
[-- Attachment #1: Type: text/plain, Size: 1231 bytes --]
Hi all,
Following the discussion about broken CAS for size != 4, I took a new
approach and implemented it in a different way.
The new ABI takes the oldval, newval and mem as pointers plus a size
parameter. This means that a single LWS can now handle all types of
variable size.
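The semantics of the pointer-plus-size interface can be sketched in plain C as below. This is only a userspace model, not the kernel code: the name `model_cmpxchg2` and its return values are made up for illustration, the size code is assumed to be log2 of the width in bytes (the libgcc caller passes 3 for 64bit), and nothing here is atomic, whereas the real LWS runs under a kernel lock.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a CAS taking old, new and mem as pointers
   plus a size code.  size_code is assumed to be log2 of the width in
   bytes: 0 = 8bit, 1 = 16bit, 2 = 32bit, 3 = 64bit.
   NOT atomic; it only illustrates the calling convention.  */
static int
model_cmpxchg2 (const void *oldval, const void *newval, void *mem,
                int size_code)
{
  size_t width;

  assert (oldval != NULL && newval != NULL && mem != NULL);

  if (size_code < 0 || size_code > 3)
    return -1;                  /* stands in for an "unsupported" error */

  width = (size_t) 1 << size_code;

  /* Compare the current memory contents with the expected old value.  */
  if (memcmp (mem, oldval, width) != 0)
    return 1;                   /* mismatch, nothing is stored */

  /* The values match: store the new value.  */
  memcpy (mem, newval, width);
  return 0;
}
```

Because the values travel by pointer, the same entry point serves every width and the caller only changes the size code.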
Note that the 32bit CAS for 64bit size has not been tested (not even
compiled) since I can't compile a 32bit kernel at the moment.
My approach for 64bit CAS on 32bit is the following:
- Load old into 2 registers
- Compare low and high part and bail out if different
- Load new into a FPU register
- Store the content of the FPU register to the memory
The point here being to do the store in the last step in a single
instruction.
I think the same approach can be used for 128bit CAS as well but I
don't think it's needed at the moment.
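The four steps above can be modelled in C roughly as follows. This is a sketch under stated assumptions: `model_cas64_on_32` is a hypothetical name, the 8-byte `memcpy` at the end merely stands in for the single FPU store, and the function is not atomic by itself (the real code holds the LWS lock with interrupts disabled).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 64bit CAS built from 32bit compares.
   Returns 0 if the store happened, 1 if the comparison failed.  */
static int
model_cas64_on_32 (const void *oldval, const void *newval, void *mem)
{
  const unsigned char *o = oldval;
  unsigned char *m = mem;
  uint32_t old_word0, old_word1, cur;
  uint64_t fpreg;

  assert (oldval != NULL && newval != NULL && mem != NULL);

  /* Step 1: load old into two 32bit "registers".  */
  memcpy (&old_word0, o, 4);
  memcpy (&old_word1, o + 4, 4);

  /* Step 2: compare both halves and bail out if either differs.  */
  memcpy (&cur, m, 4);
  if (cur != old_word0)
    return 1;
  memcpy (&cur, m + 4, 4);
  if (cur != old_word1)
    return 1;

  /* Step 3: load new into a 64bit "FPU register".  */
  memcpy (&fpreg, newval, 8);

  /* Step 4: one 8-byte store, modelling the single store instruction.  */
  memcpy (m, &fpreg, 8);
  return 0;
}
```

The comparison can be split in two because the lock keeps other LWS callers away; only the final store has to be a single instruction so that plain readers never see a half-updated value.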
Regarding the GCC counterpart of the implementation, I'm not sure about
the way to proceed.
Should I try to detect the presence of the new LWS and use it for all
CAS operations at init time ?
So far I only used the new LWS for 64bit CAS.
I guess that using the new LWS unconditionally for all CAS operations
isn't an option since it will break for newer gcc on old kernels.
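One possible shape for detecting the new LWS once and caching the answer is sketched below. Everything in it is hypothetical: the probe callback, the `lws_support` structure and the state names are not part of the patch, and it assumes an old kernel answers an unknown LWS number with -ENOSYS. Note that `__kernel_cmpxchg2` as posted aborts on -ENOSYS, so a real probe would need its own call path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical probe: returns 0 or a negative errno value.  */
typedef long (*lws_probe_fn) (void);

enum { LWS_UNKNOWN = -1, LWS_ABSENT = 0, LWS_PRESENT = 1 };

/* Cached detection result (hypothetical helper type).  */
struct lws_support
{
  int state;
};

/* Run the probe on first use only and remember the answer.  */
static int
have_lws_cas2 (struct lws_support *s, lws_probe_fn probe)
{
  assert (s != NULL && probe != NULL);

  if (s->state == LWS_UNKNOWN)
    s->state = (probe () == -ENOSYS) ? LWS_ABSENT : LWS_PRESENT;

  return s->state;
}

/* Mock probes standing in for an old and a new kernel.  */
static long
mock_probe_old_kernel (void)
{
  return -ENOSYS;
}

static long
mock_probe_new_kernel (void)
{
  return 0;
}
```

A caller could then pick the old or the new code path per operation, at the cost of one extra branch on every CAS.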
Regards,
Guy
[-- Attachment #2: gcc-64bit-atomic-cas2.patch --]
[-- Type: text/x-patch, Size: 5666 bytes --]
--- libgcc/config/pa/linux-atomic.c 2011-11-02 15:23:48.000000000 +0000
+++ /root/gcc-trunk/libgcc/config/pa/linux-atomic.c 2014-07-29 16:05:21.932078161 +0000
@@ -1,5 +1,5 @@
/* Linux-specific atomic operations for PA Linux.
- Copyright (C) 2008, 2009, 2010 Free Software Foundation, Inc.
+ Copyright (C) 2008-2014 Free Software Foundation, Inc.
Based on code contributed by CodeSourcery for ARM EABI Linux.
Modifications for PA Linux by Helge Deller <deller@gmx.de>
@@ -75,6 +75,31 @@
return lws_errno;
}
+static inline long
+__kernel_cmpxchg2 (void * oldval, void * newval, void *mem, int val_size)
+{
+
+ register unsigned long lws_mem asm("r26") = (unsigned long) (mem);
+ register long lws_ret asm("r28");
+ register long lws_errno asm("r21");
+ register unsigned long lws_old asm("r25") = (unsigned long) oldval;
+ register unsigned long lws_new asm("r24") = (unsigned long) newval;
+ register int lws_size asm("r23") = val_size;
+ asm volatile ( "ble 0xb0(%%sr2, %%r0) \n\t"
+ "ldi %2, %%r20 \n\t"
+ : "=r" (lws_ret), "=r" (lws_errno)
+ : "i" (2), "r" (lws_mem), "r" (lws_old), "r" (lws_new), "r" (lws_size)
+ : "r1", "r20", "r22", "r29", "r31", "fr4", "memory"
+ );
+ if (__builtin_expect (lws_errno == -EFAULT || lws_errno == -ENOSYS, 0))
+ ABORT_INSTRUCTION;
+
+ /* If the kernel LWS call fails, return EBUSY */
+ if (!lws_errno && lws_ret)
+ lws_errno = -EBUSY;
+
+ return lws_errno;
+}
#define HIDDEN __attribute__ ((visibility ("hidden")))
/* Big endian masks */
@@ -84,6 +109,29 @@
#define MASK_1 0xffu
#define MASK_2 0xffffu
+#define FETCH_AND_OP_DWORD(OP, PFX_OP, INF_OP) \
+ long long HIDDEN \
+ __sync_fetch_and_##OP##_8 (long long *ptr, long long val) \
+ { \
+ long long tmp, newval; \
+ int failure; \
+ \
+ do { \
+ tmp = *ptr; \
+ newval = PFX_OP (tmp INF_OP val); \
+ failure = __kernel_cmpxchg2 (&tmp, &newval, ptr, 3); \
+ } while (failure != 0); \
+ \
+ return tmp; \
+ }
+
+FETCH_AND_OP_DWORD (add, , +)
+FETCH_AND_OP_DWORD (sub, , -)
+FETCH_AND_OP_DWORD (or, , |)
+FETCH_AND_OP_DWORD (and, , &)
+FETCH_AND_OP_DWORD (xor, , ^)
+FETCH_AND_OP_DWORD (nand, ~, &)
+
#define FETCH_AND_OP_WORD(OP, PFX_OP, INF_OP) \
int HIDDEN \
__sync_fetch_and_##OP##_4 (int *ptr, int val) \
@@ -147,6 +195,29 @@
SUBWORD_SYNC_OP (xor, , ^, unsigned char, 1, oldval)
SUBWORD_SYNC_OP (nand, ~, &, unsigned char, 1, oldval)
+#define OP_AND_FETCH_DWORD(OP, PFX_OP, INF_OP) \
+ long long HIDDEN \
+ __sync_##OP##_and_fetch_8 (long long *ptr, long long val) \
+ { \
+ long long tmp, newval; \
+ int failure; \
+ \
+ do { \
+ tmp = *ptr; \
+ newval = PFX_OP (tmp INF_OP val); \
+ failure = __kernel_cmpxchg2 (&tmp, &newval, ptr, 3); \
+ } while (failure != 0); \
+ \
+ return PFX_OP (tmp INF_OP val); \
+ }
+
+OP_AND_FETCH_DWORD (add, , +)
+OP_AND_FETCH_DWORD (sub, , -)
+OP_AND_FETCH_DWORD (or, , |)
+OP_AND_FETCH_DWORD (and, , &)
+OP_AND_FETCH_DWORD (xor, , ^)
+OP_AND_FETCH_DWORD (nand, ~, &)
+
#define OP_AND_FETCH_WORD(OP, PFX_OP, INF_OP) \
int HIDDEN \
__sync_##OP##_and_fetch_4 (int *ptr, int val) \
@@ -182,6 +253,26 @@
SUBWORD_SYNC_OP (xor, , ^, unsigned char, 1, newval)
SUBWORD_SYNC_OP (nand, ~, &, unsigned char, 1, newval)
+long long HIDDEN
+__sync_val_compare_and_swap_8 (long long *ptr, long long oldval, long long newval)
+{
+ long long actual_oldval;
+ int fail;
+
+ while (1)
+ {
+ actual_oldval = *ptr;
+
+ if (__builtin_expect (oldval != actual_oldval, 0))
+ return actual_oldval;
+
+ fail = __kernel_cmpxchg2 (&actual_oldval, &newval, ptr, 3);
+
+ if (__builtin_expect (!fail, 1))
+ return actual_oldval;
+ }
+}
+
int HIDDEN
__sync_val_compare_and_swap_4 (int *ptr, int oldval, int newval)
{
@@ -256,6 +347,20 @@
SUBWORD_BOOL_CAS (unsigned short, 2)
SUBWORD_BOOL_CAS (unsigned char, 1)
+long long HIDDEN
+__sync_lock_test_and_set_8 (long long *ptr, long long val)
+{
+ long long oldval;
+ int failure;
+
+ do {
+ oldval = *ptr;
+ failure = __kernel_cmpxchg2 (&oldval, &val, ptr, 3);
+ } while (failure != 0);
+
+ return oldval;
+}
+
int HIDDEN
__sync_lock_test_and_set_4 (int *ptr, int val)
{
@@ -293,13 +398,45 @@
SUBWORD_TEST_AND_SET (unsigned short, 2)
SUBWORD_TEST_AND_SET (unsigned char, 1)
+void HIDDEN
+__sync_lock_release_8 (long long *ptr)
+{
+ long long failure, oldval, zero = 0;
+
+ do {
+ oldval = *ptr;
+ failure = __kernel_cmpxchg2 (&oldval, &zero, ptr, 3);
+ } while (failure != 0);
+}
+
+void HIDDEN
+__sync_lock_release_4 (int *ptr)
+{
+ int failure, oldval;
+
+ do {
+ oldval = *ptr;
+ failure = __kernel_cmpxchg (oldval, 0, ptr);
+ } while (failure != 0);
+}
+
#define SYNC_LOCK_RELEASE(TYPE, WIDTH) \
void HIDDEN \
__sync_lock_release_##WIDTH (TYPE *ptr) \
{ \
- *ptr = 0; \
+ int failure; \
+ unsigned int oldval, newval, shift, mask; \
+ int *wordptr = (int *) ((unsigned long) ptr & ~3); \
+ \
+ shift = (((unsigned long) ptr & 3) << 3) ^ INVERT_MASK_##WIDTH; \
+ mask = MASK_##WIDTH << shift; \
+ \
+ do { \
+ oldval = *wordptr; \
+ newval = oldval & ~mask; \
+ failure = __kernel_cmpxchg (oldval, newval, wordptr); \
+ } while (failure != 0); \
}
-SYNC_LOCK_RELEASE (int, 4)
SYNC_LOCK_RELEASE (short, 2)
SYNC_LOCK_RELEASE (char, 1)
[-- Attachment #3: linux-hppa-atomic-cas2.patch --]
[-- Type: text/x-patch, Size: 8714 bytes --]
--- arch/parisc/kernel/syscall.S.orig 2014-06-08 20:19:54.000000000 +0200
+++ arch/parisc/kernel/syscall.S 2014-07-25 23:45:10.544853275 +0200
@@ -74,7 +74,7 @@
/* ADDRESS 0xb0 to 0xb8, lws uses two insns for entry */
/* Light-weight-syscall entry must always be located at 0xb0 */
/* WARNING: Keep this number updated with table size changes */
-#define __NR_lws_entries (2)
+#define __NR_lws_entries (3)
lws_entry:
gate lws_start, %r0 /* increase privilege */
@@ -502,7 +502,7 @@
/***************************************************
- Implementing CAS as an atomic operation:
+ Implementing 32bit CAS as an atomic operation:
%r26 - Address to examine
%r25 - Old value to check (old)
@@ -658,6 +658,274 @@
ASM_EXCEPTIONTABLE_ENTRY(1b-linux_gateway_page, 3b-linux_gateway_page)
ASM_EXCEPTIONTABLE_ENTRY(2b-linux_gateway_page, 3b-linux_gateway_page)
+
+ /***************************************************
+ New CAS implementation which uses pointers and variable size information.
+ The values pointed to by old and new MUST NOT change while performing CAS.
+ The lock only protects the value at %r26.
+
+ %r26 - Address to examine
+ %r25 - Pointer to the value to check (old)
+ %r24 - Pointer to the value to set (new)
+ %r23 - Size of the variable (8bit = 0, 16bit = 1, 32bit = 2, 64bit = 4)
+ %r28 - Return non-zero on failure
+ %r21 - Kernel error code
+
+ If debugging is DISabled:
+
+ %r21 has the following meanings:
+
+ EAGAIN - CAS is busy, ldcw failed, try again.
+ EFAULT - Read or write failed.
+
+ If debugging is enabled:
+
+ EDEADLOCK - CAS called recursively.
+ EAGAIN && r28 == 1 - CAS is busy. Lock contended.
+ EAGAIN && r28 == 2 - CAS is busy. ldcw failed.
+ EFAULT - Read or write failed.
+
+ Scratch: r20, r22, r28, r29, r1, fr4 (32bit for 64bit CAS only)
+
+ ****************************************************/
+
+ /* ELF32 Process entry path */
+lws_compare_and_swap_2:
+#ifdef CONFIG_64BIT
+ /* Clip the input registers */
+ depdi 0, 31, 32, %r26
+ depdi 0, 31, 32, %r25
+ depdi 0, 31, 32, %r24
+ depdi 0, 31, 32, %r23
+#endif
+
+ /* Check the validity of the size parameter */
+ subi,>>= 4, %r23, %r0
+ b,n lws_exit_nosys
+
+ /* Jump to the functions which will load the old and new values into
+ registers depending on the their size */
+ shlw %r23, 2, %r29
+ blr %r29, %r0
+ nop
+
+ /* 8bit load */
+4: ldb 0(%sr3,%r25), %r25
+ b cas2_lock_start
+5: ldb 0(%sr3,%r24), %r24
+ nop
+ nop
+ nop
+ nop
+ nop
+
+ /* 16bit load */
+6: ldh 0(%sr3,%r25), %r25
+ b cas2_lock_start
+7: ldh 0(%sr3,%r24), %r24
+ nop
+ nop
+ nop
+ nop
+ nop
+
+ /* 32bit load */
+8: ldw 0(%sr3,%r25), %r25
+ b cas2_lock_start
+9: ldw 0(%sr3,%r24), %r24
+ nop
+ nop
+ nop
+ nop
+ nop
+
+ /* 64bit load */
+#ifdef CONFIG_64BIT
+10: ldd 0(%sr3,%r25), %r25
+11: ldd 0(%sr3,%r24), %r24
+#else
+ /* Load old value into r22/r23 - high/low */
+10: ldw 0(%sr3,%r25), %r22
+11: ldw 4(%sr3,%r25), %r23
+#endif
+
+cas2_lock_start:
+ /* Load start of lock table */
+ ldil L%lws_lock_start, %r20
+ ldo R%lws_lock_start(%r20), %r28
+
+ /* Extract four bits from r26 and hash lock (Bits 4-7) */
+ extru %r26, 27, 4, %r20
+
+ /* Find lock to use, the hash is either one of 0 to
+ 15, multiplied by 16 (keep it 16-byte aligned)
+ and add to the lock table offset. */
+ shlw %r20, 4, %r20
+ add %r20, %r28, %r20
+
+# if ENABLE_LWS_DEBUG
+ /*
+ DEBUG, check for deadlock!
+ If the thread register values are the same
+ then we were the one that locked it last and
+ this is a recursive call that will deadlock.
+ We *must* give up this call and fail.
+ */
+ ldw 4(%sr2,%r20), %r28 /* Load thread register */
+ /* WARNING: If cr27 cycles to the same value we have problems */
+ mfctl %cr27, %r21 /* Get current thread register */
+ cmpb,<>,n %r21, %r28, cas2_lock /* Called recursive? */
+ b lws_exit /* Return error! */
+ ldo -EDEADLOCK(%r0), %r21
+cas2_lock:
+ cmpb,=,n %r0, %r28, cas2_nocontend /* Is nobody using it? */
+ ldo 1(%r0), %r28 /* 1st case */
+ b lws_exit /* Contended... */
+ ldo -EAGAIN(%r0), %r21 /* Spin in userspace */
+cas2_nocontend:
+# endif
+/* ENABLE_LWS_DEBUG */
+
+ rsm PSW_SM_I, %r0 /* Disable interrupts */
+ /* COW breaks can cause contention on UP systems */
+ LDCW 0(%sr2,%r20), %r28 /* Try to acquire the lock */
+ cmpb,<>,n %r0, %r28, cas2_action /* Did we get it? */
+cas2_wouldblock:
+ ldo 2(%r0), %r28 /* 2nd case */
+ ssm PSW_SM_I, %r0
+ b lws_exit /* Contended... */
+ ldo -EAGAIN(%r0), %r21 /* Spin in userspace */
+
+ /*
+ prev = *addr;
+ if ( prev == old )
+ *addr = new;
+ return prev;
+ */
+
+ /* NOTES:
+ This all works because intr_do_signal
+ and schedule both check the return iasq
+ and see that we are on the kernel page
+ so this process is never scheduled off
+ or is ever sent any signal of any sort,
+ thus it is wholly atomic from userspace's
+ perspective
+ */
+cas2_action:
+#if defined CONFIG_SMP && ENABLE_LWS_DEBUG
+ /* DEBUG */
+ mfctl %cr27, %r1
+ stw %r1, 4(%sr2,%r20)
+#endif
+
+ /* Jump to the correct function */
+ blr %r29, %r0
+ /* Set %r28 as non-zero for now */
+ ldo 1(%r0),%r28
+
+ /* 8bit CAS */
+12: ldb,ma 0(%sr3,%r26), %r29
+ sub,= %r29, %r25, %r0
+ b,n cas2_end
+13: stb,ma %r24, 0(%sr3,%r26)
+ b cas2_end
+ copy %r0, %r28
+ nop
+ nop
+
+ /* 16bit CAS */
+14: ldh,ma 0(%sr3,%r26), %r29
+ sub,= %r29, %r25, %r0
+ b,n cas2_end
+15: sth,ma %r24, 0(%sr3,%r26)
+ b cas2_end
+ copy %r0, %r28
+ nop
+ nop
+
+ /* 32bit CAS */
+16: ldw,ma 0(%sr3,%r26), %r29
+ sub,= %r29, %r25, %r0
+ b,n cas2_end
+17: stw,ma %r24, 0(%sr3,%r26)
+ b cas2_end
+ copy %r0, %r28
+ nop
+ nop
+
+ /* 64bit CAS */
+#ifdef CONFIG_64BIT
+18: ldd,ma 0(%sr3,%r26), %r29
+ sub,= %r29, %r25, %r0
+ b,n cas2_end
+19: std,ma %r24, 0(%sr3,%r26)
+ copy %r0, %r28
+#else
+ /* Compare first word */
+18: ldd,ma 0(%sr3,%r26), %r29
+ sub,= %r29, %r22, %r0
+ b,n cas2_end
+ /* Compare second word */
+19: ldd,ma 4(%sr3,%r26), %r29
+ sub,= %r29, %r23, %r0
+ b,n cas2_end
+ /* Perform the store */
+20: flddx 0(%sr3,%r24), %fr4
+21: fstdx %fr4, 0(%sr3,%r26)
+ copy %r0, %r28
+#endif
+
+cas2_end:
+ /* Free lock */
+ stw,ma %r20, 0(%sr2,%r20)
+#if ENABLE_LWS_DEBUG
+ /* Clear thread register indicator */
+ stw %r0, 4(%sr2,%r20)
+#endif
+ /* Enable interrupts */
+ ssm PSW_SM_I, %r0
+ /* Return to userspace, set no error */
+ b lws_exit
+ copy %r0, %r21
+
+22:
+ /* Error occurred on load or store */
+ /* Free lock */
+ stw %r20, 0(%sr2,%r20)
+#if ENABLE_LWS_DEBUG
+ stw %r0, 4(%sr2,%r20)
+#endif
+ ssm PSW_SM_I, %r0
+ ldo 1(%r0),%r28
+ b lws_exit
+ ldo -EFAULT(%r0),%r21 /* set errno */
+ nop
+ nop
+ nop
+
+ /* Exception table entries, for the load and store, return EFAULT.
+ Each of the entries must be relocated. */
+ ASM_EXCEPTIONTABLE_ENTRY(4b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(5b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(6b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(7b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(8b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(9b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(10b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(11b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(12b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(13b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(14b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(15b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(16b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(17b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(18b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(19b-linux_gateway_page, 22b-linux_gateway_page)
+#ifndef CONFIG_64BIT
+ ASM_EXCEPTIONTABLE_ENTRY(20b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(21b-linux_gateway_page, 22b-linux_gateway_page)
+#endif
/* Make sure nothing else is placed on this page */
.align PAGE_SIZE
@@ -675,8 +943,9 @@
/* Light-weight-syscall table */
/* Start of lws table. */
ENTRY(lws_table)
- LWS_ENTRY(compare_and_swap32) /* 0 - ELF32 Atomic compare and swap */
- LWS_ENTRY(compare_and_swap64) /* 1 - ELF64 Atomic compare and swap */
+ LWS_ENTRY(compare_and_swap32) /* 0 - ELF32 Atomic 32bit compare and swap */
+ LWS_ENTRY(compare_and_swap64) /* 1 - ELF64 Atomic 32bit compare and swap */
+ LWS_ENTRY(compare_and_swap_2) /* 2 - ELF32 Atomic 64bit compare and swap */
END(lws_table)
/* End of lws table */
* Re: [RFC PATCHv2] 64bit LWS CAS
From: Helge Deller @ 2014-07-29 21:24 UTC (permalink / raw)
To: Guy Martin, linux-parisc
Hi Guy,
Very nice work !
On 07/29/2014 09:13 PM, Guy Martin wrote:
> Following the discussion about broken CAS for size != 4, I took a new
> approach and implemented in a different way.
>
> The new ABI takes the oldval, newval and mem as pointers plus a size
> parameter. This means that a single LWS can now handle all types of
> variable size.
> Note that the 32bit CAS for 64bit size has not been tested (not even
> compiled) since I can't compile a 32bit kernel at the moment.
I compile-tested it (but haven't runtime-tested it yet):
AS arch/parisc/kernel/syscall.o
/home/cvs/LINUX/git-kernel/linux-2.6/arch/parisc/kernel/syscall.S: Assembler messages:
/home/cvs/LINUX/git-kernel/linux-2.6/arch/parisc/kernel/syscall.S:866: Error: Invalid operands
/home/cvs/LINUX/git-kernel/linux-2.6/arch/parisc/kernel/syscall.S:870: Error: Invalid operands
Line 866 is the jump label 18:
18: ldd,ma 0(%sr3,%r26), %r29
Line 870 is 19:
19: ldd,ma 4(%sr3,%r26), %r29
I think both should be ldw ?
An idea:
Maybe the config option CONFIG_PA8X00 (which enables -march=2.0) can be used in
some places to use the 64bit assembler even on 32bit kernel (instead of using CONFIG_64BIT) ?
> My approach for 64bit CAS on 32bit is the following:
> - Load old into 2 registers
> - Compare low and high part and bail out if different
> - Load new into a FPU register
> - Store the content of the FPU register to the memory
>
> The point here being to do the store in the last step in a single
> instruction.
> I think the same approach can be used for 128bit CAS as well but I
> don't think it's needed at the moment.
Since the 64bit CAS on 32bit currently uses 9 asms, it can't be added as is right now anyway.
Maybe it makes sense to pull out the "flddx 0(%sr3,%r24), %fr4" from this content and to preload
it to where you set up r22/r23-high/low ?
> Regarding the GCC counterpart of the implementation, I'm not sure about
> the way to proceed.
>
> Should I try to detect the presence of the new LWS and use it for all
> CAS operations at init time ?
I leave this up to Dave & Carlos to answer.
> So far I only used the new LWS for 64bit CAS.
+ /***************************************************
+ New CAS implementation which uses pointers and variable size information.
+ The value pointed by old and new MUST NOT change while performing CAS.
+ The lock only protect the value at %r26.
+
+ %r26 - Address to examine
+ %r25 - Pointer to the value to check (old)
+ %r24 - Pointer to the value to set (new)
+ %r23 - Size of the variable (8bit = 0, 16bit = 1, 32bit = 2, 64bit = 4)
Since you shift %r23 in your code, I think the comment above is wrong for 64bit which should be 3 instead of 4 ?
You use nop's in your code to align to 32 bytes to be able to jump.
Does it make sense to use .align 32 instead ? I'm not sure myself about that...
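The arithmetic behind the nop padding and the size code can be checked with a small C sketch. It assumes that `blr` scales its index register by 8 bytes, so that `shlw %r23, 2, %r29` followed by `blr %r29, %r0` selects slots of 32 bytes (8 instructions) each; the helper name `cas2_slot_offset` is made up for illustration.

```c
#include <assert.h>

enum { INSN_BYTES = 4, SLOT_INSNS = 8 };

/* Byte offset of the selected slot relative to the first slot,
   under the assumption that blr multiplies its index by 8.  */
static long
cas2_slot_offset (unsigned size_code)
{
  unsigned long r29;

  assert (size_code <= 3);

  r29 = (unsigned long) size_code << 2;   /* shlw %r23, 2, %r29 */
  return (long) (r29 << 3);               /* blr scales the index by 8 */
}
```

With four slots, the codes 0 to 3 land on offsets 0, 32, 64 and 96, which is why 64bit has to be encoded as 3: a code of 4 would point one slot past the 64bit block.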
Should we maybe drop the whole ENABLE_LWS_DEBUG thing? Was it ever used/enabled?
> I guess that using the new LWS unconditionally for all CAS operations
> isn't an option since it will break for newer gcc on old kernels.
Up to now we only had the 32bit CAS working correctly, so we shouldn't care much
about the other CAS anyway.
And if we get it backported into all relevant kernels before we change gcc
I would prefer this hard break...
Helge
* Re: [RFC PATCHv2] 64bit LWS CAS
From: Guy Martin @ 2014-07-30 9:17 UTC (permalink / raw)
To: Helge Deller; +Cc: linux-parisc
[-- Attachment #1: Type: text/plain, Size: 4137 bytes --]
Hi Helge,
On 2014-07-29 23:24, Helge Deller wrote:
> Hi Guy,
>
> Very nice work !
Thanks !
> I compile-tested it...(but didn't runtime tested it yet):
I have attached the small test program I use to test the LWS outside gcc
for easy testing.
Moreover, the gcc patch I attached previously is not the right one. It
also contained JDA's changes.
I have attached a patch that applies against gcc's trunk.
> AS arch/parisc/kernel/syscall.o
> /home/cvs/LINUX/git-kernel/linux-2.6/arch/parisc/kernel/syscall.S:
> Assembler messages:
> /home/cvs/LINUX/git-kernel/linux-2.6/arch/parisc/kernel/syscall.S:866:
> Error: Invalid operands
> /home/cvs/LINUX/git-kernel/linux-2.6/arch/parisc/kernel/syscall.S:870:
> Error: Invalid operands
>
> Line 866 is the jump label 18:
> 18: ldd,ma 0(%sr3,%r26), %r29
>
> Line 870 is 19:
> 19: ldd,ma 4(%sr3,%r26), %r29
>
> I think both should be ldw ?
Indeed, fixed in the new patch attached.
> An idea:
> Maybe the config option CONFIG_PA8X00 (which enables -march=2.0) can be
> used in
> some places to use the 64bit assembler even on 32bit kernel (instead
> of using CONFIG_64BIT) ?
Indeed, this is definitely a good idea. But it should probably be a
separate patch since we need to touch the existing code to enable wide
mode when entering the LWS.
>
>> My approach for 64bit CAS on 32bit is the following:
>> - Load old into 2 registers
>> - Compare low and high part and bail out if different
>> - Load new into a FPU register
>> - Store the content of the FPU register to the memory
>>
>> The point here being to do the store in the last step in a single
>> instruction.
>> I think the same approach can be used for 128bit CAS as well but I
>> don't think it's needed at the moment.
>
> Since the 64bit CAS on 32bit currently uses 9 asms, it can't be added
> as is right now anyway.
> Maybe it makes sense to pull out the "flddx 0(%sr3,%r24), %fr4" from
> this content and to preload
> it to where you set up r22/r23-high/low ?
Good point, I changed that too. However, having 9 asm insns shouldn't be
a problem there because we won't jump past that.
>> Regarding the GCC counterpart of the implementation, I'm not sure about
>> the way to proceed.
>>
>> Should I try to detect the presence of the new LWS and use it for all
>> CAS operations at init time ?
>
> I leave this up to Dave & Carlos to answer.
>
>> So far I only used the new LWS for 64bit CAS.
>
> + /***************************************************
> + New CAS implementation which uses pointers and
> variable size information.
> + The value pointed by old and new MUST NOT change while
> performing CAS.
> + The lock only protect the value at %r26.
> +
> + %r26 - Address to examine
> + %r25 - Pointer to the value to check (old)
> + %r24 - Pointer to the value to set (new)
> + %r23 - Size of the variable (8bit = 0, 16bit = 1,
> 32bit = 2, 64bit = 4)
>
> Since you shift %r23 in your code, I think the comment above is wrong
> for 64bit which should be 3 instead of 4 ?
Fixed.
> You use nop's in your code to align to 32 bytes to be able to jump.
> Does it make sense to use .align 32 instead ? I'm not sure myself about
> that...
I don't know either, nop seemed like the easiest and cleanest way to do
so for me.
> Should we maybe drop the whole ENABLE_LWS_DEBUG thing? Was it ever
> used/enabled?
Indeed, I have not tested it and I dropped it in this new patch.
>> I guess that using the new LWS unconditionally for all CAS operations
>> isn't an option since it will break for newer gcc on old kernels.
>
> Up to now we only had the 32bit CAS working correctly, so we shouldn't
> care much
> about the other CAS anyway.
> And if we get it backported into all relevant kernels before we change
> gcc
> I would prefer this hard break...
32bit CAS works but, as far as I understand, 8 and 16 bit CAS is broken since
the stw in the current LWS will overwrite 3 and 2 bytes of memory
respectively with zeros.
So IMHO this should definitely be changed.
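The effect described above can be reproduced with a tiny byte-buffer model in C. It is only an illustration: `model_stw_be` and `model_stb` are hypothetical stand-ins for a big-endian word store and a byte store, and the value is assumed to be a zero-extended byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a big-endian 32bit store (stw) into a byte buffer.  */
static void
model_stw_be (unsigned char *p, uint32_t v)
{
  assert (p != NULL);
  p[0] = (unsigned char) (v >> 24);
  p[1] = (unsigned char) (v >> 16);
  p[2] = (unsigned char) (v >> 8);
  p[3] = (unsigned char) v;
}

/* Model of a width-correct byte store (stb).  */
static void
model_stb (unsigned char *p, unsigned char v)
{
  assert (p != NULL);
  p[0] = v;
}
```

Storing the byte value 0x11 with the word store turns aa bb cc dd into 00 00 00 11, while the byte store only changes the first byte; a size-aware LWS avoids the clobbering by picking the store that matches the operand width.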
Guy
[-- Attachment #2: linux-hppa-atomic-cas2_v2.patch --]
[-- Type: text/x-patch; name=linux-hppa-atomic-cas2_v2.patch, Size: 7767 bytes --]
diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
index 8387860..d565e1f 100644
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -74,7 +74,7 @@ ENTRY(linux_gateway_page)
/* ADDRESS 0xb0 to 0xb8, lws uses two insns for entry */
/* Light-weight-syscall entry must always be located at 0xb0 */
/* WARNING: Keep this number updated with table size changes */
-#define __NR_lws_entries (2)
+#define __NR_lws_entries (3)
lws_entry:
gate lws_start, %r0 /* increase privilege */
@@ -502,7 +502,7 @@ lws_exit:
/***************************************************
- Implementing CAS as an atomic operation:
+ Implementing 32bit CAS as an atomic operation:
%r26 - Address to examine
%r25 - Old value to check (old)
@@ -659,6 +659,239 @@ cas_action:
ASM_EXCEPTIONTABLE_ENTRY(2b-linux_gateway_page, 3b-linux_gateway_page)
+ /***************************************************
+ New CAS implementation which uses pointers and variable size information.
+ The values pointed to by old and new MUST NOT change while performing CAS.
+ The lock only protects the value at %r26.
+
+ %r26 - Address to examine
+ %r25 - Pointer to the value to check (old)
+ %r24 - Pointer to the value to set (new)
+ %r23 - Size of the variable (8bit = 0, 16bit = 1, 32bit = 2, 64bit = 3)
+ %r28 - Return non-zero on failure
+ %r21 - Kernel error code
+
+ If debugging is DISabled:
+
+ %r21 has the following meanings:
+
+ EAGAIN - CAS is busy, ldcw failed, try again.
+ EFAULT - Read or write failed.
+
+ If debugging is enabled:
+
+ EDEADLOCK - CAS called recursively.
+ EAGAIN && r28 == 1 - CAS is busy. Lock contended.
+ EAGAIN && r28 == 2 - CAS is busy. ldcw failed.
+ EFAULT - Read or write failed.
+
+ Scratch: r20, r22, r28, r29, r1, fr4 (32bit for 64bit CAS only)
+
+ ****************************************************/
+
+ /* ELF32 Process entry path */
+lws_compare_and_swap_2:
+#ifdef CONFIG_64BIT
+ /* Clip the input registers */
+ depdi 0, 31, 32, %r26
+ depdi 0, 31, 32, %r25
+ depdi 0, 31, 32, %r24
+ depdi 0, 31, 32, %r23
+#endif
+
+ /* Check the validity of the size parameter */
+ subi,>>= 4, %r23, %r0
+ b,n lws_exit_nosys
+
+ /* Jump to the functions which will load the old and new values into
+ registers depending on the their size */
+ shlw %r23, 2, %r29
+ blr %r29, %r0
+ nop
+
+ /* 8bit load */
+4: ldb 0(%sr3,%r25), %r25
+ b cas2_lock_start
+5: ldb 0(%sr3,%r24), %r24
+ nop
+ nop
+ nop
+ nop
+ nop
+
+ /* 16bit load */
+6: ldh 0(%sr3,%r25), %r25
+ b cas2_lock_start
+7: ldh 0(%sr3,%r24), %r24
+ nop
+ nop
+ nop
+ nop
+ nop
+
+ /* 32bit load */
+8: ldw 0(%sr3,%r25), %r25
+ b cas2_lock_start
+9: ldw 0(%sr3,%r24), %r24
+ nop
+ nop
+ nop
+ nop
+ nop
+
+ /* 64bit load */
+#ifdef CONFIG_64BIT
+10: ldd 0(%sr3,%r25), %r25
+11: ldd 0(%sr3,%r24), %r24
+#else
+ /* Load old value into r22/r23 - high/low */
+10: ldw 0(%sr3,%r25), %r22
+11: ldw 4(%sr3,%r25), %r23
+ /* Load new value into fr4 for atomic store later */
+12: flddx 0(%sr3,%r24), %fr4
+#endif
+
+cas2_lock_start:
+ /* Load start of lock table */
+ ldil L%lws_lock_start, %r20
+ ldo R%lws_lock_start(%r20), %r28
+
+ /* Extract four bits from r26 and hash lock (Bits 4-7) */
+ extru %r26, 27, 4, %r20
+
+ /* Find lock to use, the hash is either one of 0 to
+ 15, multiplied by 16 (keep it 16-byte aligned)
+ and add to the lock table offset. */
+ shlw %r20, 4, %r20
+ add %r20, %r28, %r20
+
+ rsm PSW_SM_I, %r0 /* Disable interrupts */
+ /* COW breaks can cause contention on UP systems */
+ LDCW 0(%sr2,%r20), %r28 /* Try to acquire the lock */
+ cmpb,<>,n %r0, %r28, cas2_action /* Did we get it? */
+cas2_wouldblock:
+ ldo 2(%r0), %r28 /* 2nd case */
+ ssm PSW_SM_I, %r0
+ b lws_exit /* Contended... */
+ ldo -EAGAIN(%r0), %r21 /* Spin in userspace */
+
+ /*
+ prev = *addr;
+ if ( prev == old )
+ *addr = new;
+ return prev;
+ */
+
+ /* NOTES:
+ This all works because intr_do_signal
+ and schedule both check the return iasq
+ and see that we are on the kernel page
+ so this process is never scheduled off
+ or is ever sent any signal of any sort,
+ thus it is wholly atomic from userspace's
+ perspective
+ */
+cas2_action:
+ /* Jump to the correct function */
+ blr %r29, %r0
+ /* Set %r28 as non-zero for now */
+ ldo 1(%r0),%r28
+
+ /* 8bit CAS */
+13: ldb,ma 0(%sr3,%r26), %r29
+ sub,= %r29, %r25, %r0
+ b,n cas2_end
+14: stb,ma %r24, 0(%sr3,%r26)
+ b cas2_end
+ copy %r0, %r28
+ nop
+ nop
+
+ /* 16bit CAS */
+15: ldh,ma 0(%sr3,%r26), %r29
+ sub,= %r29, %r25, %r0
+ b,n cas2_end
+16: sth,ma %r24, 0(%sr3,%r26)
+ b cas2_end
+ copy %r0, %r28
+ nop
+ nop
+
+ /* 32bit CAS */
+17: ldw,ma 0(%sr3,%r26), %r29
+ sub,= %r29, %r25, %r0
+ b,n cas2_end
+18: stw,ma %r24, 0(%sr3,%r26)
+ b cas2_end
+ copy %r0, %r28
+ nop
+ nop
+
+ /* 64bit CAS */
+#ifdef CONFIG_64BIT
+19: ldd,ma 0(%sr3,%r26), %r29
+ sub,= %r29, %r25, %r0
+ b,n cas2_end
+20: std,ma %r24, 0(%sr3,%r26)
+ copy %r0, %r28
+#else
+ /* Compare first word */
+19: ldw,ma 0(%sr3,%r26), %r29
+ sub,= %r29, %r22, %r0
+ b,n cas2_end
+ /* Compare second word */
+20: ldw,ma 4(%sr3,%r26), %r29
+ sub,= %r29, %r23, %r0
+ b,n cas2_end
+ /* Perform the store */
+21: fstdx %fr4, 0(%sr3,%r26)
+ copy %r0, %r28
+#endif
+
+cas2_end:
+ /* Free lock */
+ stw,ma %r20, 0(%sr2,%r20)
+ /* Enable interrupts */
+ ssm PSW_SM_I, %r0
+ /* Return to userspace, set no error */
+ b lws_exit
+ copy %r0, %r21
+
+22:
+ /* Error occurred on load or store */
+ /* Free lock */
+ stw %r20, 0(%sr2,%r20)
+ ssm PSW_SM_I, %r0
+ ldo 1(%r0),%r28
+ b lws_exit
+ ldo -EFAULT(%r0),%r21 /* set errno */
+ nop
+ nop
+ nop
+
+ /* Exception table entries, for the load and store, return EFAULT.
+ Each of the entries must be relocated. */
+ ASM_EXCEPTIONTABLE_ENTRY(4b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(5b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(6b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(7b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(8b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(9b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(10b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(11b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(13b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(14b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(15b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(16b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(17b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(18b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(19b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(20b-linux_gateway_page, 22b-linux_gateway_page)
+#ifndef CONFIG_64BIT
+ ASM_EXCEPTIONTABLE_ENTRY(12b-linux_gateway_page, 22b-linux_gateway_page)
+ ASM_EXCEPTIONTABLE_ENTRY(21b-linux_gateway_page, 22b-linux_gateway_page)
+#endif
+
/* Make sure nothing else is placed on this page */
.align PAGE_SIZE
END(linux_gateway_page)
@@ -675,8 +908,9 @@ ENTRY(end_linux_gateway_page)
/* Light-weight-syscall table */
/* Start of lws table. */
ENTRY(lws_table)
- LWS_ENTRY(compare_and_swap32) /* 0 - ELF32 Atomic compare and swap */
- LWS_ENTRY(compare_and_swap64) /* 1 - ELF64 Atomic compare and swap */
+ LWS_ENTRY(compare_and_swap32) /* 0 - ELF32 Atomic 32bit compare and swap */
+ LWS_ENTRY(compare_and_swap64) /* 1 - ELF64 Atomic 32bit compare and swap */
+ LWS_ENTRY(compare_and_swap_2) /* 2 - ELF32 Atomic 64bit compare and swap */
END(lws_table)
/* End of lws table */
* Re: [RFC PATCHv2] 64bit LWS CAS
From: Guy Martin @ 2014-07-30 9:45 UTC (permalink / raw)
To: Helge Deller; +Cc: linux-parisc, linux-parisc-owner
[-- Attachment #1: Type: text/plain, Size: 52 bytes --]
Even better with all the files attached ....
Guy
[-- Attachment #2: gcc-64bit-atomic-cas2_v2.patch --]
[-- Type: text/x-patch; name=gcc-64bit-atomic-cas2_v2.patch, Size: 4521 bytes --]
Index: libgcc/config/pa/linux-atomic.c
===================================================================
--- libgcc/config/pa/linux-atomic.c (revision 212942)
+++ libgcc/config/pa/linux-atomic.c (working copy)
@@ -75,6 +75,31 @@
return lws_errno;
}
+static inline long
+__kernel_cmpxchg2 (void * oldval, void * newval, void *mem, int val_size)
+{
+
+ register unsigned long lws_mem asm("r26") = (unsigned long) (mem);
+ register long lws_ret asm("r28");
+ register long lws_errno asm("r21");
+ register unsigned long lws_old asm("r25") = (unsigned long) oldval;
+ register unsigned long lws_new asm("r24") = (unsigned long) newval;
+ register int lws_size asm("r23") = val_size;
+ asm volatile ( "ble 0xb0(%%sr2, %%r0) \n\t"
+ "ldi %2, %%r20 \n\t"
+ : "=r" (lws_ret), "=r" (lws_errno)
+ : "i" (2), "r" (lws_mem), "r" (lws_old), "r" (lws_new), "r" (lws_size)
+ : "r1", "r20", "r22", "r29", "r31", "fr4", "memory"
+ );
+ if (__builtin_expect (lws_errno == -EFAULT || lws_errno == -ENOSYS, 0))
+ ABORT_INSTRUCTION;
+
+ /* If the kernel LWS call fails, return EBUSY */
+ if (!lws_errno && lws_ret)
+ lws_errno = -EBUSY;
+
+ return lws_errno;
+}
#define HIDDEN __attribute__ ((visibility ("hidden")))
/* Big endian masks */
@@ -84,6 +109,29 @@
#define MASK_1 0xffu
#define MASK_2 0xffffu
+#define FETCH_AND_OP_DWORD(OP, PFX_OP, INF_OP) \
+ long long HIDDEN \
+ __sync_fetch_and_##OP##_8 (long long *ptr, long long val) \
+ { \
+ long long tmp, newval; \
+ int failure; \
+ \
+ do { \
+ tmp = *ptr; \
+ newval = PFX_OP (tmp INF_OP val); \
+ failure = __kernel_cmpxchg2 (&tmp, &newval, ptr, 3); \
+ } while (failure != 0); \
+ \
+ return tmp; \
+ }
+
+FETCH_AND_OP_DWORD (add, , +)
+FETCH_AND_OP_DWORD (sub, , -)
+FETCH_AND_OP_DWORD (or, , |)
+FETCH_AND_OP_DWORD (and, , &)
+FETCH_AND_OP_DWORD (xor, , ^)
+FETCH_AND_OP_DWORD (nand, ~, &)
+
#define FETCH_AND_OP_WORD(OP, PFX_OP, INF_OP) \
int HIDDEN \
__sync_fetch_and_##OP##_4 (int *ptr, int val) \
@@ -147,6 +195,29 @@
SUBWORD_SYNC_OP (xor, , ^, unsigned char, 1, oldval)
SUBWORD_SYNC_OP (nand, ~, &, unsigned char, 1, oldval)
+#define OP_AND_FETCH_DWORD(OP, PFX_OP, INF_OP) \
+ long long HIDDEN \
+ __sync_##OP##_and_fetch_8 (long long *ptr, long long val) \
+ { \
+ long long tmp, newval; \
+ int failure; \
+ \
+ do { \
+ tmp = *ptr; \
+ newval = PFX_OP (tmp INF_OP val); \
+ failure = __kernel_cmpxchg2 (&tmp, &newval, ptr, 3); \
+ } while (failure != 0); \
+ \
+ return PFX_OP (tmp INF_OP val); \
+ }
+
+OP_AND_FETCH_DWORD (add, , +)
+OP_AND_FETCH_DWORD (sub, , -)
+OP_AND_FETCH_DWORD (or, , |)
+OP_AND_FETCH_DWORD (and, , &)
+OP_AND_FETCH_DWORD (xor, , ^)
+OP_AND_FETCH_DWORD (nand, ~, &)
+
#define OP_AND_FETCH_WORD(OP, PFX_OP, INF_OP) \
int HIDDEN \
__sync_##OP##_and_fetch_4 (int *ptr, int val) \
@@ -182,6 +253,26 @@
SUBWORD_SYNC_OP (xor, , ^, unsigned char, 1, newval)
SUBWORD_SYNC_OP (nand, ~, &, unsigned char, 1, newval)
+long long HIDDEN
+__sync_val_compare_and_swap_8 (long long *ptr, long long oldval, long long newval)
+{
+ long long actual_oldval;
+ int fail;
+
+ while (1)
+ {
+ actual_oldval = *ptr;
+
+ if (__builtin_expect (oldval != actual_oldval, 0))
+ return actual_oldval;
+
+ fail = __kernel_cmpxchg2 (&actual_oldval, &newval, ptr, 3);
+
+ if (__builtin_expect (!fail, 1))
+ return actual_oldval;
+ }
+}
+
int HIDDEN
__sync_val_compare_and_swap_4 (int *ptr, int oldval, int newval)
{
@@ -256,6 +347,20 @@
SUBWORD_BOOL_CAS (unsigned short, 2)
SUBWORD_BOOL_CAS (unsigned char, 1)
+long long HIDDEN
+__sync_lock_test_and_set_8 (long long *ptr, long long val)
+{
+ long long oldval;
+ int failure;
+
+ do {
+ oldval = *ptr;
+ failure = __kernel_cmpxchg2 (&oldval, &val, ptr, 3);
+ } while (failure != 0);
+
+ return oldval;
+}
+
int HIDDEN
__sync_lock_test_and_set_4 (int *ptr, int val)
{
@@ -294,6 +399,17 @@
SUBWORD_TEST_AND_SET (unsigned char, 1)
void HIDDEN
+__sync_lock_release_8 (long long *ptr)
+{
+ long long failure, oldval, zero = 0;
+
+ do {
+ oldval = *ptr;
+ failure = __kernel_cmpxchg2 (&oldval, &zero, ptr, 3);
+ } while (failure != 0);
+}
+
+void HIDDEN
__sync_lock_release_4 (int *ptr)
{
int failure, oldval;
[-- Attachment #3: hppa-cas2-test.c --]
[-- Type: text/x-csrc; name=hppa-cas2-test.c, Size: 2903 bytes --]
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <errno.h>
/* Kernel helper for compare-and-exchange values. */
static inline long
kernel_cmpxchg2 (void * oldval, void * newval, void *mem, int val_size)
{
printf("Oldval : %p, Newval : %p, mem : %p, size : %u\n", oldval, newval, mem, val_size);
register unsigned long lws_mem asm("r26") = (unsigned long) (mem);
register long lws_ret asm("r28");
register long lws_errno asm("r21");
register unsigned long lws_old asm("r25") = (unsigned long) oldval;
register unsigned long lws_new asm("r24") = (unsigned long) newval;
register int lws_size asm("r23") = val_size;
asm volatile ( "ble 0xb0(%%sr2, %%r0) \n\t"
"ldi %2, %%r20 \n\t"
: "=r" (lws_ret), "=r" (lws_errno)
: "i" (2), "r" (lws_mem), "r" (lws_old), "r" (lws_new), "r" (lws_size)
: "r1", "r20", "r22", "r29", "r31", "fr4", "memory"
);
if (lws_errno == -EFAULT || lws_errno == -ENOSYS)
abort();
/* If the kernel LWS call fails, return EBUSY */
if (!lws_errno && lws_ret)
lws_errno = -EBUSY;
printf("lws_errno : %ld, lws_ret : %ld\n", lws_errno, lws_ret);
return lws_errno;
}
long long sync_fetch_and_add_8(long long *ptr, long long val) {
int failure;
long long tmp, newval;
do {
tmp = *ptr;
newval = tmp + val;
failure = kernel_cmpxchg2(&tmp, &newval, ptr, 3);
printf("Failure : %d\n", failure);
} while (failure != 0);
return tmp;
}
int sync_fetch_and_add_4(int *ptr, int val) {
int failure;
int tmp, newval;
do {
tmp = *ptr;
newval = tmp + val;
failure = kernel_cmpxchg2(&tmp, &newval, ptr, 2);
printf("Failure : %d\n", failure);
} while (failure != 0);
return tmp;
}
int sync_fetch_and_add_2(unsigned short *ptr, unsigned short val) {
int failure;
unsigned short tmp, newval;
do {
tmp = *ptr;
newval = tmp + val;
failure = kernel_cmpxchg2(&tmp, &newval, ptr, 1);
printf("Failure : %d\n", failure);
} while (failure != 0);
return tmp;
}
int sync_fetch_and_add_1(unsigned char *ptr, unsigned char val) {
int failure;
unsigned char tmp, newval;
do {
tmp = *ptr;
newval = tmp + val;
failure = kernel_cmpxchg2(&tmp, &newval, ptr, 0);
printf("Failure : %d\n", failure);
} while (failure != 0);
return tmp;
}
int main() {
unsigned char a1 = 0x2;
unsigned char b1 = sync_fetch_and_add_1(&a1, 1);
unsigned char *c1 = &a1;
printf("1 | a : %hhx, b : %hhx, *c : %hhx\n", a1, b1, *c1);
unsigned short a2 = 0x1002;
unsigned short b2 = sync_fetch_and_add_2(&a2, 1);
unsigned short *c2 = &a2;
printf("2 | a : %hx, b : %hx, *c : %hx\n", a2, b2, *c2);
int a4 = 0x3000002;
int b4 = sync_fetch_and_add_4(&a4, 1);
int *c4 = &a4;
printf("4 | a : %x, b : %x, *c : %x\n", a4, b4, *c4);
long long a8 = 0x30000000012;
long long b8 = __sync_fetch_and_add_8(&a8, 1);
long long *c8 = &a8;
printf("8 | a : %llx, b : %llx, *c : %llx\n", a8, b8, *c8);
return 0;
}
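For anyone without PA-RISC hardware at hand, the retry loop used in the test above can be tried on any host with a stand-in for the LWS call. This is only a sketch: cmpxchg2_standin(), fetch_and_add_8() and add_and_fetch_8() are made-up names, and the stand-in uses the GCC/Clang builtin __atomic_compare_exchange_n instead of the gateway page. It assumes the same return convention as kernel_cmpxchg2 (0 on success, -EBUSY on mismatch).

```c
#include <errno.h>

/* Hypothetical stand-in for kernel_cmpxchg2() with size code 3 (64 bit).
   Returns 0 when *mem matched *oldval and was replaced by *newval,
   -EBUSY otherwise. It uses a compiler builtin, not the LWS. */
static long cmpxchg2_standin(long long *oldval, long long *newval,
                             long long *mem)
{
    long long expected = *oldval;

    if (__atomic_compare_exchange_n(mem, &expected, *newval, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
        return 0;
    return -EBUSY;
}

/* Same loop shape as FETCH_AND_OP_DWORD: returns the value seen
   before the addition. */
static long long fetch_and_add_8(long long *ptr, long long val)
{
    long long tmp, newval;

    do {
        tmp = *ptr;
        newval = tmp + val;
    } while (cmpxchg2_standin(&tmp, &newval, ptr) != 0);

    return tmp;
}

/* Same loop shape as OP_AND_FETCH_DWORD: returns the value that
   was stored. */
static long long add_and_fetch_8(long long *ptr, long long val)
{
    long long tmp, newval;

    do {
        tmp = *ptr;
        newval = tmp + val;
    } while (cmpxchg2_standin(&tmp, &newval, ptr) != 0);

    return newval;
}
```

The only difference between the two loops is the return value, which is the same split as between the FETCH_AND_OP_DWORD and OP_AND_FETCH_DWORD macros in the libgcc patch.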
* Re: [RFC PATCHv2] 64bit LWS CAS
2014-07-30 9:17 ` Guy Martin
2014-07-30 9:45 ` Guy Martin
@ 2014-08-26 20:11 ` Helge Deller
2014-08-27 8:35 ` Guy Martin
1 sibling, 1 reply; 6+ messages in thread
From: Helge Deller @ 2014-08-26 20:11 UTC (permalink / raw)
To: Guy Martin; +Cc: linux-parisc
Hi Guy,
should we try to do a last round of cleanup of this patch, since
I would like to include it in the next push to Linus...
On 07/30/2014 11:17 AM, Guy Martin wrote:
>>> Regarding the GCC counterpart of the implementation, I'm not sure about
>>> the way to proceed.
>>>
>>> Should I try to detect the presence of the new LWS and use it for all
>>> CAS operations at init time ?
>>
>> I leave this up to Dave & Carlos to answer.
I think it's OK to stay using the existing 32bit implementation for 32bit CAS,
and just use the new one for 8/16/64 bit CAS.
And maybe: If the LWS fails, just crash the application.
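A rough sketch of how that split could look on the libgcc side, assuming a 32bit userspace. The enum and function names here are made up for illustration; the table indices come from lws_table in the kernel patch, and the size codes from the values passed in hppa-cas2-test.c.

```c
#include <stddef.h>

/* Hypothetical names; the values are the lws_table indices. */
enum lws_index {
    LWS_CAS32 = 0,  /* existing ELF32 32bit compare and swap */
    LWS_CAS2  = 2   /* new CAS taking pointers plus a size code */
};

/* Size code passed in r23 to the new LWS: log2 of the width in bytes
   (0 = 8bit, 1 = 16bit, 2 = 32bit, 3 = 64bit).
   Returns -1 for a width the LWS does not handle. */
static int lws_size_code(size_t bytes)
{
    switch (bytes) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    case 8: return 3;
    default: return -1;
    }
}

/* Keep the old LWS for 32bit CAS, use the new one for 8/16/64bit. */
static int lws_pick_entry(size_t bytes)
{
    return bytes == 4 ? LWS_CAS32 : LWS_CAS2;
}
```

With this split a new gcc keeps working for 32bit CAS on old kernels, and only the 8/16/64bit paths depend on the new LWS being present.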
>> Should we maybe drop the whole ENABLE_LWS_DEBUG thing? Was it ever used/enabled?
>
> Indeed, I have not tested it and I dropped it in this new patch.
Good.
The comments at the top still include info about the debug case:
-> "If debugging is DISabled:..."
>>> I guess that using the new LWS unconditionally for all CAS operations
>>> isn't an option since it will break for newer gcc on old kernels.
>>
>> Up to now we only had the 32bit CAS working correctly, so we shouldn't care much
>> about the other CAS anyway.
>> And if we get it backported into all relevant kernels before we change gcc
>> I would prefer this hard break...
I still think this is the right way.
Your patch had some whitespace errors too.
Please run it through the scripts/checkpatch tool in the kernel tree.
If you like I can take care of the suggested changes and send a revised patch for you?
Just let me know.
Helge
* Re: [RFC PATCHv2] 64bit LWS CAS
2014-08-26 20:11 ` Helge Deller
@ 2014-08-27 8:35 ` Guy Martin
0 siblings, 0 replies; 6+ messages in thread
From: Guy Martin @ 2014-08-27 8:35 UTC (permalink / raw)
To: Helge Deller; +Cc: linux-parisc
Hi Helge,
On Tue, 26 Aug 2014 22:11:43 +0200
Helge Deller <deller@gmx.de> wrote:
> should we try to do a last round of cleanup of this patch, since
> I would like to include it in the next push to Linus...
>
> On 07/30/2014 11:17 AM, Guy Martin wrote:
> >>> Regarding the GCC counterpart of the implementation, I'm not sure
> >>> about the way to proceed.
> >>>
> >>> Should I try to detect the presence of the new LWS and use it for
> >>> all CAS operations at init time ?
> >>
> >> I leave this up to Dave & Carlos to answer.
>
> I think it's OK to stay using the existing 32bit implementation for
> 32bit CAS, and just use the new one for 8/16/64 bit CAS.
> And maybe: If the LWS fails, just crash the application.
It seems like the best solution; at least apps will not use a broken
implementation for 8/16bit.
> >> Should we maybe drop the whole ENABLE_LWS_DEBUG thing? Was it ever
> >> used/enabled?
> >
> > Indeed, I have not tested it and I dropped it in this new patch.
>
> Good.
> The comments at the top still include info about the debug case:
> -> "If debugging is DISabled:..."
Indeed.
> >>> I guess that using the new LWS unconditionally for all CAS
> >>> operations isn't an option since it will break for newer gcc on
> >>> old kernels.
> >>
> >> Up to now we only had the 32bit CAS working correctly, so we
> >> shouldn't care much about the other CAS anyway.
> >> And if we get it backported into all relevant kernels before we
> >> change gcc I would prefer this hard break...
>
> I still think this is the right way.
>
> Your patch had some whitespace errors too.
> Please run it through the scripts/checkpatch tool in the kernel tree.
>
> If you like I can take care of the suggested changes and send a
> revised patch for you? Just let me know.
Please go ahead ! I just became a father and as you can imagine, I have
very little time to spend on hacking things up :)
Also, I'm not sure that the asm exception tables are working
correctly. I managed to crash my kernel on several occasions while
passing bad values, causing an invalid pointer dereference.
I'll try to work on the gcc patches but it won't be this week, maybe
next week, or I'll have more time in 2 weeks. If someone wants to step
in, feel free to do so too.
Regards,
Guy
Thread overview: 6+ messages
2014-07-29 19:13 [RFC PATCHv2] 64bit LWS CAS Guy Martin
2014-07-29 21:24 ` Helge Deller
2014-07-30 9:17 ` Guy Martin
2014-07-30 9:45 ` Guy Martin
2014-08-26 20:11 ` Helge Deller
2014-08-27 8:35 ` Guy Martin