* [PATCH RFC] arm64: 32-bit tolerant sync bitops
@ 2014-04-17  8:38 Vladimir Murzin
  2014-04-17  8:41 ` [PATCH] xen: use sync_clear_bit instead of clear_bit Vladimir Murzin
  2014-04-17 10:42 ` [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops David Vrabel
  0 siblings, 2 replies; 12+ messages in thread
From: Vladimir Murzin @ 2014-04-17  8:38 UTC (permalink / raw)
  To: linux-arm-kernel
Xen assumes that bit operations are able to operate on 32-bit size and
alignment [1]. For arm64 bitops are based on atomic exclusive load/store
instructions to guarantee that changes are made atomically. However, these
instructions require that address to be aligned to the data size. Because, by
default, bitops operates on 64-bit size it implies that address should be
aligned appropriately. All these lead to breakage of Xen assumption for bitops
properties.
With this patch 32-bit sized/aligned bitops is implemented. 
[1] http://www.gossamer-threads.com/lists/xen/devel/325613
Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
---
 Apart this patch other approaches were implemented:
 1. turn bitops to be 32-bit size/align tolerant.
    the changes are minimal, but I'm not sure how broad side effect might be
 2. separate 32-bit size/aligned operations.
    it exports new API, which might not be good
 All implementations based on arm64 version of bitops and were boot tested
 only. Hope, I didn't miss something ;) 
 arch/arm64/include/asm/sync_bitops.h | 60 ++++++++++++++++++++++++++++++++----
 1 file changed, 54 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/sync_bitops.h b/arch/arm64/include/asm/sync_bitops.h
index 8da0bf4..809926f 100644
--- a/arch/arm64/include/asm/sync_bitops.h
+++ b/arch/arm64/include/asm/sync_bitops.h
@@ -3,6 +3,7 @@
 
 #include <asm/bitops.h>
 #include <asm/cmpxchg.h>
+#include <linux/stringify.h>
 
 /* sync_bitops functions are equivalent to the SMP implementation of the
  * original functions, independently from CONFIG_SMP being defined.
@@ -12,14 +13,61 @@
  * who might be on another CPU (e.g. two uniprocessor guests communicating
  * via event channels and grant tables). So we need a variant of the bit
  * ops which are SMP safe even on a UP kernel.
+ *
+ * Xen assumes that bitops are 32-bit sized/aligned
  */
 
-#define sync_set_bit(nr, p)            set_bit(nr, p)
-#define sync_clear_bit(nr, p)          clear_bit(nr, p)
-#define sync_change_bit(nr, p)         change_bit(nr, p)
-#define sync_test_and_set_bit(nr, p)   test_and_set_bit(nr, p)
-#define sync_test_and_clear_bit(nr, p) test_and_clear_bit(nr, p)
-#define sync_test_and_change_bit(nr, p)        test_and_change_bit(nr, p)
+#define sync_bitop32(name, instr)					\
+static inline void sync_##name(int nr, volatile unsigned long *addr)	\
+{									\
+	unsigned tmp1, tmp2;						\
+	asm volatile(							\
+	"	and	%w1, %w2, #31\n"				\
+	"	eor	%w2, %w2, %w1\n"				\
+	"	mov	%w0, #1\n"					\
+	"	add	%3, %3, %2, lsr #2\n"				\
+	"	lsl	%w1, %w0, %w1\n"				\
+	"1:	ldxr	%w0, [%3]\n"					\
+	__stringify(instr)" %w0, %w0, %w1\n"				\
+	"	stxr	%w2, %w0, [%3]\n"				\
+	"	cbnz	%w2, 1b\n"					\
+	: "=&r"(tmp1), "=&r"(tmp2)		 			\
+	: "r"(nr), "r"(addr)						\
+	: "memory");							\
+}
+
+#define sync_testop32(name, instr)					\
+static inline int sync_##name(int nr, volatile unsigned long *addr)	\
+{									\
+	int oldbit;							\
+	unsigned tmp1, tmp2, tmp3;	  				\
+	asm volatile(							\
+	"	and	%w1, %w4, #31\n"				\
+	"	eor	%w4, %w4, %w1\n"				\
+	"	mov	%w0, #1\n"					\
+	"	add	%5, %5, %4, lsr #2\n"				\
+	"	lsl	%w2, %w0, %w1\n"				\
+	"1:	ldxr	%w0, [%5]\n"					\
+	"	lsr	%w3, %w0, %w1\n"				\
+	__stringify(instr)" %w0, %w0, %w2\n"				\
+	"	stlxr	%w4, %w0, [%5]\n"				\
+	"	cbnz	%w4, 1b\n"					\
+	"	dmb	ish\n"						\
+	"	and	%w3, %w3, #1\n"					\
+	: "=&r"(tmp1), "=&r"(tmp2), "=&r"(tmp3), "=&r"(oldbit)		\
+	: "r"(nr), "r"(addr)						\
+	: "memory");							\
+	return oldbit;							\
+}
+
+sync_bitop32(set_bit, orr)
+sync_bitop32(clear_bit, bic)
+sync_bitop32(change_bit, eor)
+
+sync_testop32(test_and_set_bit, orr)
+sync_testop32(test_and_clear_bit, bic)
+sync_testop32(test_and_change_bit, eor)
+
 #define sync_test_bit(nr, addr)                test_bit(nr, addr)
 #define sync_cmpxchg                   cmpxchg
 
-- 
1.8.3.2
^ permalink raw reply related	[flat|nested] 12+ messages in thread
* [PATCH] xen: use sync_clear_bit instead of clear_bit
  2014-04-17  8:38 [PATCH RFC] arm64: 32-bit tolerant sync bitops Vladimir Murzin
@ 2014-04-17  8:41 ` Vladimir Murzin
  2014-04-17 10:23   ` David Vrabel
  2014-04-17 10:42 ` [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops David Vrabel
  1 sibling, 1 reply; 12+ messages in thread
From: Vladimir Murzin @ 2014-04-17  8:41 UTC (permalink / raw)
  To: linux-arm-kernel
Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
---
 For those who whant to play with original issue [1]
 
 [1] http://www.gossamer-threads.com/lists/xen/devel/325613
 drivers/xen/events/events_fifo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index 96109a9..91248d8 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -273,7 +273,7 @@ static void consume_one_event(unsigned cpu,
 	 * copy of the ready word.
 	 */
 	if (head == 0)
-		clear_bit(priority, BM(ready));
+		sync_clear_bit(priority, BM(ready));
 
 	if (sync_test_bit(EVTCHN_FIFO_PENDING, BM(word))
 	    && !sync_test_bit(EVTCHN_FIFO_MASKED, BM(word)))
-- 
1.8.3.2
^ permalink raw reply related	[flat|nested] 12+ messages in thread
* [PATCH] xen: use sync_clear_bit instead of clear_bit
  2014-04-17  8:41 ` [PATCH] xen: use sync_clear_bit instead of clear_bit Vladimir Murzin
@ 2014-04-17 10:23   ` David Vrabel
  0 siblings, 0 replies; 12+ messages in thread
From: David Vrabel @ 2014-04-17 10:23 UTC (permalink / raw)
  To: linux-arm-kernel
On 17/04/14 09:41, Vladimir Murzin wrote:
> Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
> ---
>  For those who whant to play with original issue [1]
>  
>  [1] http://www.gossamer-threads.com/lists/xen/devel/325613
> 
>  drivers/xen/events/events_fifo.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
> index 96109a9..91248d8 100644
> --- a/drivers/xen/events/events_fifo.c
> +++ b/drivers/xen/events/events_fifo.c
> @@ -273,7 +273,7 @@ static void consume_one_event(unsigned cpu,
>  	 * copy of the ready word.
>  	 */
>  	if (head == 0)
> -		clear_bit(priority, BM(ready));
> +		sync_clear_bit(priority, BM(ready));
I prefer the fix that makes ready an unsigned long.
David
^ permalink raw reply	[flat|nested] 12+ messages in thread
* [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops
  2014-04-17  8:38 [PATCH RFC] arm64: 32-bit tolerant sync bitops Vladimir Murzin
  2014-04-17  8:41 ` [PATCH] xen: use sync_clear_bit instead of clear_bit Vladimir Murzin
@ 2014-04-17 10:42 ` David Vrabel
  2014-04-21 16:18   ` Vladimir Murzin
  2014-04-23  8:31   ` Ian Campbell
  1 sibling, 2 replies; 12+ messages in thread
From: David Vrabel @ 2014-04-17 10:42 UTC (permalink / raw)
  To: linux-arm-kernel
On 17/04/14 09:38, Vladimir Murzin wrote:
> Xen assumes that bit operations are able to operate on 32-bit size and
> alignment [1]. For arm64 bitops are based on atomic exclusive load/store
> instructions to guarantee that changes are made atomically. However, these
> instructions require that address to be aligned to the data size. Because, by
> default, bitops operates on 64-bit size it implies that address should be
> aligned appropriately. All these lead to breakage of Xen assumption for bitops
> properties.
> 
> With this patch 32-bit sized/aligned bitops is implemented. 
> 
> [1] http://www.gossamer-threads.com/lists/xen/devel/325613
> 
> Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
> ---
>  Apart this patch other approaches were implemented:
>  1. turn bitops to be 32-bit size/align tolerant.
>     the changes are minimal, but I'm not sure how broad side effect might be
>  2. separate 32-bit size/aligned operations.
>     it exports new API, which might not be good
I've never been particularly happy with the way the events_fifo.c uses
casts for the sync_*_bit() calls and I think we should do option 2.
A generic implementation could be something like:
bool sync_test_bit32(uint32_t *v, unsigned bit)
{
     if (sizeof(unsigned long) == 8 && (unsigned long)v & 0x4)
         return sync_test_bit((unsigned long *)(v - 1), bit + 32);
     else
         return sync_test_bit((unsigned long *)v, bit);
}
David
^ permalink raw reply	[flat|nested] 12+ messages in thread
* [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops
  2014-04-17 10:42 ` [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops David Vrabel
@ 2014-04-21 16:18   ` Vladimir Murzin
  2014-04-22 10:16     ` David Vrabel
  2014-04-23  8:31   ` Ian Campbell
  1 sibling, 1 reply; 12+ messages in thread
From: Vladimir Murzin @ 2014-04-21 16:18 UTC (permalink / raw)
  To: linux-arm-kernel
On Thu, Apr 17, 2014 at 11:42 AM, David Vrabel <david.vrabel@citrix.com> wrote:
> On 17/04/14 09:38, Vladimir Murzin wrote:
>> Xen assumes that bit operations are able to operate on 32-bit size and
>> alignment [1]. For arm64 bitops are based on atomic exclusive load/store
>> instructions to guarantee that changes are made atomically. However, these
>> instructions require that address to be aligned to the data size. Because, by
>> default, bitops operates on 64-bit size it implies that address should be
>> aligned appropriately. All these lead to breakage of Xen assumption for bitops
>> properties.
>>
>> With this patch 32-bit sized/aligned bitops is implemented.
>>
>> [1] http://www.gossamer-threads.com/lists/xen/devel/325613
>>
>> Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
>> ---
>>  Apart this patch other approaches were implemented:
>>  1. turn bitops to be 32-bit size/align tolerant.
>>     the changes are minimal, but I'm not sure how broad side effect might be
>>  2. separate 32-bit size/aligned operations.
>>     it exports new API, which might not be good
>
> I've never been particularly happy with the way the events_fifo.c uses
> casts for the sync_*_bit() calls and I think we should do option 2.
>
> A generic implementation could be something like:
>
> bool sync_test_bit32(uint32_t *v, unsigned bit)
> {
>      if (sizeof(unsigned long) == 8 && (unsigned long)v & 0x4)
>          return sync_test_bit((unsigned long *)(v - 1), bit + 32);
>      else
>          return sync_test_bit((unsigned long *)v, bit);
> }
>
> David
Talking about separate 32-bit ops I mean arch specific bitops which
are targeting for 32-bit size/alignment.
With separate API for Xen it looks awkward for me, because currently
there are only a few users for sync_*_bit ops - Xen and HV.
Xen assumes that these ops are 32 bit and looks like never try to deal
with 64-bit (am I overlooking something?). So, sync ops are 32-bit
de-facto, having separate version means there is support for both
32-bit and 64-bit ops, but (by now) there is no user for 64-bit ops.
All this lead to obvious question why we need API conversion now? Is
not it easier to turn assumptions to requirements?
x86 world defines it's own sync ops, arm32 world alias them to
non-sync, so, probably arm64 could do something (may be different to
my patch) to make sure that sync bitops are 32-bit sized/aligned...
Hope to hear other opinions here ;)
Vladimir
^ permalink raw reply	[flat|nested] 12+ messages in thread
* [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops
  2014-04-21 16:18   ` Vladimir Murzin
@ 2014-04-22 10:16     ` David Vrabel
  2014-04-22 10:55       ` Ian Campbell
  0 siblings, 1 reply; 12+ messages in thread
From: David Vrabel @ 2014-04-22 10:16 UTC (permalink / raw)
  To: linux-arm-kernel
On 21/04/14 17:18, Vladimir Murzin wrote:
> On Thu, Apr 17, 2014 at 11:42 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 17/04/14 09:38, Vladimir Murzin wrote:
>>> Xen assumes that bit operations are able to operate on 32-bit size and
>>> alignment [1]. For arm64 bitops are based on atomic exclusive load/store
>>> instructions to guarantee that changes are made atomically. However, these
>>> instructions require that address to be aligned to the data size. Because, by
>>> default, bitops operates on 64-bit size it implies that address should be
>>> aligned appropriately. All these lead to breakage of Xen assumption for bitops
>>> properties.
>>>
>>> With this patch 32-bit sized/aligned bitops is implemented.
>>>
>>> [1] http://www.gossamer-threads.com/lists/xen/devel/325613
>>>
>>> Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
>>> ---
>>>  Apart this patch other approaches were implemented:
>>>  1. turn bitops to be 32-bit size/align tolerant.
>>>     the changes are minimal, but I'm not sure how broad side effect might be
>>>  2. separate 32-bit size/aligned operations.
>>>     it exports new API, which might not be good
>>
>> I've never been particularly happy with the way the events_fifo.c uses
>> casts for the sync_*_bit() calls and I think we should do option 2.
>>
>> A generic implementation could be something like:
>>
>> bool sync_test_bit32(uint32_t *v, unsigned bit)
>> {
>>      if (sizeof(unsigned long) == 8 && (unsigned long)v & 0x4)
>>          return sync_test_bit((unsigned long *)(v - 1), bit + 32);
>>      else
>>          return sync_test_bit((unsigned long *)v, bit);
>> }
>>
>> David
> 
> Talking about separate 32-bit ops I mean arch specific bitops which
> are targeting for 32-bit size/alignment.
> With separate API for Xen it looks awkward for me, because currently
> there are only a few users for sync_*_bit ops - Xen and HV.
> Xen assumes that these ops are 32 bit and looks like never try to deal
> with 64-bit (am I overlooking something?). So, sync ops are 32-bit
> de-facto, having separate version means there is support for both
> 32-bit and 64-bit ops, but (by now) there is no user for 64-bit ops.
> All this lead to obvious question why we need API conversion now? Is
> not it easier to turn assumptions to requirements?
Xen does use the sync bit ops on 64-bit values (for 2-level event channels).
David
^ permalink raw reply	[flat|nested] 12+ messages in thread
* [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops
  2014-04-22 10:16     ` David Vrabel
@ 2014-04-22 10:55       ` Ian Campbell
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2014-04-22 10:55 UTC (permalink / raw)
  To: linux-arm-kernel
On Tue, 2014-04-22 at 11:16 +0100, David Vrabel wrote:
> On 21/04/14 17:18, Vladimir Murzin wrote:
> > On Thu, Apr 17, 2014 at 11:42 AM, David Vrabel <david.vrabel@citrix.com> wrote:
> >> On 17/04/14 09:38, Vladimir Murzin wrote:
> >>> Xen assumes that bit operations are able to operate on 32-bit size and
> >>> alignment [1]. For arm64 bitops are based on atomic exclusive load/store
> >>> instructions to guarantee that changes are made atomically. However, these
> >>> instructions require that address to be aligned to the data size. Because, by
> >>> default, bitops operates on 64-bit size it implies that address should be
> >>> aligned appropriately. All these lead to breakage of Xen assumption for bitops
> >>> properties.
> >>>
> >>> With this patch 32-bit sized/aligned bitops is implemented.
> >>>
> >>> [1] http://www.gossamer-threads.com/lists/xen/devel/325613
> >>>
> >>> Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
> >>> ---
> >>>  Apart this patch other approaches were implemented:
> >>>  1. turn bitops to be 32-bit size/align tolerant.
> >>>     the changes are minimal, but I'm not sure how broad side effect might be
> >>>  2. separate 32-bit size/aligned operations.
> >>>     it exports new API, which might not be good
> >>
> >> I've never been particularly happy with the way the events_fifo.c uses
> >> casts for the sync_*_bit() calls and I think we should do option 2.
> >>
> >> A generic implementation could be something like:
> >>
> >> bool sync_test_bit32(uint32_t *v, unsigned bit)
> >> {
> >>      if (sizeof(unsigned long) == 8 && (unsigned long)v & 0x4)
> >>          return sync_test_bit((unsigned long *)(v - 1), bit + 32);
> >>      else
> >>          return sync_test_bit((unsigned long *)v, bit);
> >> }
> >>
> >> David
> > 
> > Talking about separate 32-bit ops I mean arch specific bitops which
> > are targeting for 32-bit size/alignment.
> > With separate API for Xen it looks awkward for me, because currently
> > there are only a few users for sync_*_bit ops - Xen and HV.
> > Xen assumes that these ops are 32 bit and looks like never try to deal
> > with 64-bit (am I overlooking something?). So, sync ops are 32-bit
> > de-facto, having separate version means there is support for both
> > 32-bit and 64-bit ops, but (by now) there is no user for 64-bit ops.
> > All this lead to obvious question why we need API conversion now? Is
> > not it easier to turn assumptions to requirements?
> 
> Xen does use the sync bit ops on 64-bit values (for 2-level event channels).
It is valid to use a 32-bit bitop on a 64-bit aligned bitmask though
(whereas the converse is not always true).
Ian.
^ permalink raw reply	[flat|nested] 12+ messages in thread
* [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops
  2014-04-23  8:31   ` Ian Campbell
@ 2014-04-22 20:56     ` Vladimir Murzin
  2014-04-25  7:17       ` Ian Campbell
  2014-04-25  8:42       ` Vladimir Murzin
  0 siblings, 2 replies; 12+ messages in thread
From: Vladimir Murzin @ 2014-04-22 20:56 UTC (permalink / raw)
  To: linux-arm-kernel
On Wed, Apr 23, 2014 at 09:31:29AM +0100, Ian Campbell wrote:
> On Thu, 2014-04-17 at 11:42 +0100, David Vrabel wrote:
> > On 17/04/14 09:38, Vladimir Murzin wrote:
> > > Xen assumes that bit operations are able to operate on 32-bit size and
> > > alignment [1]. For arm64 bitops are based on atomic exclusive load/store
> > > instructions to guarantee that changes are made atomically. However, these
> > > instructions require that address to be aligned to the data size. Because, by
> > > default, bitops operates on 64-bit size it implies that address should be
> > > aligned appropriately. All these lead to breakage of Xen assumption for bitops
> > > properties.
> > > 
> > > With this patch 32-bit sized/aligned bitops is implemented. 
> > > 
> > > [1] http://www.gossamer-threads.com/lists/xen/devel/325613
> > > 
> > > Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
> > > ---
> > >  Apart this patch other approaches were implemented:
> > >  1. turn bitops to be 32-bit size/align tolerant.
> > >     the changes are minimal, but I'm not sure how broad side effect might be
> > >  2. separate 32-bit size/aligned operations.
> > >     it exports new API, which might not be good
> > 
> > I've never been particularly happy with the way the events_fifo.c uses
> > casts for the sync_*_bit() calls and I think we should do option 2.
> > 
> > A generic implementation could be something like:
> 
> It seems like there isn't currently much call for/interest in a generic
> 32-bit set of bitops. Since this is a regression on arm64 right now how
> about we simply fix it up in the Xen layer now and if in the future a
> common need is found we can rework to use it.
> 
> We already have evtchn_fifo_{clear,set,is}_pending (although
> evtchn_fifo_unmask and consume_one_event need fixing to use is_pending)
> and evtchn_fifo_{test_and_set_,}mask, plus we would need
> evtchn_fifo_set_mask for consume_one_event to use instead of open
> coding.
> 
> With that then those helpers can become something like:
>         #define BM(w) (unsigned long *)(w & ~0x7)
>         #define EVTCHN_FIFO_BIT(b, w) (BUG_ON(w&0x3), w & 0x4 ? EVTCHN_FIFO_#b + 32 : EVTCHN_FIFO_#b)
>         static void evtchn_fifo_clear_pending(unsigned port)
>         {
>         	event_word_t *word = event_word_from_port(port);
>         	sync_clear_bit(EVTCHN_FIFO_BIT(PENDING, word), BM(word));
>         }
> similarly for the others.
> 
> If that is undesirable in the common code then wrap each existing helper
> in #ifndef evtchn_fifo_clean_pending and put something like the above in
> arch/arm64/include/asm/xen/events.h with a #define (and/or make the
> helpers themselves macros, or #define HAVE_XEN_FIFO_EVTCHN_ACCESSORS etc
> etc)
> 
> We still need your patch from
> http://article.gmane.org/gmane.comp.emulators.xen.devel/196067 for the
> other unaligned bitmap case. Note that this also removes the use of BM
> for non-PENDING/MASKED bits, which is why it was safe for me to change
> it above (also it would be safe to & ~0x7 after that change anyway).
> 
> Ian.
> 
Might be dealing with it on Xen side is only way unless arm64 folks say what
they think... because reports related to unaligned access still appear I'll
put the minimally tested patch for making arm64 bitops 32-bits friendly ;)
---
 arch/arm64/lib/bitops.S | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/lib/bitops.S b/arch/arm64/lib/bitops.S
index 7dac371..7c3b058 100644
--- a/arch/arm64/lib/bitops.S
+++ b/arch/arm64/lib/bitops.S
@@ -26,14 +26,14 @@
  */
 	.macro	bitop, name, instr
 ENTRY(	\name	)
-	and	w3, w0, #63		// Get bit offset
+	and	w3, w0, #31		// Get bit offset
 	eor	w0, w0, w3		// Clear low bits
 	mov	x2, #1
-	add	x1, x1, x0, lsr #3	// Get word offset
+	add	x1, x1, x0, lsr #2	// Get word offset
 	lsl	x3, x2, x3		// Create mask
-1:	ldxr	x2, [x1]
-	\instr	x2, x2, x3
-	stxr	w0, x2, [x1]
+1:	ldxr	w2, [x1]
+	\instr	w2, w2, w3
+	stxr	w0, w2, [x1]
 	cbnz	w0, 1b
 	ret
 ENDPROC(\name	)
@@ -41,15 +41,15 @@ ENDPROC(\name	)
 
 	.macro	testop, name, instr
 ENTRY(	\name	)
-	and	w3, w0, #63		// Get bit offset
+	and	w3, w0, #31		// Get bit offset
 	eor	w0, w0, w3		// Clear low bits
 	mov	x2, #1
-	add	x1, x1, x0, lsr #3	// Get word offset
+	add	x1, x1, x0, lsr #2	// Get word offset
 	lsl	x4, x2, x3		// Create mask
-1:	ldxr	x2, [x1]
+1:	ldxr	w2, [x1]
 	lsr	x0, x2, x3		// Save old value of bit
-	\instr	x2, x2, x4		// toggle bit
-	stlxr	w5, x2, [x1]
+	\instr	w2, w2, w4		// toggle bit
+	stlxr	w5, w2, [x1]
 	cbnz	w5, 1b
 	dmb	ish
 	and	x0, x0, #1
-- 
1.8.3
^ permalink raw reply related	[flat|nested] 12+ messages in thread
* [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops
  2014-04-17 10:42 ` [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops David Vrabel
  2014-04-21 16:18   ` Vladimir Murzin
@ 2014-04-23  8:31   ` Ian Campbell
  2014-04-22 20:56     ` Vladimir Murzin
  1 sibling, 1 reply; 12+ messages in thread
From: Ian Campbell @ 2014-04-23  8:31 UTC (permalink / raw)
  To: linux-arm-kernel
On Thu, 2014-04-17 at 11:42 +0100, David Vrabel wrote:
> On 17/04/14 09:38, Vladimir Murzin wrote:
> > Xen assumes that bit operations are able to operate on 32-bit size and
> > alignment [1]. For arm64 bitops are based on atomic exclusive load/store
> > instructions to guarantee that changes are made atomically. However, these
> > instructions require that address to be aligned to the data size. Because, by
> > default, bitops operates on 64-bit size it implies that address should be
> > aligned appropriately. All these lead to breakage of Xen assumption for bitops
> > properties.
> > 
> > With this patch 32-bit sized/aligned bitops is implemented. 
> > 
> > [1] http://www.gossamer-threads.com/lists/xen/devel/325613
> > 
> > Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
> > ---
> >  Apart this patch other approaches were implemented:
> >  1. turn bitops to be 32-bit size/align tolerant.
> >     the changes are minimal, but I'm not sure how broad side effect might be
> >  2. separate 32-bit size/aligned operations.
> >     it exports new API, which might not be good
> 
> I've never been particularly happy with the way the events_fifo.c uses
> casts for the sync_*_bit() calls and I think we should do option 2.
> 
> A generic implementation could be something like:
It seems like there isn't currently much call for/interest in a generic
32-bit set of bitops. Since this is a regression on arm64 right now how
about we simply fix it up in the Xen layer now and if in the future a
common need is found we can rework to use it.
We already have evtchn_fifo_{clear,set,is}_pending (although
evtchn_fifo_unmask and consume_one_event need fixing to use is_pending)
and evtchn_fifo_{test_and_set_,}mask, plus we would need
evtchn_fifo_set_mask for consume_one_event to use instead of open
coding.
With that then those helpers can become something like:
        #define BM(w) (unsigned long *)(w & ~0x7)
        #define EVTCHN_FIFO_BIT(b, w) (BUG_ON(w&0x3), w & 0x4 ? EVTCHN_FIFO_#b + 32 : EVTCHN_FIFO_#b)
        static void evtchn_fifo_clear_pending(unsigned port)
        {
        	event_word_t *word = event_word_from_port(port);
        	sync_clear_bit(EVTCHN_FIFO_BIT(PENDING, word), BM(word));
        }
similarly for the others.
If that is undesirable in the common code then wrap each existing helper
in #ifndef evtchn_fifo_clean_pending and put something like the above in
arch/arm64/include/asm/xen/events.h with a #define (and/or make the
helpers themselves macros, or #define HAVE_XEN_FIFO_EVTCHN_ACCESSORS etc
etc)
We still need your patch from
http://article.gmane.org/gmane.comp.emulators.xen.devel/196067 for the
other unaligned bitmap case. Note that this also removes the use of BM
for non-PENDING/MASKED bits, which is why it was safe for me to change
it above (also it would be safe to & ~0x7 after that change anyway).
Ian.
^ permalink raw reply	[flat|nested] 12+ messages in thread
* [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops
  2014-04-22 20:56     ` Vladimir Murzin
@ 2014-04-25  7:17       ` Ian Campbell
  2014-04-25  8:43         ` Vladimir Murzin
  2014-04-25  8:42       ` Vladimir Murzin
  1 sibling, 1 reply; 12+ messages in thread
From: Ian Campbell @ 2014-04-25  7:17 UTC (permalink / raw)
  To: linux-arm-kernel
On Tue, 2014-04-22 at 21:56 +0100, Vladimir Murzin wrote:
> Might be dealing with it on Xen side is only way unless arm64 folks say what
> they think...
I don't expect the arm64 guys will have any interesting in changing the
bitops in this way. So please can you fix this on the Xen side, perhaps
in the way I suggested.
If at some point arm64 does gain 32-bit capable bitops we can always
switch back to using them directly again.
In the meantime we need to get this Xen regression fixed before 3.15. If
you don't have time to work on this please let us know.
Ian.
^ permalink raw reply	[flat|nested] 12+ messages in thread
* [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops
  2014-04-22 20:56     ` Vladimir Murzin
  2014-04-25  7:17       ` Ian Campbell
@ 2014-04-25  8:42       ` Vladimir Murzin
  1 sibling, 0 replies; 12+ messages in thread
From: Vladimir Murzin @ 2014-04-25  8:42 UTC (permalink / raw)
  To: linux-arm-kernel
On Tue, Apr 22, 2014 at 9:56 PM, Vladimir Murzin <murzin.v@gmail.com> wrote:
> On Wed, Apr 23, 2014 at 09:31:29AM +0100, Ian Campbell wrote:
>> On Thu, 2014-04-17 at 11:42 +0100, David Vrabel wrote:
>> > On 17/04/14 09:38, Vladimir Murzin wrote:
>> > > Xen assumes that bit operations are able to operate on 32-bit size and
>> > > alignment [1]. For arm64 bitops are based on atomic exclusive load/store
>> > > instructions to guarantee that changes are made atomically. However, these
>> > > instructions require that address to be aligned to the data size. Because, by
>> > > default, bitops operates on 64-bit size it implies that address should be
>> > > aligned appropriately. All these lead to breakage of Xen assumption for bitops
>> > > properties.
>> > >
>> > > With this patch 32-bit sized/aligned bitops is implemented.
>> > >
>> > > [1] http://www.gossamer-threads.com/lists/xen/devel/325613
>> > >
>> > > Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
>> > > ---
>> > >  Apart this patch other approaches were implemented:
>> > >  1. turn bitops to be 32-bit size/align tolerant.
>> > >     the changes are minimal, but I'm not sure how broad side effect might be
>> > >  2. separate 32-bit size/aligned operations.
>> > >     it exports new API, which might not be good
>> >
>> > I've never been particularly happy with the way the events_fifo.c uses
>> > casts for the sync_*_bit() calls and I think we should do option 2.
>> >
>> > A generic implementation could be something like:
>>
>> It seems like there isn't currently much call for/interest in a generic
>> 32-bit set of bitops. Since this is a regression on arm64 right now how
>> about we simply fix it up in the Xen layer now and if in the future a
>> common need is found we can rework to use it.
>>
>> We already have evtchn_fifo_{clear,set,is}_pending (although
>> evtchn_fifo_unmask and consume_one_event need fixing to use is_pending)
>> and evtchn_fifo_{test_and_set_,}mask, plus we would need
>> evtchn_fifo_set_mask for consume_one_event to use instead of open
>> coding.
>>
>> With that then those helpers can become something like:
>>         #define BM(w) (unsigned long *)(w & ~0x7)
>>         #define EVTCHN_FIFO_BIT(b, w) (BUG_ON(w&0x3), w & 0x4 ? EVTCHN_FIFO_#b + 32 : EVTCHN_FIFO_#b)
>>         static void evtchn_fifo_clear_pending(unsigned port)
>>         {
>>               event_word_t *word = event_word_from_port(port);
>>               sync_clear_bit(EVTCHN_FIFO_BIT(PENDING, word), BM(word));
>>         }
>> similarly for the others.
>>
>> If that is undesirable in the common code then wrap each existing helper
>> in #ifndef evtchn_fifo_clean_pending and put something like the above in
>> arch/arm64/include/asm/xen/events.h with a #define (and/or make the
>> helpers themselves macros, or #define HAVE_XEN_FIFO_EVTCHN_ACCESSORS etc
>> etc)
>>
>> We still need your patch from
>> http://article.gmane.org/gmane.comp.emulators.xen.devel/196067 for the
>> other unaligned bitmap case. Note that this also removes the use of BM
>> for non-PENDING/MASKED bits, which is why it was safe for me to change
>> it above (also it would be safe to & ~0x7 after that change anyway).
>>
>> Ian.
>>
>
>
> Might be dealing with it on Xen side is only way unless arm64 folks say what
> they think... because reports related to unaligned access still appear I'll
> put the minimally tested patch for making arm64 bitops 32-bits friendly ;)
>
> ---
>  arch/arm64/lib/bitops.S | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/lib/bitops.S b/arch/arm64/lib/bitops.S
> index 7dac371..7c3b058 100644
> --- a/arch/arm64/lib/bitops.S
> +++ b/arch/arm64/lib/bitops.S
> @@ -26,14 +26,14 @@
>   */
>         .macro  bitop, name, instr
>  ENTRY( \name   )
> -       and     w3, w0, #63             // Get bit offset
> +       and     w3, w0, #31             // Get bit offset
>         eor     w0, w0, w3              // Clear low bits
>         mov     x2, #1
> -       add     x1, x1, x0, lsr #3      // Get word offset
> +       add     x1, x1, x0, lsr #2      // Get word offset
>         lsl     x3, x2, x3              // Create mask
> -1:     ldxr    x2, [x1]
> -       \instr  x2, x2, x3
> -       stxr    w0, x2, [x1]
> +1:     ldxr    w2, [x1]
> +       \instr  w2, w2, w3
> +       stxr    w0, w2, [x1]
>         cbnz    w0, 1b
>         ret
>  ENDPROC(\name  )
> @@ -41,15 +41,15 @@ ENDPROC(\name       )
>
>         .macro  testop, name, instr
>  ENTRY( \name   )
> -       and     w3, w0, #63             // Get bit offset
> +       and     w3, w0, #31             // Get bit offset
>         eor     w0, w0, w3              // Clear low bits
>         mov     x2, #1
> -       add     x1, x1, x0, lsr #3      // Get word offset
> +       add     x1, x1, x0, lsr #2      // Get word offset
>         lsl     x4, x2, x3              // Create mask
> -1:     ldxr    x2, [x1]
> +1:     ldxr    w2, [x1]
>         lsr     x0, x2, x3              // Save old value of bit
> -       \instr  x2, x2, x4              // toggle bit
> -       stlxr   w5, x2, [x1]
> +       \instr  w2, w2, w4              // toggle bit
> +       stlxr   w5, w2, [x1]
>         cbnz    w5, 1b
>         dmb     ish
>         and     x0, x0, #1
> --
> 1.8.3
>
After revisiting arm32 implementation I think there is a bug in
proposed changes.. :(
The changes like:
-       add     x1, x1, x0, lsr #3      // Get word offset
+       add     x1, x1, x0, lsr #2      // Get word offset
are not necessary.
arm32 implementation do
        mov     r0, r0, lsr #5
        add     r1, r1, r0, lsl #2      @ Get word offset
for calculation of word offset what equivalent to r0, lsr #3
I'll send v2 if someone provide feedback for that ;)
^ permalink raw reply	[flat|nested] 12+ messages in thread
* [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops
  2014-04-25  7:17       ` Ian Campbell
@ 2014-04-25  8:43         ` Vladimir Murzin
  0 siblings, 0 replies; 12+ messages in thread
From: Vladimir Murzin @ 2014-04-25  8:43 UTC (permalink / raw)
  To: linux-arm-kernel
On Fri, Apr 25, 2014 at 8:17 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2014-04-22 at 21:56 +0100, Vladimir Murzin wrote:
>> Might be dealing with it on Xen side is only way unless arm64 folks say what
>> they think...
>
>
> I don't expect the arm64 guys will have any interesting in changing the
> bitops in this way. So please can you fix this on the Xen side, perhaps
> in the way I suggested.
>
> If at some point arm64 does gain 32-bit capable bitops we can always
> switch back to using them directly again.
>
> In the meantime we need to get this Xen regression fixed before 3.15. If
> you don't have time to work on this please let us know.
>
> Ian.
>
Hi Ian
I'll try to cook some patches for that.
Vladimir
^ permalink raw reply	[flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-04-25  8:43 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-17  8:38 [PATCH RFC] arm64: 32-bit tolerant sync bitops Vladimir Murzin
2014-04-17  8:41 ` [PATCH] xen: use sync_clear_bit instead of clear_bit Vladimir Murzin
2014-04-17 10:23   ` David Vrabel
2014-04-17 10:42 ` [Xen-devel] [PATCH RFC] arm64: 32-bit tolerant sync bitops David Vrabel
2014-04-21 16:18   ` Vladimir Murzin
2014-04-22 10:16     ` David Vrabel
2014-04-22 10:55       ` Ian Campbell
2014-04-23  8:31   ` Ian Campbell
2014-04-22 20:56     ` Vladimir Murzin
2014-04-25  7:17       ` Ian Campbell
2014-04-25  8:43         ` Vladimir Murzin
2014-04-25  8:42       ` Vladimir Murzin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).