[PATCH 0/4] ARM: Cleanup following bitops improvements

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4] ARM: Cleanup following bitops improvements
@ 2024-09-02 10:03 Andrew Cooper
  2024-09-02 10:03 ` [PATCH 1/4] ARM/div: Drop __div64_fls() Andrew Cooper
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Andrew Cooper @ 2024-09-02 10:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Bertrand Marquis, Michal Orzel

Patch 1 has been posted before, but is included here just so it doesn't get
lost.

Patch 2 is a bugfix found by code inspection.

Patches 3 and 4 switch to a more efficient form.

Andrew Cooper (4):
  ARM/div: Drop __div64_fls()
  ARM/vgic: Correct the expression for lr_all_full()
  ARM/vgic: Use for_each_set_bit() in gic_find_unused_lr()
  ARM/vgic: Use for_each_set_bit() in vgic-mmio*

 xen/arch/arm/gic-vgic.c          | 15 +++++++------
 xen/arch/arm/include/asm/div64.h | 18 +++-------------
 xen/arch/arm/vgic/vgic-mmio-v2.c |  3 +--
 xen/arch/arm/vgic/vgic-mmio.c    | 36 +++++++++++++++++++++-----------
 4 files changed, 35 insertions(+), 37 deletions(-)

-- 
2.39.2



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/4] ARM/div: Drop __div64_fls()
  2024-09-02 10:03 [PATCH 0/4] ARM: Cleanup following bitops improvements Andrew Cooper
@ 2024-09-02 10:03 ` Andrew Cooper
  2024-09-03  8:37   ` Michal Orzel
  2024-09-02 10:03 ` [PATCH 2/4] ARM/vgic: Correct the expression for lr_all_full() Andrew Cooper
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2024-09-02 10:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Bertrand Marquis, Michal Orzel

Following the improvements to Xen's bitops, fls() does constant propagation in
all cases.  Use it, and drop the local opencoded helper.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>

ARM32 gets a very minor code generation improvement:

  xen.git/xen$ ../scripts/bloat-o-meter xen-syms-arm32-{before,after}
  add/remove: 0/0 grow/shrink: 0/6 up/down: 0/-48 (-48)
  Function                                     old     new   delta
  wallclock_time                               288     280      -8
  printk_start_of_line                         560     552      -8
  domain_vtimer_init                           472     464      -8
  do_settime                                   376     368      -8
  burn_credits                                 760     752      -8
  __printk_ratelimit                           424     416      -8

But it's just a couple of operations improvement and no real change in code
structure.  I expect that the constant propagation being done through
__builtin_clz(), rather than pure C, is giving the optimiser a bit more
information to work with.

This file also has an __GNUC__ < 4 case which seems ripe for removing...
---
 xen/arch/arm/include/asm/div64.h | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/xen/arch/arm/include/asm/div64.h b/xen/arch/arm/include/asm/div64.h
index 0459d5cc0122..da1f1fcbd503 100644
--- a/xen/arch/arm/include/asm/div64.h
+++ b/xen/arch/arm/include/asm/div64.h
@@ -102,7 +102,7 @@
 		/* preserve low part of n for reminder computation */	\
 		__r = __n;						\
 		/* determine number of bits to represent __b */		\
-		__p = 1 << __div64_fls(__b);				\
+		__p = 1 << fls(__b);					\
 		/* compute __m = ((__p << 64) + __b - 1) / __b */	\
 		__m = (~0ULL / __b) * __p;				\
 		__m += (((~0ULL % __b + 1) * __p) + __b - 1) / __b;	\
@@ -150,8 +150,8 @@
 				__p /= (__m & -__m);			\
 				__m /= (__m & -__m);			\
 			} else {					\
-				__p >>= __div64_fls(__bits);		\
-				__m >>= __div64_fls(__bits);		\
+				__p >>= fls(__bits);			\
+				__m >>= fls(__bits);			\
 			}						\
 			/* No correction needed. */			\
 			__c = 0;					\
@@ -217,18 +217,6 @@
 	__r;								\
 })
 
-/* our own fls implementation to make sure constant propagation is fine */
-#define __div64_fls(bits)						\
-({									\
-	unsigned int __left = (bits), __nr = 0;				\
-	if (__left & 0xffff0000) __nr += 16, __left >>= 16;		\
-	if (__left & 0x0000ff00) __nr +=  8, __left >>=  8;		\
-	if (__left & 0x000000f0) __nr +=  4, __left >>=  4;		\
-	if (__left & 0x0000000c) __nr +=  2, __left >>=  2;		\
-	if (__left & 0x00000002) __nr +=  1;				\
-	__nr;								\
-})
-
 #endif /* GCC version */
 
 #endif /* BITS_PER_LONG */
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/4] ARM/vgic: Correct the expression for lr_all_full()
  2024-09-02 10:03 [PATCH 0/4] ARM: Cleanup following bitops improvements Andrew Cooper
  2024-09-02 10:03 ` [PATCH 1/4] ARM/div: Drop __div64_fls() Andrew Cooper
@ 2024-09-02 10:03 ` Andrew Cooper
  2024-09-03  8:42   ` Michal Orzel
  2024-09-02 10:03 ` [PATCH 3/4] ARM/vgic: Use for_each_set_bit() in gic_find_unused_lr() Andrew Cooper
  2024-09-02 10:03 ` [PATCH 4/4] ARM/vgic: Use for_each_set_bit() in vgic-mmio* Andrew Cooper
  3 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2024-09-02 10:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Bertrand Marquis, Michal Orzel

The current expression hits UB with 31 LRs (shifting into the sign bit), and
malfunctions with 32 LRs (shifting beyond the range of int).  Swapping 1 for
1ULL fixes some of these, but still malfunctions at 64 LRs which is the
architectural limit.

Instead, shift -1ULL right in order to create the mask.

Fixes: 596f885a3202 ("xen/arm: set GICH_HCR_UIE if all the LRs are in use")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>

Found by code inspection while doing bitops work.  I don't even know if
there's a platform that really has 31 LRs, but the rest of Xen's code is
written with the expectation that there may be 64.
---
 xen/arch/arm/gic-vgic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 9aa245a36d98..3f14aab2efc7 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -16,7 +16,8 @@
 #include <asm/gic.h>
 #include <asm/vgic.h>
 
-#define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_get_nr_lrs()) - 1))
+#define lr_all_full()                                           \
+    (this_cpu(lr_mask) == (-1ULL >> (64 - gic_get_nr_lrs())))
 
 #undef GIC_DEBUG
 
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/4] ARM/vgic: Use for_each_set_bit() in gic_find_unused_lr()
  2024-09-02 10:03 [PATCH 0/4] ARM: Cleanup following bitops improvements Andrew Cooper
  2024-09-02 10:03 ` [PATCH 1/4] ARM/div: Drop __div64_fls() Andrew Cooper
  2024-09-02 10:03 ` [PATCH 2/4] ARM/vgic: Correct the expression for lr_all_full() Andrew Cooper
@ 2024-09-02 10:03 ` Andrew Cooper
  2024-09-03  8:46   ` Michal Orzel
  2024-09-02 10:03 ` [PATCH 4/4] ARM/vgic: Use for_each_set_bit() in vgic-mmio* Andrew Cooper
  3 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2024-09-02 10:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Bertrand Marquis, Michal Orzel

There are no bits set in lr_mask beyond nr_lrs, so when substituting
bitmap_for_each() for for_each_set_bit(), we don't need to worry about the
upper bound.

However, the type of lr_mask does matter, so switch it to be uint64_t * and
move unsigned long * override until the find_next_zero_bit() call.

Move lr_val into a narrower scope and drop used_lr as it's declared by
for_each_set_bit() itself.

Drop the nr_lrs variable and use gic_get_nr_lrs() in the one location its now
used.  It hides a triple pointer defererence, and while it may not be needed
in the PRISTINE case, it certainly doesn't need to be live across the rest of
the function.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>

ARM64:

  $ ../scripts/bloat-o-meter xen-syms-arm64-{before,after}
  add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-28 (-28)
  Function                                     old     new   delta
  gic_find_unused_lr.constprop                 228     200     -28

inc -2 find_next_bit()

ARM32:

  $ ../scripts/bloat-o-meter xen-syms-arm32-{before,after}
  add/remove: 0/0 grow/shrink: 1/0 up/down: 48/0 (48)
  Function                                     old     new   delta
  gic_find_unused_lr                           260     308     +48

because uint64_t, but -2 _find_{first,next}_bit_le()
---
 xen/arch/arm/gic-vgic.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 3f14aab2efc7..ea48c5375a91 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -102,25 +102,23 @@ static unsigned int gic_find_unused_lr(struct vcpu *v,
                                        struct pending_irq *p,
                                        unsigned int lr)
 {
-    unsigned int nr_lrs = gic_get_nr_lrs();
-    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
-    struct gic_lr lr_val;
+    uint64_t *lr_mask = &this_cpu(lr_mask);
 
     ASSERT(spin_is_locked(&v->arch.vgic.lock));
 
     if ( unlikely(test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status)) )
     {
-        unsigned int used_lr;
-
-        bitmap_for_each ( used_lr, lr_mask, nr_lrs )
+        for_each_set_bit ( used_lr, *lr_mask )
         {
+            struct gic_lr lr_val;
+
             gic_hw_ops->read_lr(used_lr, &lr_val);
             if ( lr_val.virq == p->irq )
                 return used_lr;
         }
     }
 
-    lr = find_next_zero_bit(lr_mask, nr_lrs, lr);
+    lr = find_next_zero_bit((unsigned long *)lr_mask, gic_get_nr_lrs(), lr);
 
     return lr;
 }
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/4] ARM/vgic: Use for_each_set_bit() in vgic-mmio*
  2024-09-02 10:03 [PATCH 0/4] ARM: Cleanup following bitops improvements Andrew Cooper
                   ` (2 preceding siblings ...)
  2024-09-02 10:03 ` [PATCH 3/4] ARM/vgic: Use for_each_set_bit() in gic_find_unused_lr() Andrew Cooper
@ 2024-09-02 10:03 ` Andrew Cooper
  2024-09-03 10:30   ` Michal Orzel
  3 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2024-09-02 10:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Bertrand Marquis, Michal Orzel

These are all loops over a scalar value, and don't need to call general bitop
helpers behind the scenes.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>

Slighly RFC.  It's unclear whether len is the size of the register, or the
size of the access.  For sub-GPR accesses, won't the upper bits be clear
anyway?  If so, this can be simplified further.

$ ../scripts/bloat-o-meter xen-syms-arm64-{before,after}
add/remove: 0/0 grow/shrink: 2/5 up/down: 20/-140 (-120)
Function                                     old     new   delta
vgic_mmio_write_spending                     320     332     +12
vgic_mmio_write_cpending                     368     376      +8
vgic_mmio_write_sactive                      192     176     -16
vgic_mmio_write_cactive                      192     176     -16
vgic_mmio_write_cenable                      316     288     -28
vgic_mmio_write_senable                      320     284     -36
vgic_mmio_write_sgir                         344     300     -44

$ ../scripts/bloat-o-meter xen-syms-arm32-{before,after}
add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32)
Function                                     old     new   delta
vgic_mmio_write_sactive                      204     200      -4
vgic_mmio_write_cpending                     464     460      -4
vgic_mmio_write_cactive                      204     200      -4
vgic_mmio_write_sgir                         336     316     -20
---
 xen/arch/arm/vgic/vgic-mmio-v2.c |  3 +--
 xen/arch/arm/vgic/vgic-mmio.c    | 36 +++++++++++++++++++++-----------
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/xen/arch/arm/vgic/vgic-mmio-v2.c b/xen/arch/arm/vgic/vgic-mmio-v2.c
index 670b335db2c3..42fac0403f07 100644
--- a/xen/arch/arm/vgic/vgic-mmio-v2.c
+++ b/xen/arch/arm/vgic/vgic-mmio-v2.c
@@ -90,7 +90,6 @@ static void vgic_mmio_write_sgir(struct vcpu *source_vcpu,
     unsigned int intid = val & GICD_SGI_INTID_MASK;
     unsigned long targets = (val & GICD_SGI_TARGET_MASK) >>
                             GICD_SGI_TARGET_SHIFT;
-    unsigned int vcpu_id;
 
     switch ( val & GICD_SGI_TARGET_LIST_MASK )
     {
@@ -108,7 +107,7 @@ static void vgic_mmio_write_sgir(struct vcpu *source_vcpu,
         return;
     }
 
-    bitmap_for_each ( vcpu_id, &targets, 8 )
+    for_each_set_bit ( vcpu_id, (uint8_t)targets )
     {
         struct vcpu *vcpu = d->vcpu[vcpu_id];
         struct vgic_irq *irq = vgic_get_irq(d, vcpu, intid);
diff --git a/xen/arch/arm/vgic/vgic-mmio.c b/xen/arch/arm/vgic/vgic-mmio.c
index bd4caf7fc326..f7c336a238ab 100644
--- a/xen/arch/arm/vgic/vgic-mmio.c
+++ b/xen/arch/arm/vgic/vgic-mmio.c
@@ -69,9 +69,11 @@ void vgic_mmio_write_senable(struct vcpu *vcpu,
                              unsigned long val)
 {
     uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
-    unsigned int i;
 
-    bitmap_for_each ( i, &val, len * 8 )
+    if ( len < sizeof(val) )
+        val &= (1UL << (len * 8)) - 1;
+
+    for_each_set_bit ( i, val )
     {
         struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
         unsigned long flags;
@@ -114,9 +116,11 @@ void vgic_mmio_write_cenable(struct vcpu *vcpu,
                              unsigned long val)
 {
     uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
-    unsigned int i;
 
-    bitmap_for_each ( i, &val, len * 8 )
+    if ( len < sizeof(val) )
+        val &= (1UL << (len * 8)) - 1;
+
+    for_each_set_bit ( i, val )
     {
         struct vgic_irq *irq;
         unsigned long flags;
@@ -182,11 +186,13 @@ void vgic_mmio_write_spending(struct vcpu *vcpu,
                               unsigned long val)
 {
     uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
-    unsigned int i;
     unsigned long flags;
     irq_desc_t *desc;
 
-    bitmap_for_each ( i, &val, len * 8 )
+    if ( len < sizeof(val) )
+        val &= (1UL << (len * 8)) - 1;
+
+    for_each_set_bit ( i, val )
     {
         struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
 
@@ -230,11 +236,13 @@ void vgic_mmio_write_cpending(struct vcpu *vcpu,
                               unsigned long val)
 {
     uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
-    unsigned int i;
     unsigned long flags;
     irq_desc_t *desc;
 
-    bitmap_for_each ( i, &val, len * 8 )
+    if ( len < sizeof(val) )
+        val &= (1UL << (len * 8)) - 1;
+
+    for_each_set_bit ( i, val )
     {
         struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
 
@@ -326,9 +334,11 @@ void vgic_mmio_write_cactive(struct vcpu *vcpu,
                              unsigned long val)
 {
     uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
-    unsigned int i;
 
-    bitmap_for_each ( i, &val, len * 8 )
+    if ( len < sizeof(val) )
+        val &= (1UL << (len * 8)) - 1;
+
+    for_each_set_bit ( i, val )
     {
         struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
 
@@ -356,9 +366,11 @@ void vgic_mmio_write_sactive(struct vcpu *vcpu,
                              unsigned long val)
 {
     uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
-    unsigned int i;
 
-    bitmap_for_each ( i, &val, len * 8 )
+    if ( len < sizeof(val) )
+        val &= (1UL << (len * 8)) - 1;
+
+    for_each_set_bit ( i, val )
     {
         struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
 
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/4] ARM/div: Drop __div64_fls()
  2024-09-02 10:03 ` [PATCH 1/4] ARM/div: Drop __div64_fls() Andrew Cooper
@ 2024-09-03  8:37   ` Michal Orzel
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Orzel @ 2024-09-03  8:37 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis



On 02/09/2024 12:03, Andrew Cooper wrote:
> 
> 
> Following the improvements to Xen's bitops, fls() does constant propagation in
> all cases.  Use it, and drop the local opencoded helper.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>

> ---
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien@xen.org>
> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> CC: Bertrand Marquis <bertrand.marquis@arm.com>
> CC: Michal Orzel <michal.orzel@amd.com>
> 
> ARM32 gets a very minor code generation improvement:
> 
>   xen.git/xen$ ../scripts/bloat-o-meter xen-syms-arm32-{before,after}
>   add/remove: 0/0 grow/shrink: 0/6 up/down: 0/-48 (-48)
>   Function                                     old     new   delta
>   wallclock_time                               288     280      -8
>   printk_start_of_line                         560     552      -8
>   domain_vtimer_init                           472     464      -8
>   do_settime                                   376     368      -8
>   burn_credits                                 760     752      -8
>   __printk_ratelimit                           424     416      -8
> 
> But it's just a couple of operations improvement and no real change in code
> structure.  I expect that the constant propagation being done through
> __builtin_clz(), rather than pure C, is giving the optimiser a bit more
> information to work with.
> 
> This file also has an __GNUC__ < 4 case which seems ripe for removing...
Agree and noted.

~Michal


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/4] ARM/vgic: Correct the expression for lr_all_full()
  2024-09-02 10:03 ` [PATCH 2/4] ARM/vgic: Correct the expression for lr_all_full() Andrew Cooper
@ 2024-09-03  8:42   ` Michal Orzel
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Orzel @ 2024-09-03  8:42 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis



On 02/09/2024 12:03, Andrew Cooper wrote:
> 
> 
> The current expression hits UB with 31 LRs (shifting into the sign bit), and
> malfunctions with 32 LRs (shifting beyond the range of int).  Swapping 1 for
> 1ULL fixes some of these, but still malfunctions at 64 LRs which is the
> architectural limit.
> 
> Instead, shift -1ULL right in order to create the mask.
> 
> Fixes: 596f885a3202 ("xen/arm: set GICH_HCR_UIE if all the LRs are in use")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>

> ---
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien@xen.org>
> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> CC: Bertrand Marquis <bertrand.marquis@arm.com>
> CC: Michal Orzel <michal.orzel@amd.com>
> 
> Found by code inspection while doing bitops work.  I don't even know if
> there's a platform that really has 31 LRs, but the rest of Xen's code is
> written with the expectation that there may be 64.
So, for GICv2 the limit is 64 and for GICv3 the limit is 16. This made me realize we need to
fix the mask for GICv3 (ICH_VTR_NRLRGS) from 0x3f to 0xf. I'll send a patch.

> ---
>  xen/arch/arm/gic-vgic.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
> index 9aa245a36d98..3f14aab2efc7 100644
> --- a/xen/arch/arm/gic-vgic.c
> +++ b/xen/arch/arm/gic-vgic.c
> @@ -16,7 +16,8 @@
>  #include <asm/gic.h>
>  #include <asm/vgic.h>
> 
> -#define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_get_nr_lrs()) - 1))
> +#define lr_all_full()                                           \
> +    (this_cpu(lr_mask) == (-1ULL >> (64 - gic_get_nr_lrs())))
> 
>  #undef GIC_DEBUG
> 
> --
> 2.39.2
> 

~Michal


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/4] ARM/vgic: Use for_each_set_bit() in gic_find_unused_lr()
  2024-09-02 10:03 ` [PATCH 3/4] ARM/vgic: Use for_each_set_bit() in gic_find_unused_lr() Andrew Cooper
@ 2024-09-03  8:46   ` Michal Orzel
  2024-09-03 13:07     ` Andrew Cooper
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Orzel @ 2024-09-03  8:46 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis



On 02/09/2024 12:03, Andrew Cooper wrote:
> 
> 
> There are no bits set in lr_mask beyond nr_lrs, so when substituting
> bitmap_for_each() for for_each_set_bit(), we don't need to worry about the
> upper bound.
> 
> However, the type of lr_mask does matter, so switch it to be uint64_t * and
> move unsigned long * override until the find_next_zero_bit() call.
> 
> Move lr_val into a narrower scope and drop used_lr as it's declared by
> for_each_set_bit() itself.
> 
> Drop the nr_lrs variable and use gic_get_nr_lrs() in the one location its now
> used.  It hides a triple pointer defererence, and while it may not be needed
s/defererence/dereference

> in the PRISTINE case, it certainly doesn't need to be live across the rest of
> the function.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>

~Michal


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] ARM/vgic: Use for_each_set_bit() in vgic-mmio*
  2024-09-02 10:03 ` [PATCH 4/4] ARM/vgic: Use for_each_set_bit() in vgic-mmio* Andrew Cooper
@ 2024-09-03 10:30   ` Michal Orzel
  2024-09-03 13:19     ` Andrew Cooper
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Orzel @ 2024-09-03 10:30 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel, Julien Grall
  Cc: Stefano Stabellini, Volodymyr Babchuk, Bertrand Marquis



On 02/09/2024 12:03, Andrew Cooper wrote:
> 
> 
> These are all loops over a scalar value, and don't need to call general bitop
> helpers behind the scenes.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien@xen.org>
> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> CC: Bertrand Marquis <bertrand.marquis@arm.com>
> CC: Michal Orzel <michal.orzel@amd.com>
> 
> Slighly RFC.  It's unclear whether len is the size of the register, or the
> size of the access.  For sub-GPR accesses, won't the upper bits be clear
> anyway?  If so, this can be simplified further.
See dispatch_mmio_write(). "len" refers to access size (i.e. 1/4/8 bytes). Each register is defined
with a region access i.e. VGIC_ACCESS_32bit that is compared with the actual access. In the current code
there is no register with 8B access. If there is a mismatch, there will be a data abort injected.
Also, the top 32-bits are not cleared anywhere, so I don't think we can drop masking. @Julien?

> 
> $ ../scripts/bloat-o-meter xen-syms-arm64-{before,after}
> add/remove: 0/0 grow/shrink: 2/5 up/down: 20/-140 (-120)
> Function                                     old     new   delta
> vgic_mmio_write_spending                     320     332     +12
> vgic_mmio_write_cpending                     368     376      +8
> vgic_mmio_write_sactive                      192     176     -16
> vgic_mmio_write_cactive                      192     176     -16
> vgic_mmio_write_cenable                      316     288     -28
> vgic_mmio_write_senable                      320     284     -36
> vgic_mmio_write_sgir                         344     300     -44
> 
> $ ../scripts/bloat-o-meter xen-syms-arm32-{before,after}
> add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32)
> Function                                     old     new   delta
> vgic_mmio_write_sactive                      204     200      -4
> vgic_mmio_write_cpending                     464     460      -4
> vgic_mmio_write_cactive                      204     200      -4
> vgic_mmio_write_sgir                         336     316     -20
> ---
>  xen/arch/arm/vgic/vgic-mmio-v2.c |  3 +--
>  xen/arch/arm/vgic/vgic-mmio.c    | 36 +++++++++++++++++++++-----------
>  2 files changed, 25 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/arch/arm/vgic/vgic-mmio-v2.c b/xen/arch/arm/vgic/vgic-mmio-v2.c
> index 670b335db2c3..42fac0403f07 100644
> --- a/xen/arch/arm/vgic/vgic-mmio-v2.c
> +++ b/xen/arch/arm/vgic/vgic-mmio-v2.c
> @@ -90,7 +90,6 @@ static void vgic_mmio_write_sgir(struct vcpu *source_vcpu,
>      unsigned int intid = val & GICD_SGI_INTID_MASK;
>      unsigned long targets = (val & GICD_SGI_TARGET_MASK) >>
>                              GICD_SGI_TARGET_SHIFT;
> -    unsigned int vcpu_id;
> 
>      switch ( val & GICD_SGI_TARGET_LIST_MASK )
>      {
> @@ -108,7 +107,7 @@ static void vgic_mmio_write_sgir(struct vcpu *source_vcpu,
>          return;
>      }
> 
> -    bitmap_for_each ( vcpu_id, &targets, 8 )
> +    for_each_set_bit ( vcpu_id, (uint8_t)targets )
>      {
>          struct vcpu *vcpu = d->vcpu[vcpu_id];
>          struct vgic_irq *irq = vgic_get_irq(d, vcpu, intid);
> diff --git a/xen/arch/arm/vgic/vgic-mmio.c b/xen/arch/arm/vgic/vgic-mmio.c
> index bd4caf7fc326..f7c336a238ab 100644
> --- a/xen/arch/arm/vgic/vgic-mmio.c
> +++ b/xen/arch/arm/vgic/vgic-mmio.c
> @@ -69,9 +69,11 @@ void vgic_mmio_write_senable(struct vcpu *vcpu,
>                               unsigned long val)
>  {
>      uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
> -    unsigned int i;
> 
> -    bitmap_for_each ( i, &val, len * 8 )
> +    if ( len < sizeof(val) )
> +        val &= (1UL << (len * 8)) - 1;
> +
> +    for_each_set_bit ( i, val )
>      {
>          struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
>          unsigned long flags;
> @@ -114,9 +116,11 @@ void vgic_mmio_write_cenable(struct vcpu *vcpu,
>                               unsigned long val)
>  {
>      uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
> -    unsigned int i;
> 
> -    bitmap_for_each ( i, &val, len * 8 )
> +    if ( len < sizeof(val) )
> +        val &= (1UL << (len * 8)) - 1;
> +
> +    for_each_set_bit ( i, val )
>      {
>          struct vgic_irq *irq;
>          unsigned long flags;
> @@ -182,11 +186,13 @@ void vgic_mmio_write_spending(struct vcpu *vcpu,
>                                unsigned long val)
>  {
>      uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
> -    unsigned int i;
>      unsigned long flags;
>      irq_desc_t *desc;
> 
> -    bitmap_for_each ( i, &val, len * 8 )
> +    if ( len < sizeof(val) )
> +        val &= (1UL << (len * 8)) - 1;
> +
> +    for_each_set_bit ( i, val )
>      {
>          struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
> 
> @@ -230,11 +236,13 @@ void vgic_mmio_write_cpending(struct vcpu *vcpu,
>                                unsigned long val)
>  {
>      uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
> -    unsigned int i;
>      unsigned long flags;
>      irq_desc_t *desc;
> 
> -    bitmap_for_each ( i, &val, len * 8 )
> +    if ( len < sizeof(val) )
> +        val &= (1UL << (len * 8)) - 1;
> +
> +    for_each_set_bit ( i, val )
>      {
>          struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
> 
> @@ -326,9 +334,11 @@ void vgic_mmio_write_cactive(struct vcpu *vcpu,
>                               unsigned long val)
>  {
>      uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
> -    unsigned int i;
> 
> -    bitmap_for_each ( i, &val, len * 8 )
> +    if ( len < sizeof(val) )
> +        val &= (1UL << (len * 8)) - 1;
> +
> +    for_each_set_bit ( i, val )
>      {
>          struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
> 
> @@ -356,9 +366,11 @@ void vgic_mmio_write_sactive(struct vcpu *vcpu,
>                               unsigned long val)
>  {
>      uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
> -    unsigned int i;
> 
> -    bitmap_for_each ( i, &val, len * 8 )
> +    if ( len < sizeof(val) )
> +        val &= (1UL << (len * 8)) - 1;
> +
> +    for_each_set_bit ( i, val )
>      {
>          struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
> 
> --
> 2.39.2
> 

~Michal


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/4] ARM/vgic: Use for_each_set_bit() in gic_find_unused_lr()
  2024-09-03  8:46   ` Michal Orzel
@ 2024-09-03 13:07     ` Andrew Cooper
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2024-09-03 13:07 UTC (permalink / raw)
  To: Michal Orzel, Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis

On 03/09/2024 9:46 am, Michal Orzel wrote:
>
> On 02/09/2024 12:03, Andrew Cooper wrote:
>>
>> There are no bits set in lr_mask beyond nr_lrs, so when substituting
>> bitmap_for_each() for for_each_set_bit(), we don't need to worry about the
>> upper bound.
>>
>> However, the type of lr_mask does matter, so switch it to be uint64_t * and
>> move unsigned long * override until the find_next_zero_bit() call.
>>
>> Move lr_val into a narrower scope and drop used_lr as it's declared by
>> for_each_set_bit() itself.
>>
>> Drop the nr_lrs variable and use gic_get_nr_lrs() in the one location its now
>> used.  It hides a triple pointer defererence, and while it may not be needed
> s/defererence/dereference

Fixed.

>
>> in the PRISTINE case, it certainly doesn't need to be live across the rest of
>> the function.
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Michal Orzel <michal.orzel@amd.com>

Thanks.

~Andrew


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] ARM/vgic: Use for_each_set_bit() in vgic-mmio*
  2024-09-03 10:30   ` Michal Orzel
@ 2024-09-03 13:19     ` Andrew Cooper
  2024-09-09 21:54       ` Julien Grall
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2024-09-03 13:19 UTC (permalink / raw)
  To: Michal Orzel, Xen-devel, Julien Grall
  Cc: Stefano Stabellini, Volodymyr Babchuk, Bertrand Marquis

On 03/09/2024 11:30 am, Michal Orzel wrote:
>
> On 02/09/2024 12:03, Andrew Cooper wrote:
>>
>> These are all loops over a scalar value, and don't need to call general bitop
>> helpers behind the scenes.
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Stefano Stabellini <sstabellini@kernel.org>
>> CC: Julien Grall <julien@xen.org>
>> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>> CC: Bertrand Marquis <bertrand.marquis@arm.com>
>> CC: Michal Orzel <michal.orzel@amd.com>
>>
>> Slighly RFC.  It's unclear whether len is the size of the register, or the
>> size of the access.  For sub-GPR accesses, won't the upper bits be clear
>> anyway?  If so, this can be simplified further.
> See dispatch_mmio_write(). "len" refers to access size (i.e. 1/4/8 bytes). Each register is defined
> with a region access i.e. VGIC_ACCESS_32bit that is compared with the actual access. In the current code
> there is no register with 8B access. If there is a mismatch, there will be a data abort injected.
> Also, the top 32-bits are not cleared anywhere, so I don't think we can drop masking. @Julien?

Ok, so it is necessary right now to have the clamping logic in every
callback.

However, given that the size is validated before dispatching, clamping
once in dispatch_mmio_write() would save a lot of repeated code in the
callbacks.

i.e., I think this:

diff --git a/xen/arch/arm/vgic/vgic-mmio.c b/xen/arch/arm/vgic/vgic-mmio.c
index bd4caf7fc326..e8b9805a0b2c 100644
--- a/xen/arch/arm/vgic/vgic-mmio.c
+++ b/xen/arch/arm/vgic/vgic-mmio.c
@@ -590,6 +590,9 @@ static int dispatch_mmio_write(struct vcpu *vcpu,
mmio_info_t *info,
     if ( !region )
         return 0;
 
+    if ( len < sizeof(data) )
+        data &= ~((1UL << (len * 8)) - 1);
+
     switch (iodev->iodev_type)
     {
     case IODEV_DIST:

would work to replace every if() introduced below.

~Andrew


>
>> $ ../scripts/bloat-o-meter xen-syms-arm64-{before,after}
>> add/remove: 0/0 grow/shrink: 2/5 up/down: 20/-140 (-120)
>> Function                                     old     new   delta
>> vgic_mmio_write_spending                     320     332     +12
>> vgic_mmio_write_cpending                     368     376      +8
>> vgic_mmio_write_sactive                      192     176     -16
>> vgic_mmio_write_cactive                      192     176     -16
>> vgic_mmio_write_cenable                      316     288     -28
>> vgic_mmio_write_senable                      320     284     -36
>> vgic_mmio_write_sgir                         344     300     -44
>>
>> $ ../scripts/bloat-o-meter xen-syms-arm32-{before,after}
>> add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32)
>> Function                                     old     new   delta
>> vgic_mmio_write_sactive                      204     200      -4
>> vgic_mmio_write_cpending                     464     460      -4
>> vgic_mmio_write_cactive                      204     200      -4
>> vgic_mmio_write_sgir                         336     316     -20
>> ---
>>  xen/arch/arm/vgic/vgic-mmio-v2.c |  3 +--
>>  xen/arch/arm/vgic/vgic-mmio.c    | 36 +++++++++++++++++++++-----------
>>  2 files changed, 25 insertions(+), 14 deletions(-)
>>
>> diff --git a/xen/arch/arm/vgic/vgic-mmio-v2.c b/xen/arch/arm/vgic/vgic-mmio-v2.c
>> index 670b335db2c3..42fac0403f07 100644
>> --- a/xen/arch/arm/vgic/vgic-mmio-v2.c
>> +++ b/xen/arch/arm/vgic/vgic-mmio-v2.c
>> @@ -90,7 +90,6 @@ static void vgic_mmio_write_sgir(struct vcpu *source_vcpu,
>>      unsigned int intid = val & GICD_SGI_INTID_MASK;
>>      unsigned long targets = (val & GICD_SGI_TARGET_MASK) >>
>>                              GICD_SGI_TARGET_SHIFT;
>> -    unsigned int vcpu_id;
>>
>>      switch ( val & GICD_SGI_TARGET_LIST_MASK )
>>      {
>> @@ -108,7 +107,7 @@ static void vgic_mmio_write_sgir(struct vcpu *source_vcpu,
>>          return;
>>      }
>>
>> -    bitmap_for_each ( vcpu_id, &targets, 8 )
>> +    for_each_set_bit ( vcpu_id, (uint8_t)targets )
>>      {
>>          struct vcpu *vcpu = d->vcpu[vcpu_id];
>>          struct vgic_irq *irq = vgic_get_irq(d, vcpu, intid);
>> diff --git a/xen/arch/arm/vgic/vgic-mmio.c b/xen/arch/arm/vgic/vgic-mmio.c
>> index bd4caf7fc326..f7c336a238ab 100644
>> --- a/xen/arch/arm/vgic/vgic-mmio.c
>> +++ b/xen/arch/arm/vgic/vgic-mmio.c
>> @@ -69,9 +69,11 @@ void vgic_mmio_write_senable(struct vcpu *vcpu,
>>                               unsigned long val)
>>  {
>>      uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
>> -    unsigned int i;
>>
>> -    bitmap_for_each ( i, &val, len * 8 )
>> +    if ( len < sizeof(val) )
>> +        val &= (1UL << (len * 8)) - 1;
>> +
>> +    for_each_set_bit ( i, val )
>>      {
>>          struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
>>          unsigned long flags;
>> @@ -114,9 +116,11 @@ void vgic_mmio_write_cenable(struct vcpu *vcpu,
>>                               unsigned long val)
>>  {
>>      uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
>> -    unsigned int i;
>>
>> -    bitmap_for_each ( i, &val, len * 8 )
>> +    if ( len < sizeof(val) )
>> +        val &= (1UL << (len * 8)) - 1;
>> +
>> +    for_each_set_bit ( i, val )
>>      {
>>          struct vgic_irq *irq;
>>          unsigned long flags;
>> @@ -182,11 +186,13 @@ void vgic_mmio_write_spending(struct vcpu *vcpu,
>>                                unsigned long val)
>>  {
>>      uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
>> -    unsigned int i;
>>      unsigned long flags;
>>      irq_desc_t *desc;
>>
>> -    bitmap_for_each ( i, &val, len * 8 )
>> +    if ( len < sizeof(val) )
>> +        val &= (1UL << (len * 8)) - 1;
>> +
>> +    for_each_set_bit ( i, val )
>>      {
>>          struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
>>
>> @@ -230,11 +236,13 @@ void vgic_mmio_write_cpending(struct vcpu *vcpu,
>>                                unsigned long val)
>>  {
>>      uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
>> -    unsigned int i;
>>      unsigned long flags;
>>      irq_desc_t *desc;
>>
>> -    bitmap_for_each ( i, &val, len * 8 )
>> +    if ( len < sizeof(val) )
>> +        val &= (1UL << (len * 8)) - 1;
>> +
>> +    for_each_set_bit ( i, val )
>>      {
>>          struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
>>
>> @@ -326,9 +334,11 @@ void vgic_mmio_write_cactive(struct vcpu *vcpu,
>>                               unsigned long val)
>>  {
>>      uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
>> -    unsigned int i;
>>
>> -    bitmap_for_each ( i, &val, len * 8 )
>> +    if ( len < sizeof(val) )
>> +        val &= (1UL << (len * 8)) - 1;
>> +
>> +    for_each_set_bit ( i, val )
>>      {
>>          struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
>>
>> @@ -356,9 +366,11 @@ void vgic_mmio_write_sactive(struct vcpu *vcpu,
>>                               unsigned long val)
>>  {
>>      uint32_t intid = VGIC_ADDR_TO_INTID(addr, 1);
>> -    unsigned int i;
>>
>> -    bitmap_for_each ( i, &val, len * 8 )
>> +    if ( len < sizeof(val) )
>> +        val &= (1UL << (len * 8)) - 1;
>> +
>> +    for_each_set_bit ( i, val )
>>      {
>>          struct vgic_irq *irq = vgic_get_irq(vcpu->domain, vcpu, intid + i);
>>
>> --
>> 2.39.2
>>
> ~Michal



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] ARM/vgic: Use for_each_set_bit() in vgic-mmio*
  2024-09-03 13:19     ` Andrew Cooper
@ 2024-09-09 21:54       ` Julien Grall
  0 siblings, 0 replies; 12+ messages in thread
From: Julien Grall @ 2024-09-09 21:54 UTC (permalink / raw)
  To: Andrew Cooper, Michal Orzel, Xen-devel
  Cc: Stefano Stabellini, Volodymyr Babchuk, Bertrand Marquis

Hi,

On 03/09/2024 14:19, Andrew Cooper wrote:
> On 03/09/2024 11:30 am, Michal Orzel wrote:
>>
>> On 02/09/2024 12:03, Andrew Cooper wrote:
>>>
>>> These are all loops over a scalar value, and don't need to call general bitop
>>> helpers behind the scenes.
>>>
>>> No functional change.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>> CC: Stefano Stabellini <sstabellini@kernel.org>
>>> CC: Julien Grall <julien@xen.org>
>>> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>>> CC: Bertrand Marquis <bertrand.marquis@arm.com>
>>> CC: Michal Orzel <michal.orzel@amd.com>
>>>
>>> Slighly RFC.  It's unclear whether len is the size of the register, or the
>>> size of the access.  For sub-GPR accesses, won't the upper bits be clear
>>> anyway?  If so, this can be simplified further.
>> See dispatch_mmio_write(). "len" refers to access size (i.e. 1/4/8 bytes). Each register is defined
>> with a region access i.e. VGIC_ACCESS_32bit that is compared with the actual access. In the current code
>> there is no register with 8B access. If there is a mismatch, there will be a data abort injected.
>> Also, the top 32-bits are not cleared anywhere, so I don't think we can drop masking. @Julien?

That's correct, there are no masking in the I/O dispatch helpers.

> 
> Ok, so it is necessary right now to have the clamping logic in every
> callback.
> 
> However, given that the size is validated before dispatching, clamping
> once in dispatch_mmio_write() would save a lot of repeated code in the
> callbacks.
> 
> i.e., I think this:
> 
> diff --git a/xen/arch/arm/vgic/vgic-mmio.c b/xen/arch/arm/vgic/vgic-mmio.c
> index bd4caf7fc326..e8b9805a0b2c 100644
> --- a/xen/arch/arm/vgic/vgic-mmio.c
> +++ b/xen/arch/arm/vgic/vgic-mmio.c
> @@ -590,6 +590,9 @@ static int dispatch_mmio_write(struct vcpu *vcpu,
> mmio_info_t *info,
>       if ( !region )
>           return 0;
>   
> +    if ( len < sizeof(data) )
> +        data &= ~((1UL << (len * 8)) - 1);
> +

I think it would make sense to move the masking one level higher in 
handle_write() (arch/arm/io.c). So this would cover all the emulation 
helpers.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-09-09 21:54 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-02 10:03 [PATCH 0/4] ARM: Cleanup following bitops improvements Andrew Cooper
2024-09-02 10:03 ` [PATCH 1/4] ARM/div: Drop __div64_fls() Andrew Cooper
2024-09-03  8:37   ` Michal Orzel
2024-09-02 10:03 ` [PATCH 2/4] ARM/vgic: Correct the expression for lr_all_full() Andrew Cooper
2024-09-03  8:42   ` Michal Orzel
2024-09-02 10:03 ` [PATCH 3/4] ARM/vgic: Use for_each_set_bit() in gic_find_unused_lr() Andrew Cooper
2024-09-03  8:46   ` Michal Orzel
2024-09-03 13:07     ` Andrew Cooper
2024-09-02 10:03 ` [PATCH 4/4] ARM/vgic: Use for_each_set_bit() in vgic-mmio* Andrew Cooper
2024-09-03 10:30   ` Michal Orzel
2024-09-03 13:19     ` Andrew Cooper
2024-09-09 21:54       ` Julien Grall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.