[PATCH 0/5] use mask operations instead of test

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/5] use mask operations instead of test_bit()
@ 2015-10-02  4:40 Juergen Gross
  2015-10-02  4:40 ` [PATCH 1/5] xen: use masking operation instead of test_bit for RTDS bits Juergen Gross
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Juergen Gross @ 2015-10-02  4:40 UTC (permalink / raw)
  To: keir, jbeulich, andrew.cooper3, dario.faggioli, george.dunlap,
	xen-devel
  Cc: Juergen Gross

Instead of using test_bit() which is an atomic operation and limits
the compiler's choices to do optimization, use logical ANDs with
bitmasks where possible.

The possible candidates have been detected by searching definitions
of bitmasks in the form:

#define MASK  1 << _MASK

On x86 the resulting code is slightly smaller (about 2 bytes for each
case, checked via disassembly in few examples).

I'm quite sure I didn't replace a test_bit() call required to be
atomic, but I'd be grateful for a thorough review especially in the
scheduler.


Juergen Gross (5):
  xen: use masking operation instead of test_bit for RTDS bits
  xen: use masking operation instead of test_bit for CSFLAG bits
  xen: use masking operation instead of test_bit for VGCF bits
  xen: use masking operation instead of test_bit for VPF bits
  xen: use masking operation instead of test_bit for MCSF bits

 xen/arch/x86/domain.c           | 12 ++++++------
 xen/arch/x86/domctl.c           |  2 +-
 xen/arch/x86/hvm/hvm.c          |  4 ++--
 xen/arch/x86/hvm/vpt.c          |  2 +-
 xen/arch/x86/x86_64/compat/mm.c |  4 ++--
 xen/common/domain.c             |  4 ++--
 xen/common/domctl.c             |  8 ++++----
 xen/common/multicall.c          |  2 +-
 xen/common/sched_credit2.c      | 16 ++++++++--------
 xen/common/sched_rt.c           |  4 ++--
 xen/common/schedule.c           | 16 ++++++++--------
 11 files changed, 37 insertions(+), 37 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/5] xen: use masking operation instead of test_bit for RTDS bits
  2015-10-02  4:40 [PATCH 0/5] use mask operations instead of test_bit() Juergen Gross
@ 2015-10-02  4:40 ` Juergen Gross
  2015-10-02 10:21   ` Dario Faggioli
  2015-10-05 13:30   ` George Dunlap
  2015-10-02  4:40 ` [PATCH 2/5] xen: use masking operation instead of test_bit for CSFLAG bits Juergen Gross
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Juergen Gross @ 2015-10-02  4:40 UTC (permalink / raw)
  To: keir, jbeulich, andrew.cooper3, dario.faggioli, george.dunlap,
	xen-devel
  Cc: Juergen Gross

Use a bit mask for testing of a set bit instead of test_bit in case no
atomic operation is needed, as this will lead to smaller and more
effective code.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_rt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 4b5c5e4..6a341b1 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -931,7 +931,7 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
     else if ( __vcpu_on_q(svc) )
         __q_remove(svc);
-    else if ( test_bit(__RTDS_delayed_runq_add, &svc->flags) )
+    else if ( svc->flags & RTDS_delayed_runq_add )
         clear_bit(__RTDS_delayed_runq_add, &svc->flags);
 }
 
@@ -1064,7 +1064,7 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
      * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
      * put on the Runqueue/DepletedQ after the context has been saved.
      */
-    if ( unlikely(test_bit(__RTDS_scheduled, &svc->flags)) )
+    if ( unlikely(svc->flags & RTDS_scheduled) )
     {
         set_bit(__RTDS_delayed_runq_add, &svc->flags);
         return;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/5] xen: use masking operation instead of test_bit for RTDS bits
  2015-10-02  4:40 ` [PATCH 1/5] xen: use masking operation instead of test_bit for RTDS bits Juergen Gross
@ 2015-10-02 10:21   ` Dario Faggioli
  2015-10-05 13:30   ` George Dunlap
  1 sibling, 0 replies; 23+ messages in thread
From: Dario Faggioli @ 2015-10-02 10:21 UTC (permalink / raw)
  To: Juergen Gross, keir, jbeulich, andrew.cooper3, george.dunlap,
	xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 625 bytes --]

On Fri, 2015-10-02 at 06:40 +0200, Juergen Gross wrote:
> Use a bit mask for testing of a set bit instead of test_bit in case
> no
> atomic operation is needed, as this will lead to smaller and more
> effective code.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/5] xen: use masking operation instead of test_bit for RTDS bits
  2015-10-02  4:40 ` [PATCH 1/5] xen: use masking operation instead of test_bit for RTDS bits Juergen Gross
  2015-10-02 10:21   ` Dario Faggioli
@ 2015-10-05 13:30   ` George Dunlap
  1 sibling, 0 replies; 23+ messages in thread
From: George Dunlap @ 2015-10-05 13:30 UTC (permalink / raw)
  To: Juergen Gross, keir, jbeulich, andrew.cooper3, dario.faggioli,
	george.dunlap, xen-devel

On 02/10/15 05:40, Juergen Gross wrote:
> Use a bit mask for testing of a set bit instead of test_bit in case no
> atomic operation is needed, as this will lead to smaller and more
> effective code.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>

> ---
>  xen/common/sched_rt.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 4b5c5e4..6a341b1 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -931,7 +931,7 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
>          cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
>      else if ( __vcpu_on_q(svc) )
>          __q_remove(svc);
> -    else if ( test_bit(__RTDS_delayed_runq_add, &svc->flags) )
> +    else if ( svc->flags & RTDS_delayed_runq_add )
>          clear_bit(__RTDS_delayed_runq_add, &svc->flags);
>  }
>  
> @@ -1064,7 +1064,7 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>       * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
>       * put on the Runqueue/DepletedQ after the context has been saved.
>       */
> -    if ( unlikely(test_bit(__RTDS_scheduled, &svc->flags)) )
> +    if ( unlikely(svc->flags & RTDS_scheduled) )
>      {
>          set_bit(__RTDS_delayed_runq_add, &svc->flags);
>          return;
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 2/5] xen: use masking operation instead of test_bit for CSFLAG bits
  2015-10-02  4:40 [PATCH 0/5] use mask operations instead of test_bit() Juergen Gross
  2015-10-02  4:40 ` [PATCH 1/5] xen: use masking operation instead of test_bit for RTDS bits Juergen Gross
@ 2015-10-02  4:40 ` Juergen Gross
  2015-10-02 10:45   ` Dario Faggioli
  2015-10-05 13:30   ` George Dunlap
  2015-10-02  4:40 ` [PATCH 3/5] xen: use masking operation instead of test_bit for VGCF bits Juergen Gross
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Juergen Gross @ 2015-10-02  4:40 UTC (permalink / raw)
  To: keir, jbeulich, andrew.cooper3, dario.faggioli, george.dunlap,
	xen-devel
  Cc: Juergen Gross

Use a bit mask for testing of a set bit instead of test_bit in case no
atomic operation is needed, as this will lead to smaller and more
effective code.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched_credit2.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index bf1fe6f..912e1a2 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -418,7 +418,7 @@ __runq_insert(struct list_head *runq, struct csched2_vcpu *svc)
     /* Idle vcpus not allowed on the runqueue anymore */
     BUG_ON(is_idle_vcpu(svc->vcpu));
     BUG_ON(svc->vcpu->is_running);
-    BUG_ON(test_bit(__CSFLAG_scheduled, &svc->flags));
+    BUG_ON(svc->flags & CSFLAG_scheduled);
 
     list_for_each( iter, runq )
     {
@@ -844,7 +844,7 @@ static void
 __runq_deassign(struct csched2_vcpu *svc)
 {
     BUG_ON(__vcpu_on_runq(svc));
-    BUG_ON(test_bit(__CSFLAG_scheduled, &svc->flags));
+    BUG_ON(svc->flags & CSFLAG_scheduled);
 
     list_del_init(&svc->rqd_elem);
     update_max_weight(svc->rqd, 0, svc->weight);
@@ -952,7 +952,7 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
         update_load(ops, svc->rqd, svc, -1, NOW());
         __runq_remove(svc);
     }
-    else if ( test_bit(__CSFLAG_delayed_runq_add, &svc->flags) )
+    else if ( svc->flags & CSFLAG_delayed_runq_add )
         clear_bit(__CSFLAG_delayed_runq_add, &svc->flags);
 }
 
@@ -988,7 +988,7 @@ csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
     /* If the context hasn't been saved for this vcpu yet, we can't put it on
      * another runqueue.  Instead, we set a flag so that it will be put on the runqueue
      * after the context has been saved. */
-    if ( unlikely (test_bit(__CSFLAG_scheduled, &svc->flags) ) )
+    if ( unlikely(svc->flags & CSFLAG_scheduled) )
     {
         set_bit(__CSFLAG_delayed_runq_add, &svc->flags);
         goto out;
@@ -1204,7 +1204,7 @@ static void migrate(const struct scheduler *ops,
                     struct csched2_runqueue_data *trqd, 
                     s_time_t now)
 {
-    if ( test_bit(__CSFLAG_scheduled, &svc->flags) )
+    if ( svc->flags & CSFLAG_scheduled )
     {
         d2printk("%pv %d-%d a\n", svc->vcpu, svc->rqd->id, trqd->id);
         /* It's running; mark it to migrate. */
@@ -1348,7 +1348,7 @@ retry:
         __update_svc_load(ops, push_svc, 0, now);
 
         /* Skip this one if it's already been flagged to migrate */
-        if ( test_bit(__CSFLAG_runq_migrate_request, &push_svc->flags) )
+        if ( push_svc->flags & CSFLAG_runq_migrate_request )
             continue;
 
         list_for_each( pull_iter, &st.orqd->svc )
@@ -1361,7 +1361,7 @@ retry:
             }
         
             /* Skip this one if it's already been flagged to migrate */
-            if ( test_bit(__CSFLAG_runq_migrate_request, &pull_svc->flags) )
+            if ( pull_svc->flags & CSFLAG_runq_migrate_request )
                 continue;
 
             consider(&st, push_svc, pull_svc);
@@ -1378,7 +1378,7 @@ retry:
         struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
         
         /* Skip this one if it's already been flagged to migrate */
-        if ( test_bit(__CSFLAG_runq_migrate_request, &pull_svc->flags) )
+        if ( pull_svc->flags & CSFLAG_runq_migrate_request )
             continue;
 
         /* Consider pull only */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/5] xen: use masking operation instead of test_bit for CSFLAG bits
  2015-10-02  4:40 ` [PATCH 2/5] xen: use masking operation instead of test_bit for CSFLAG bits Juergen Gross
@ 2015-10-02 10:45   ` Dario Faggioli
  2015-10-05 13:30   ` George Dunlap
  1 sibling, 0 replies; 23+ messages in thread
From: Dario Faggioli @ 2015-10-02 10:45 UTC (permalink / raw)
  To: Juergen Gross, keir, jbeulich, andrew.cooper3, george.dunlap,
	xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 617 bytes --]

On Fri, 2015-10-02 at 06:40 +0200, Juergen Gross wrote:
> Use a bit mask for testing of a set bit instead of test_bit in case
> no
> atomic operation is needed, as this will lead to smaller and more
> effective code.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/5] xen: use masking operation instead of test_bit for CSFLAG bits
  2015-10-02  4:40 ` [PATCH 2/5] xen: use masking operation instead of test_bit for CSFLAG bits Juergen Gross
  2015-10-02 10:45   ` Dario Faggioli
@ 2015-10-05 13:30   ` George Dunlap
  1 sibling, 0 replies; 23+ messages in thread
From: George Dunlap @ 2015-10-05 13:30 UTC (permalink / raw)
  To: Juergen Gross, keir, jbeulich, andrew.cooper3, dario.faggioli,
	george.dunlap, xen-devel

On 02/10/15 05:40, Juergen Gross wrote:
> Use a bit mask for testing of a set bit instead of test_bit in case no
> atomic operation is needed, as this will lead to smaller and more
> effective code.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>

> ---
>  xen/common/sched_credit2.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index bf1fe6f..912e1a2 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -418,7 +418,7 @@ __runq_insert(struct list_head *runq, struct csched2_vcpu *svc)
>      /* Idle vcpus not allowed on the runqueue anymore */
>      BUG_ON(is_idle_vcpu(svc->vcpu));
>      BUG_ON(svc->vcpu->is_running);
> -    BUG_ON(test_bit(__CSFLAG_scheduled, &svc->flags));
> +    BUG_ON(svc->flags & CSFLAG_scheduled);
>  
>      list_for_each( iter, runq )
>      {
> @@ -844,7 +844,7 @@ static void
>  __runq_deassign(struct csched2_vcpu *svc)
>  {
>      BUG_ON(__vcpu_on_runq(svc));
> -    BUG_ON(test_bit(__CSFLAG_scheduled, &svc->flags));
> +    BUG_ON(svc->flags & CSFLAG_scheduled);
>  
>      list_del_init(&svc->rqd_elem);
>      update_max_weight(svc->rqd, 0, svc->weight);
> @@ -952,7 +952,7 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
>          update_load(ops, svc->rqd, svc, -1, NOW());
>          __runq_remove(svc);
>      }
> -    else if ( test_bit(__CSFLAG_delayed_runq_add, &svc->flags) )
> +    else if ( svc->flags & CSFLAG_delayed_runq_add )
>          clear_bit(__CSFLAG_delayed_runq_add, &svc->flags);
>  }
>  
> @@ -988,7 +988,7 @@ csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>      /* If the context hasn't been saved for this vcpu yet, we can't put it on
>       * another runqueue.  Instead, we set a flag so that it will be put on the runqueue
>       * after the context has been saved. */
> -    if ( unlikely (test_bit(__CSFLAG_scheduled, &svc->flags) ) )
> +    if ( unlikely(svc->flags & CSFLAG_scheduled) )
>      {
>          set_bit(__CSFLAG_delayed_runq_add, &svc->flags);
>          goto out;
> @@ -1204,7 +1204,7 @@ static void migrate(const struct scheduler *ops,
>                      struct csched2_runqueue_data *trqd, 
>                      s_time_t now)
>  {
> -    if ( test_bit(__CSFLAG_scheduled, &svc->flags) )
> +    if ( svc->flags & CSFLAG_scheduled )
>      {
>          d2printk("%pv %d-%d a\n", svc->vcpu, svc->rqd->id, trqd->id);
>          /* It's running; mark it to migrate. */
> @@ -1348,7 +1348,7 @@ retry:
>          __update_svc_load(ops, push_svc, 0, now);
>  
>          /* Skip this one if it's already been flagged to migrate */
> -        if ( test_bit(__CSFLAG_runq_migrate_request, &push_svc->flags) )
> +        if ( push_svc->flags & CSFLAG_runq_migrate_request )
>              continue;
>  
>          list_for_each( pull_iter, &st.orqd->svc )
> @@ -1361,7 +1361,7 @@ retry:
>              }
>          
>              /* Skip this one if it's already been flagged to migrate */
> -            if ( test_bit(__CSFLAG_runq_migrate_request, &pull_svc->flags) )
> +            if ( pull_svc->flags & CSFLAG_runq_migrate_request )
>                  continue;
>  
>              consider(&st, push_svc, pull_svc);
> @@ -1378,7 +1378,7 @@ retry:
>          struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
>          
>          /* Skip this one if it's already been flagged to migrate */
> -        if ( test_bit(__CSFLAG_runq_migrate_request, &pull_svc->flags) )
> +        if ( pull_svc->flags & CSFLAG_runq_migrate_request )
>              continue;
>  
>          /* Consider pull only */
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3/5] xen: use masking operation instead of test_bit for VGCF bits
  2015-10-02  4:40 [PATCH 0/5] use mask operations instead of test_bit() Juergen Gross
  2015-10-02  4:40 ` [PATCH 1/5] xen: use masking operation instead of test_bit for RTDS bits Juergen Gross
  2015-10-02  4:40 ` [PATCH 2/5] xen: use masking operation instead of test_bit for CSFLAG bits Juergen Gross
@ 2015-10-02  4:40 ` Juergen Gross
  2015-10-02  4:40 ` [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits Juergen Gross
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2015-10-02  4:40 UTC (permalink / raw)
  To: keir, jbeulich, andrew.cooper3, dario.faggioli, george.dunlap,
	xen-devel
  Cc: Juergen Gross

Use a bit mask for testing of a set bit instead of test_bit in case no
atomic operation is needed, as this will lead to smaller and more
effective code.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/domain.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dc3bb08..4e96f6c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1352,7 +1352,7 @@ static void load_segments(struct vcpu *n)
                 domain_crash(n->domain);
             }
 
-            if ( test_bit(_VGCF_failsafe_disables_events, &n->arch.vgc_flags) )
+            if ( n->arch.vgc_flags & VGCF_failsafe_disables_events )
                 vcpu_info(n, evtchn_upcall_mask) = 1;
 
             regs->entry_vector |= TRAP_syscall;
@@ -1394,7 +1394,7 @@ static void load_segments(struct vcpu *n)
             domain_crash(n->domain);
         }
 
-        if ( test_bit(_VGCF_failsafe_disables_events, &n->arch.vgc_flags) )
+        if ( n->arch.vgc_flags & VGCF_failsafe_disables_events )
             vcpu_info(n, evtchn_upcall_mask) = 1;
 
         regs->entry_vector |= TRAP_syscall;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits
  2015-10-02  4:40 [PATCH 0/5] use mask operations instead of test_bit() Juergen Gross
                   ` (2 preceding siblings ...)
  2015-10-02  4:40 ` [PATCH 3/5] xen: use masking operation instead of test_bit for VGCF bits Juergen Gross
@ 2015-10-02  4:40 ` Juergen Gross
  2015-10-05 13:18   ` George Dunlap
  2015-10-05 13:24   ` George Dunlap
  2015-10-02  4:40 ` [PATCH 5/5] xen: use masking operation instead of test_bit for MCSF bits Juergen Gross
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Juergen Gross @ 2015-10-02  4:40 UTC (permalink / raw)
  To: keir, jbeulich, andrew.cooper3, dario.faggioli, george.dunlap,
	xen-devel
  Cc: Juergen Gross

Use a bit mask for testing of a set bit instead of test_bit in case no
atomic operation is needed, as this will lead to smaller and more
effective code.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/domctl.c  |  2 +-
 xen/arch/x86/hvm/hvm.c |  4 ++--
 xen/arch/x86/hvm/vpt.c |  2 +-
 xen/common/domain.c    |  4 ++--
 xen/common/domctl.c    |  8 ++++----
 xen/common/schedule.c  | 16 ++++++++--------
 6 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 6172c0d..f8a559c 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1213,7 +1213,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
     c(flags = v->arch.vgc_flags & ~(VGCF_i387_valid|VGCF_in_kernel));
     if ( v->fpu_initialised )
         c(flags |= VGCF_i387_valid);
-    if ( !test_bit(_VPF_down, &v->pause_flags) )
+    if ( !(v->pause_flags & VPF_down) )
         c(flags |= VGCF_online);
     if ( !compat )
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 6afc344..3fa2280 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1728,7 +1728,7 @@ static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
     {
         /* We don't need to save state for a vcpu that is down; the restore 
          * code will leave it down if there is nothing saved. */
-        if ( test_bit(_VPF_down, &v->pause_flags) ) 
+        if ( v->pause_flags & VPF_down )
             continue;
 
         /* Architecture-specific vmcs/vmcb bits */
@@ -2512,7 +2512,7 @@ void hvm_vcpu_down(struct vcpu *v)
     /* Any other VCPUs online? ... */
     domain_lock(d);
     for_each_vcpu ( d, v )
-        if ( !test_bit(_VPF_down, &v->pause_flags) )
+        if ( !(v->pause_flags & VPF_down) )
             online_count++;
     domain_unlock(d);
 
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 0c8b22e..4fa6566 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -178,7 +178,7 @@ void pt_save_timer(struct vcpu *v)
     struct list_head *head = &v->arch.hvm_vcpu.tm_list;
     struct periodic_time *pt;
 
-    if ( test_bit(_VPF_blocked, &v->pause_flags) )
+    if ( v->pause_flags & VPF_blocked )
         return;
 
     spin_lock(&v->arch.hvm_vcpu.tm_lock);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index cda60a9..7c362eb 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1135,7 +1135,7 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset)
         return -EINVAL;
 
     /* Run this command on yourself or on other offline VCPUS. */
-    if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
+    if ( (v != current) && !(v->pause_flags & VPF_down) )
         return -EINVAL;
 
     page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
@@ -1263,7 +1263,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
 
     case VCPUOP_is_up:
-        rc = !test_bit(_VPF_down, &v->pause_flags);
+        rc = !(v->pause_flags & VPF_down);
         break;
 
     case VCPUOP_get_runstate_info:
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 08de32d..46b967e 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -170,7 +170,7 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
         vcpu_runstate_get(v, &runstate);
         cpu_time += runstate.time[RUNSTATE_running];
         info->max_vcpu_id = v->vcpu_id;
-        if ( !test_bit(_VPF_down, &v->pause_flags) )
+        if ( !(v->pause_flags & VPF_down) )
         {
             if ( !(v->pause_flags & VPF_blocked) )
                 flags &= ~XEN_DOMINF_blocked;
@@ -231,7 +231,7 @@ static unsigned int default_vcpu0_location(cpumask_t *online)
         rcu_read_lock(&domlist_read_lock);
         for_each_domain ( d )
             for_each_vcpu ( d, v )
-                if ( !test_bit(_VPF_down, &v->pause_flags)
+                if ( !(v->pause_flags & VPF_down)
                      && ((cpu = v->processor) < nr_cpus) )
                     cnt[cpu]++;
         rcu_read_unlock(&domlist_read_lock);
@@ -944,8 +944,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 
         vcpu_runstate_get(v, &runstate);
 
-        op->u.getvcpuinfo.online   = !test_bit(_VPF_down, &v->pause_flags);
-        op->u.getvcpuinfo.blocked  = test_bit(_VPF_blocked, &v->pause_flags);
+        op->u.getvcpuinfo.online   = !(v->pause_flags & VPF_down);
+        op->u.getvcpuinfo.blocked  = !!(v->pause_flags & VPF_blocked);
         op->u.getvcpuinfo.running  = v->is_running;
         op->u.getvcpuinfo.cpu_time = runstate.time[RUNSTATE_running];
         op->u.getvcpuinfo.cpu      = v->processor;
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5ffa1a1..c5f640f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -119,7 +119,7 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
 
     if ( unlikely(v->is_urgent) )
     {
-        if ( !test_bit(_VPF_blocked, &v->pause_flags) ||
+        if ( !(v->pause_flags & VPF_blocked) ||
              !test_bit(v->vcpu_id, v->domain->poll_mask) )
         {
             v->is_urgent = 0;
@@ -128,8 +128,8 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
     }
     else
     {
-        if ( unlikely(test_bit(_VPF_blocked, &v->pause_flags) &&
-                      test_bit(v->vcpu_id, v->domain->poll_mask)) )
+        if ( unlikely(v->pause_flags & VPF_blocked) &&
+             unlikely(test_bit(v->vcpu_id, v->domain->poll_mask)) )
         {
             v->is_urgent = 1;
             atomic_inc(&per_cpu(schedule_data,v->processor).urgent_count);
@@ -418,7 +418,7 @@ void vcpu_wake(struct vcpu *v)
             vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
         SCHED_OP(VCPU2OP(v), wake, v);
     }
-    else if ( !test_bit(_VPF_blocked, &v->pause_flags) )
+    else if ( !(v->pause_flags & VPF_blocked) )
     {
         if ( v->runstate.state == RUNSTATE_blocked )
             vcpu_runstate_change(v, RUNSTATE_offline, NOW());
@@ -595,7 +595,7 @@ void vcpu_force_reschedule(struct vcpu *v)
         set_bit(_VPF_migrating, &v->pause_flags);
     vcpu_schedule_unlock_irq(lock, v);
 
-    if ( test_bit(_VPF_migrating, &v->pause_flags) )
+    if ( v->pause_flags & VPF_migrating )
     {
         vcpu_sleep_nosync(v);
         vcpu_migrate(v);
@@ -763,7 +763,7 @@ static int vcpu_set_affinity(
 
     domain_update_node_affinity(v->domain);
 
-    if ( test_bit(_VPF_migrating, &v->pause_flags) )
+    if ( v->pause_flags & VPF_migrating )
     {
         vcpu_sleep_nosync(v);
         vcpu_migrate(v);
@@ -1285,7 +1285,7 @@ static void schedule(void)
 
     vcpu_runstate_change(
         prev,
-        (test_bit(_VPF_blocked, &prev->pause_flags) ? RUNSTATE_blocked :
+        ((prev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
          (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)),
         now);
     prev->last_run_time = now;
@@ -1327,7 +1327,7 @@ void context_saved(struct vcpu *prev)
 
     SCHED_OP(VCPU2OP(prev), context_saved, prev);
 
-    if ( unlikely(test_bit(_VPF_migrating, &prev->pause_flags)) )
+    if ( unlikely(prev->pause_flags & VPF_migrating) )
         vcpu_migrate(prev);
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits
  2015-10-02  4:40 ` [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits Juergen Gross
@ 2015-10-05 13:18   ` George Dunlap
  2015-10-05 13:36     ` Jan Beulich
  2015-10-05 13:39     ` Juergen Gross
  2015-10-05 13:24   ` George Dunlap
  1 sibling, 2 replies; 23+ messages in thread
From: George Dunlap @ 2015-10-05 13:18 UTC (permalink / raw)
  To: Juergen Gross, keir, jbeulich, andrew.cooper3, dario.faggioli,
	george.dunlap, xen-devel

On 02/10/15 05:40, Juergen Gross wrote:
> Use a bit mask for testing of a set bit instead of test_bit in case no
> atomic operation is needed, as this will lead to smaller and more
> effective code.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

I'm a bit confused here -- exactly when is an atomic operation needed or
not needed?  Isn't it the case that we always need to do an atomic read
if we ever do an atomic write without a lock held?

For example, xen/common/schedule.c:vcpu_unblock(), vcpu_block(),
do_poll; xen/common/event_channel.c:defaultxen_notification_fn(), and
many more?

 -George

> ---
>  xen/arch/x86/domctl.c  |  2 +-
>  xen/arch/x86/hvm/hvm.c |  4 ++--
>  xen/arch/x86/hvm/vpt.c |  2 +-
>  xen/common/domain.c    |  4 ++--
>  xen/common/domctl.c    |  8 ++++----
>  xen/common/schedule.c  | 16 ++++++++--------
>  6 files changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index 6172c0d..f8a559c 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -1213,7 +1213,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
>      c(flags = v->arch.vgc_flags & ~(VGCF_i387_valid|VGCF_in_kernel));
>      if ( v->fpu_initialised )
>          c(flags |= VGCF_i387_valid);
> -    if ( !test_bit(_VPF_down, &v->pause_flags) )
> +    if ( !(v->pause_flags & VPF_down) )
>          c(flags |= VGCF_online);
>      if ( !compat )
>      {
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 6afc344..3fa2280 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1728,7 +1728,7 @@ static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>      {
>          /* We don't need to save state for a vcpu that is down; the restore 
>           * code will leave it down if there is nothing saved. */
> -        if ( test_bit(_VPF_down, &v->pause_flags) ) 
> +        if ( v->pause_flags & VPF_down )
>              continue;
>  
>          /* Architecture-specific vmcs/vmcb bits */
> @@ -2512,7 +2512,7 @@ void hvm_vcpu_down(struct vcpu *v)
>      /* Any other VCPUs online? ... */
>      domain_lock(d);
>      for_each_vcpu ( d, v )
> -        if ( !test_bit(_VPF_down, &v->pause_flags) )
> +        if ( !(v->pause_flags & VPF_down) )
>              online_count++;
>      domain_unlock(d);
>  
> diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
> index 0c8b22e..4fa6566 100644
> --- a/xen/arch/x86/hvm/vpt.c
> +++ b/xen/arch/x86/hvm/vpt.c
> @@ -178,7 +178,7 @@ void pt_save_timer(struct vcpu *v)
>      struct list_head *head = &v->arch.hvm_vcpu.tm_list;
>      struct periodic_time *pt;
>  
> -    if ( test_bit(_VPF_blocked, &v->pause_flags) )
> +    if ( v->pause_flags & VPF_blocked )
>          return;
>  
>      spin_lock(&v->arch.hvm_vcpu.tm_lock);
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index cda60a9..7c362eb 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1135,7 +1135,7 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset)
>          return -EINVAL;
>  
>      /* Run this command on yourself or on other offline VCPUS. */
> -    if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
> +    if ( (v != current) && !(v->pause_flags & VPF_down) )
>          return -EINVAL;
>  
>      page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
> @@ -1263,7 +1263,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>  
>      case VCPUOP_is_up:
> -        rc = !test_bit(_VPF_down, &v->pause_flags);
> +        rc = !(v->pause_flags & VPF_down);
>          break;
>  
>      case VCPUOP_get_runstate_info:
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 08de32d..46b967e 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -170,7 +170,7 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
>          vcpu_runstate_get(v, &runstate);
>          cpu_time += runstate.time[RUNSTATE_running];
>          info->max_vcpu_id = v->vcpu_id;
> -        if ( !test_bit(_VPF_down, &v->pause_flags) )
> +        if ( !(v->pause_flags & VPF_down) )
>          {
>              if ( !(v->pause_flags & VPF_blocked) )
>                  flags &= ~XEN_DOMINF_blocked;
> @@ -231,7 +231,7 @@ static unsigned int default_vcpu0_location(cpumask_t *online)
>          rcu_read_lock(&domlist_read_lock);
>          for_each_domain ( d )
>              for_each_vcpu ( d, v )
> -                if ( !test_bit(_VPF_down, &v->pause_flags)
> +                if ( !(v->pause_flags & VPF_down)
>                       && ((cpu = v->processor) < nr_cpus) )
>                      cnt[cpu]++;
>          rcu_read_unlock(&domlist_read_lock);
> @@ -944,8 +944,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>  
>          vcpu_runstate_get(v, &runstate);
>  
> -        op->u.getvcpuinfo.online   = !test_bit(_VPF_down, &v->pause_flags);
> -        op->u.getvcpuinfo.blocked  = test_bit(_VPF_blocked, &v->pause_flags);
> +        op->u.getvcpuinfo.online   = !(v->pause_flags & VPF_down);
> +        op->u.getvcpuinfo.blocked  = !!(v->pause_flags & VPF_blocked);
>          op->u.getvcpuinfo.running  = v->is_running;
>          op->u.getvcpuinfo.cpu_time = runstate.time[RUNSTATE_running];
>          op->u.getvcpuinfo.cpu      = v->processor;
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 5ffa1a1..c5f640f 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -119,7 +119,7 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
>  
>      if ( unlikely(v->is_urgent) )
>      {
> -        if ( !test_bit(_VPF_blocked, &v->pause_flags) ||
> +        if ( !(v->pause_flags & VPF_blocked) ||
>               !test_bit(v->vcpu_id, v->domain->poll_mask) )
>          {
>              v->is_urgent = 0;
> @@ -128,8 +128,8 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
>      }
>      else
>      {
> -        if ( unlikely(test_bit(_VPF_blocked, &v->pause_flags) &&
> -                      test_bit(v->vcpu_id, v->domain->poll_mask)) )
> +        if ( unlikely(v->pause_flags & VPF_blocked) &&
> +             unlikely(test_bit(v->vcpu_id, v->domain->poll_mask)) )
>          {
>              v->is_urgent = 1;
>              atomic_inc(&per_cpu(schedule_data,v->processor).urgent_count);
> @@ -418,7 +418,7 @@ void vcpu_wake(struct vcpu *v)
>              vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
>          SCHED_OP(VCPU2OP(v), wake, v);
>      }
> -    else if ( !test_bit(_VPF_blocked, &v->pause_flags) )
> +    else if ( !(v->pause_flags & VPF_blocked) )
>      {
>          if ( v->runstate.state == RUNSTATE_blocked )
>              vcpu_runstate_change(v, RUNSTATE_offline, NOW());
> @@ -595,7 +595,7 @@ void vcpu_force_reschedule(struct vcpu *v)
>          set_bit(_VPF_migrating, &v->pause_flags);
>      vcpu_schedule_unlock_irq(lock, v);
>  
> -    if ( test_bit(_VPF_migrating, &v->pause_flags) )
> +    if ( v->pause_flags & VPF_migrating )
>      {
>          vcpu_sleep_nosync(v);
>          vcpu_migrate(v);
> @@ -763,7 +763,7 @@ static int vcpu_set_affinity(
>  
>      domain_update_node_affinity(v->domain);
>  
> -    if ( test_bit(_VPF_migrating, &v->pause_flags) )
> +    if ( v->pause_flags & VPF_migrating )
>      {
>          vcpu_sleep_nosync(v);
>          vcpu_migrate(v);
> @@ -1285,7 +1285,7 @@ static void schedule(void)
>  
>      vcpu_runstate_change(
>          prev,
> -        (test_bit(_VPF_blocked, &prev->pause_flags) ? RUNSTATE_blocked :
> +        ((prev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
>           (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)),
>          now);
>      prev->last_run_time = now;
> @@ -1327,7 +1327,7 @@ void context_saved(struct vcpu *prev)
>  
>      SCHED_OP(VCPU2OP(prev), context_saved, prev);
>  
> -    if ( unlikely(test_bit(_VPF_migrating, &prev->pause_flags)) )
> +    if ( unlikely(prev->pause_flags & VPF_migrating) )
>          vcpu_migrate(prev);
>  }
>  
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits
  2015-10-05 13:18   ` George Dunlap
@ 2015-10-05 13:36     ` Jan Beulich
  2015-10-05 13:45       ` George Dunlap
  2015-10-05 13:39     ` Juergen Gross
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2015-10-05 13:36 UTC (permalink / raw)
  To: George Dunlap
  Cc: Juergen Gross, keir, george.dunlap, andrew.cooper3,
	dario.faggioli, xen-devel

>>> On 05.10.15 at 15:18, <george.dunlap@citrix.com> wrote:
> On 02/10/15 05:40, Juergen Gross wrote:
>> Use a bit mask for testing of a set bit instead of test_bit in case no
>> atomic operation is needed, as this will lead to smaller and more
>> effective code.
>> 
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> I'm a bit confused here -- exactly when is an atomic operation needed or
> not needed?  Isn't it the case that we always need to do an atomic read
> if we ever do an atomic write without a lock held?

First of all - what is an atomic read from CPU perspective other than
just a read? Since we talk about individual bits here, we don't care
about the granularity the compiler may convert the read to, and even
if the compiler chose to do multiple reads the result would still be
correct, as only one of those reads can possibly have read the bit
in question.

And then, the old mechanism was in no way "atomic", all it added was
a kind of compiler barrier (due to the cast to volatile). Yet in none of
the cases changed I was able to spot a need for such a barrier.

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits
  2015-10-05 13:36     ` Jan Beulich
@ 2015-10-05 13:45       ` George Dunlap
  2015-10-05 14:05         ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: George Dunlap @ 2015-10-05 13:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, keir, george.dunlap, andrew.cooper3,
	dario.faggioli, xen-devel

On 05/10/15 14:36, Jan Beulich wrote:
>>>> On 05.10.15 at 15:18, <george.dunlap@citrix.com> wrote:
>> On 02/10/15 05:40, Juergen Gross wrote:
>>> Use a bit mask for testing of a set bit instead of test_bit in case no
>>> atomic operation is needed, as this will lead to smaller and more
>>> effective code.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> I'm a bit confused here -- exactly when is an atomic operation needed or
>> not needed?  Isn't it the case that we always need to do an atomic read
>> if we ever do an atomic write without a lock held?
> 
> First of all - what is an atomic read from CPU perspective other than
> just a read? Since we talk about individual bits here, we don't care
> about the granularity the compiler may convert the read to, and even
> if the compiler chose to do multiple reads the result would still be
> correct, as only one of those reads can possibly have read the bit
> in question.
> 
> And then, the old mechanism was in no way "atomic", all it added was
> a kind of compiler barrier (due to the cast to volatile). Yet in none of
> the cases changed I was able to spot a need for such a barrier.

OK, so the key thing about test_bit isn't that it's atomic, so much that
it's an implicit memory barrier.  So as long as you're not doing a
lockless careful-ordering sort of thing, then a simple memory read
should be fine.  Is that correct?

In that case, it's likely that the patch is correct (though I'll take a
closer look just to be sure).

 -George

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits
  2015-10-05 13:45       ` George Dunlap
@ 2015-10-05 14:05         ` Jan Beulich
  2015-10-05 14:31           ` George Dunlap
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2015-10-05 14:05 UTC (permalink / raw)
  To: George Dunlap
  Cc: Juergen Gross, keir, george.dunlap, andrew.cooper3,
	dario.faggioli, xen-devel

>>> On 05.10.15 at 15:45, <george.dunlap@citrix.com> wrote:
> On 05/10/15 14:36, Jan Beulich wrote:
>>>>> On 05.10.15 at 15:18, <george.dunlap@citrix.com> wrote:
>>> On 02/10/15 05:40, Juergen Gross wrote:
>>>> Use a bit mask for testing of a set bit instead of test_bit in case no
>>>> atomic operation is needed, as this will lead to smaller and more
>>>> effective code.
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>
>>> I'm a bit confused here -- exactly when is an atomic operation needed or
>>> not needed?  Isn't it the case that we always need to do an atomic read
>>> if we ever do an atomic write without a lock held?
>> 
>> First of all - what is an atomic read from CPU perspective other than
>> just a read? Since we talk about individual bits here, we don't care
>> about the granularity the compiler may convert the read to, and even
>> if the compiler chose to do multiple reads the result would still be
>> correct, as only one of those reads can possibly have read the bit
>> in question.
>> 
>> And then, the old mechanism was in no way "atomic", all it added was
>> a kind of compiler barrier (due to the cast to volatile). Yet in none of
>> the cases changed I was able to spot a need for such a barrier.
> 
> OK, so the key thing about test_bit isn't that it's atomic, so much that
> it's an implicit memory barrier.  So as long as you're not doing a
> lockless careful-ordering sort of thing, then a simple memory read
> should be fine.  Is that correct?

Yes.

> In that case, it's likely that the patch is correct (though I'll take a
> closer look just to be sure).

Thanks.

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits
  2015-10-05 14:05         ` Jan Beulich
@ 2015-10-05 14:31           ` George Dunlap
  0 siblings, 0 replies; 23+ messages in thread
From: George Dunlap @ 2015-10-05 14:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, keir, george.dunlap, andrew.cooper3,
	dario.faggioli, xen-devel

On 05/10/15 15:05, Jan Beulich wrote:
>>>> On 05.10.15 at 15:45, <george.dunlap@citrix.com> wrote:
>> On 05/10/15 14:36, Jan Beulich wrote:
>>>>>> On 05.10.15 at 15:18, <george.dunlap@citrix.com> wrote:
>>>> On 02/10/15 05:40, Juergen Gross wrote:
>>>>> Use a bit mask for testing of a set bit instead of test_bit in case no
>>>>> atomic operation is needed, as this will lead to smaller and more
>>>>> effective code.
>>>>>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>
>>>> I'm a bit confused here -- exactly when is an atomic operation needed or
>>>> not needed?  Isn't it the case that we always need to do an atomic read
>>>> if we ever do an atomic write without a lock held?
>>>
>>> First of all - what is an atomic read from CPU perspective other than
>>> just a read? Since we talk about individual bits here, we don't care
>>> about the granularity the compiler may convert the read to, and even
>>> if the compiler chose to do multiple reads the result would still be
>>> correct, as only one of those reads can possibly have read the bit
>>> in question.
>>>
>>> And then, the old mechanism was in no way "atomic", all it added was
>>> a kind of compiler barrier (due to the cast to volatile). Yet in none of
>>> the cases changed I was able to spot a need for such a barrier.
>>
>> OK, so the key thing about test_bit isn't that it's atomic, so much that
>> it's an implicit memory barrier.  So as long as you're not doing a
>> lockless careful-ordering sort of thing, then a simple memory read
>> should be fine.  Is that correct?
> 
> Yes.
> 
>> In that case, it's likely that the patch is correct (though I'll take a
>> closer look just to be sure).
> 
> Thanks.

OK, I've looked through again and don't see anything that looks racy:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits
  2015-10-05 13:18   ` George Dunlap
  2015-10-05 13:36     ` Jan Beulich
@ 2015-10-05 13:39     ` Juergen Gross
  1 sibling, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2015-10-05 13:39 UTC (permalink / raw)
  To: George Dunlap, keir, jbeulich, andrew.cooper3, dario.faggioli,
	george.dunlap, xen-devel

On 10/05/2015 03:18 PM, George Dunlap wrote:
> On 02/10/15 05:40, Juergen Gross wrote:
>> Use a bit mask for testing of a set bit instead of test_bit in case no
>> atomic operation is needed, as this will lead to smaller and more
>> effective code.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>
> I'm a bit confused here -- exactly when is an atomic operation needed or
> not needed?  Isn't it the case that we always need to do an atomic read
> if we ever do an atomic write without a lock held?

If we do the write without lock doing an atomic read is required
only if there is some other kind of serialization between both
involved cpus, as otherwise things would go wrong if the write is
happening just after the (non-atomic) read.

Same applies to reads without lock: the write could happen just
afterwards, so such coding can serve only as an optimization, not as
a guarantee for correctness. TBH: I think there are some places
where testing of a flag could just be omitted, as the value will
nearly always be the one expected just from a local point of view.
Example: vcpu_force_reschedule().

What must be ruled out is some compiler optimization leading to
an omitted read. I hope I haven't introduced such a case.


Juergen

>
> For example, xen/common/schedule.c:vcpu_unblock(), vcpu_block(),
> do_poll; xen/common/event_channel.c:defaultxen_notification_fn(), and
> many more?
>
>   -George
>
>> ---
>>   xen/arch/x86/domctl.c  |  2 +-
>>   xen/arch/x86/hvm/hvm.c |  4 ++--
>>   xen/arch/x86/hvm/vpt.c |  2 +-
>>   xen/common/domain.c    |  4 ++--
>>   xen/common/domctl.c    |  8 ++++----
>>   xen/common/schedule.c  | 16 ++++++++--------
>>   6 files changed, 18 insertions(+), 18 deletions(-)
>>
>> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
>> index 6172c0d..f8a559c 100644
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -1213,7 +1213,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
>>       c(flags = v->arch.vgc_flags & ~(VGCF_i387_valid|VGCF_in_kernel));
>>       if ( v->fpu_initialised )
>>           c(flags |= VGCF_i387_valid);
>> -    if ( !test_bit(_VPF_down, &v->pause_flags) )
>> +    if ( !(v->pause_flags & VPF_down) )
>>           c(flags |= VGCF_online);
>>       if ( !compat )
>>       {
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 6afc344..3fa2280 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -1728,7 +1728,7 @@ static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>>       {
>>           /* We don't need to save state for a vcpu that is down; the restore
>>            * code will leave it down if there is nothing saved. */
>> -        if ( test_bit(_VPF_down, &v->pause_flags) )
>> +        if ( v->pause_flags & VPF_down )
>>               continue;
>>
>>           /* Architecture-specific vmcs/vmcb bits */
>> @@ -2512,7 +2512,7 @@ void hvm_vcpu_down(struct vcpu *v)
>>       /* Any other VCPUs online? ... */
>>       domain_lock(d);
>>       for_each_vcpu ( d, v )
>> -        if ( !test_bit(_VPF_down, &v->pause_flags) )
>> +        if ( !(v->pause_flags & VPF_down) )
>>               online_count++;
>>       domain_unlock(d);
>>
>> diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
>> index 0c8b22e..4fa6566 100644
>> --- a/xen/arch/x86/hvm/vpt.c
>> +++ b/xen/arch/x86/hvm/vpt.c
>> @@ -178,7 +178,7 @@ void pt_save_timer(struct vcpu *v)
>>       struct list_head *head = &v->arch.hvm_vcpu.tm_list;
>>       struct periodic_time *pt;
>>
>> -    if ( test_bit(_VPF_blocked, &v->pause_flags) )
>> +    if ( v->pause_flags & VPF_blocked )
>>           return;
>>
>>       spin_lock(&v->arch.hvm_vcpu.tm_lock);
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index cda60a9..7c362eb 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -1135,7 +1135,7 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset)
>>           return -EINVAL;
>>
>>       /* Run this command on yourself or on other offline VCPUS. */
>> -    if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>> +    if ( (v != current) && !(v->pause_flags & VPF_down) )
>>           return -EINVAL;
>>
>>       page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
>> @@ -1263,7 +1263,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
>>           break;
>>
>>       case VCPUOP_is_up:
>> -        rc = !test_bit(_VPF_down, &v->pause_flags);
>> +        rc = !(v->pause_flags & VPF_down);
>>           break;
>>
>>       case VCPUOP_get_runstate_info:
>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>> index 08de32d..46b967e 100644
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -170,7 +170,7 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
>>           vcpu_runstate_get(v, &runstate);
>>           cpu_time += runstate.time[RUNSTATE_running];
>>           info->max_vcpu_id = v->vcpu_id;
>> -        if ( !test_bit(_VPF_down, &v->pause_flags) )
>> +        if ( !(v->pause_flags & VPF_down) )
>>           {
>>               if ( !(v->pause_flags & VPF_blocked) )
>>                   flags &= ~XEN_DOMINF_blocked;
>> @@ -231,7 +231,7 @@ static unsigned int default_vcpu0_location(cpumask_t *online)
>>           rcu_read_lock(&domlist_read_lock);
>>           for_each_domain ( d )
>>               for_each_vcpu ( d, v )
>> -                if ( !test_bit(_VPF_down, &v->pause_flags)
>> +                if ( !(v->pause_flags & VPF_down)
>>                        && ((cpu = v->processor) < nr_cpus) )
>>                       cnt[cpu]++;
>>           rcu_read_unlock(&domlist_read_lock);
>> @@ -944,8 +944,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>
>>           vcpu_runstate_get(v, &runstate);
>>
>> -        op->u.getvcpuinfo.online   = !test_bit(_VPF_down, &v->pause_flags);
>> -        op->u.getvcpuinfo.blocked  = test_bit(_VPF_blocked, &v->pause_flags);
>> +        op->u.getvcpuinfo.online   = !(v->pause_flags & VPF_down);
>> +        op->u.getvcpuinfo.blocked  = !!(v->pause_flags & VPF_blocked);
>>           op->u.getvcpuinfo.running  = v->is_running;
>>           op->u.getvcpuinfo.cpu_time = runstate.time[RUNSTATE_running];
>>           op->u.getvcpuinfo.cpu      = v->processor;
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index 5ffa1a1..c5f640f 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -119,7 +119,7 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
>>
>>       if ( unlikely(v->is_urgent) )
>>       {
>> -        if ( !test_bit(_VPF_blocked, &v->pause_flags) ||
>> +        if ( !(v->pause_flags & VPF_blocked) ||
>>                !test_bit(v->vcpu_id, v->domain->poll_mask) )
>>           {
>>               v->is_urgent = 0;
>> @@ -128,8 +128,8 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
>>       }
>>       else
>>       {
>> -        if ( unlikely(test_bit(_VPF_blocked, &v->pause_flags) &&
>> -                      test_bit(v->vcpu_id, v->domain->poll_mask)) )
>> +        if ( unlikely(v->pause_flags & VPF_blocked) &&
>> +             unlikely(test_bit(v->vcpu_id, v->domain->poll_mask)) )
>>           {
>>               v->is_urgent = 1;
>>               atomic_inc(&per_cpu(schedule_data,v->processor).urgent_count);
>> @@ -418,7 +418,7 @@ void vcpu_wake(struct vcpu *v)
>>               vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
>>           SCHED_OP(VCPU2OP(v), wake, v);
>>       }
>> -    else if ( !test_bit(_VPF_blocked, &v->pause_flags) )
>> +    else if ( !(v->pause_flags & VPF_blocked) )
>>       {
>>           if ( v->runstate.state == RUNSTATE_blocked )
>>               vcpu_runstate_change(v, RUNSTATE_offline, NOW());
>> @@ -595,7 +595,7 @@ void vcpu_force_reschedule(struct vcpu *v)
>>           set_bit(_VPF_migrating, &v->pause_flags);
>>       vcpu_schedule_unlock_irq(lock, v);
>>
>> -    if ( test_bit(_VPF_migrating, &v->pause_flags) )
>> +    if ( v->pause_flags & VPF_migrating )
>>       {
>>           vcpu_sleep_nosync(v);
>>           vcpu_migrate(v);
>> @@ -763,7 +763,7 @@ static int vcpu_set_affinity(
>>
>>       domain_update_node_affinity(v->domain);
>>
>> -    if ( test_bit(_VPF_migrating, &v->pause_flags) )
>> +    if ( v->pause_flags & VPF_migrating )
>>       {
>>           vcpu_sleep_nosync(v);
>>           vcpu_migrate(v);
>> @@ -1285,7 +1285,7 @@ static void schedule(void)
>>
>>       vcpu_runstate_change(
>>           prev,
>> -        (test_bit(_VPF_blocked, &prev->pause_flags) ? RUNSTATE_blocked :
>> +        ((prev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
>>            (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)),
>>           now);
>>       prev->last_run_time = now;
>> @@ -1327,7 +1327,7 @@ void context_saved(struct vcpu *prev)
>>
>>       SCHED_OP(VCPU2OP(prev), context_saved, prev);
>>
>> -    if ( unlikely(test_bit(_VPF_migrating, &prev->pause_flags)) )
>> +    if ( unlikely(prev->pause_flags & VPF_migrating) )
>>           vcpu_migrate(prev);
>>   }
>>
>>
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits
  2015-10-02  4:40 ` [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits Juergen Gross
  2015-10-05 13:18   ` George Dunlap
@ 2015-10-05 13:24   ` George Dunlap
  2015-10-05 13:40     ` Jan Beulich
  1 sibling, 1 reply; 23+ messages in thread
From: George Dunlap @ 2015-10-05 13:24 UTC (permalink / raw)
  To: Juergen Gross, keir, jbeulich, andrew.cooper3, dario.faggioli,
	george.dunlap, xen-devel

On 02/10/15 05:40, Juergen Gross wrote:
> Use a bit mask for testing of a set bit instead of test_bit in case no
> atomic operation is needed, as this will lead to smaller and more
> effective code.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

On a side note, a handful of functions seem to access the vcpu structure
with the *domain* lock held, rather than the *scheduler* lock; for example:

xen/arch/x86/hvm/vlapic.c:vlapic_accept_irq()
xen/common/domain.c:vcpu_reset()

Is this a bug (perhaps left over from when we used the domain lock for
vcpus rather than the scheduler lock)?

 -George

> ---
>  xen/arch/x86/domctl.c  |  2 +-
>  xen/arch/x86/hvm/hvm.c |  4 ++--
>  xen/arch/x86/hvm/vpt.c |  2 +-
>  xen/common/domain.c    |  4 ++--
>  xen/common/domctl.c    |  8 ++++----
>  xen/common/schedule.c  | 16 ++++++++--------
>  6 files changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index 6172c0d..f8a559c 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -1213,7 +1213,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
>      c(flags = v->arch.vgc_flags & ~(VGCF_i387_valid|VGCF_in_kernel));
>      if ( v->fpu_initialised )
>          c(flags |= VGCF_i387_valid);
> -    if ( !test_bit(_VPF_down, &v->pause_flags) )
> +    if ( !(v->pause_flags & VPF_down) )
>          c(flags |= VGCF_online);
>      if ( !compat )
>      {
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 6afc344..3fa2280 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1728,7 +1728,7 @@ static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>      {
>          /* We don't need to save state for a vcpu that is down; the restore 
>           * code will leave it down if there is nothing saved. */
> -        if ( test_bit(_VPF_down, &v->pause_flags) ) 
> +        if ( v->pause_flags & VPF_down )
>              continue;
>  
>          /* Architecture-specific vmcs/vmcb bits */
> @@ -2512,7 +2512,7 @@ void hvm_vcpu_down(struct vcpu *v)
>      /* Any other VCPUs online? ... */
>      domain_lock(d);
>      for_each_vcpu ( d, v )
> -        if ( !test_bit(_VPF_down, &v->pause_flags) )
> +        if ( !(v->pause_flags & VPF_down) )
>              online_count++;
>      domain_unlock(d);
>  
> diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
> index 0c8b22e..4fa6566 100644
> --- a/xen/arch/x86/hvm/vpt.c
> +++ b/xen/arch/x86/hvm/vpt.c
> @@ -178,7 +178,7 @@ void pt_save_timer(struct vcpu *v)
>      struct list_head *head = &v->arch.hvm_vcpu.tm_list;
>      struct periodic_time *pt;
>  
> -    if ( test_bit(_VPF_blocked, &v->pause_flags) )
> +    if ( v->pause_flags & VPF_blocked )
>          return;
>  
>      spin_lock(&v->arch.hvm_vcpu.tm_lock);
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index cda60a9..7c362eb 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1135,7 +1135,7 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset)
>          return -EINVAL;
>  
>      /* Run this command on yourself or on other offline VCPUS. */
> -    if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
> +    if ( (v != current) && !(v->pause_flags & VPF_down) )
>          return -EINVAL;
>  
>      page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
> @@ -1263,7 +1263,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>  
>      case VCPUOP_is_up:
> -        rc = !test_bit(_VPF_down, &v->pause_flags);
> +        rc = !(v->pause_flags & VPF_down);
>          break;
>  
>      case VCPUOP_get_runstate_info:
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 08de32d..46b967e 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -170,7 +170,7 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
>          vcpu_runstate_get(v, &runstate);
>          cpu_time += runstate.time[RUNSTATE_running];
>          info->max_vcpu_id = v->vcpu_id;
> -        if ( !test_bit(_VPF_down, &v->pause_flags) )
> +        if ( !(v->pause_flags & VPF_down) )
>          {
>              if ( !(v->pause_flags & VPF_blocked) )
>                  flags &= ~XEN_DOMINF_blocked;
> @@ -231,7 +231,7 @@ static unsigned int default_vcpu0_location(cpumask_t *online)
>          rcu_read_lock(&domlist_read_lock);
>          for_each_domain ( d )
>              for_each_vcpu ( d, v )
> -                if ( !test_bit(_VPF_down, &v->pause_flags)
> +                if ( !(v->pause_flags & VPF_down)
>                       && ((cpu = v->processor) < nr_cpus) )
>                      cnt[cpu]++;
>          rcu_read_unlock(&domlist_read_lock);
> @@ -944,8 +944,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>  
>          vcpu_runstate_get(v, &runstate);
>  
> -        op->u.getvcpuinfo.online   = !test_bit(_VPF_down, &v->pause_flags);
> -        op->u.getvcpuinfo.blocked  = test_bit(_VPF_blocked, &v->pause_flags);
> +        op->u.getvcpuinfo.online   = !(v->pause_flags & VPF_down);
> +        op->u.getvcpuinfo.blocked  = !!(v->pause_flags & VPF_blocked);
>          op->u.getvcpuinfo.running  = v->is_running;
>          op->u.getvcpuinfo.cpu_time = runstate.time[RUNSTATE_running];
>          op->u.getvcpuinfo.cpu      = v->processor;
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 5ffa1a1..c5f640f 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -119,7 +119,7 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
>  
>      if ( unlikely(v->is_urgent) )
>      {
> -        if ( !test_bit(_VPF_blocked, &v->pause_flags) ||
> +        if ( !(v->pause_flags & VPF_blocked) ||
>               !test_bit(v->vcpu_id, v->domain->poll_mask) )
>          {
>              v->is_urgent = 0;
> @@ -128,8 +128,8 @@ static inline void vcpu_urgent_count_update(struct vcpu *v)
>      }
>      else
>      {
> -        if ( unlikely(test_bit(_VPF_blocked, &v->pause_flags) &&
> -                      test_bit(v->vcpu_id, v->domain->poll_mask)) )
> +        if ( unlikely(v->pause_flags & VPF_blocked) &&
> +             unlikely(test_bit(v->vcpu_id, v->domain->poll_mask)) )
>          {
>              v->is_urgent = 1;
>              atomic_inc(&per_cpu(schedule_data,v->processor).urgent_count);
> @@ -418,7 +418,7 @@ void vcpu_wake(struct vcpu *v)
>              vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
>          SCHED_OP(VCPU2OP(v), wake, v);
>      }
> -    else if ( !test_bit(_VPF_blocked, &v->pause_flags) )
> +    else if ( !(v->pause_flags & VPF_blocked) )
>      {
>          if ( v->runstate.state == RUNSTATE_blocked )
>              vcpu_runstate_change(v, RUNSTATE_offline, NOW());
> @@ -595,7 +595,7 @@ void vcpu_force_reschedule(struct vcpu *v)
>          set_bit(_VPF_migrating, &v->pause_flags);
>      vcpu_schedule_unlock_irq(lock, v);
>  
> -    if ( test_bit(_VPF_migrating, &v->pause_flags) )
> +    if ( v->pause_flags & VPF_migrating )
>      {
>          vcpu_sleep_nosync(v);
>          vcpu_migrate(v);
> @@ -763,7 +763,7 @@ static int vcpu_set_affinity(
>  
>      domain_update_node_affinity(v->domain);
>  
> -    if ( test_bit(_VPF_migrating, &v->pause_flags) )
> +    if ( v->pause_flags & VPF_migrating )
>      {
>          vcpu_sleep_nosync(v);
>          vcpu_migrate(v);
> @@ -1285,7 +1285,7 @@ static void schedule(void)
>  
>      vcpu_runstate_change(
>          prev,
> -        (test_bit(_VPF_blocked, &prev->pause_flags) ? RUNSTATE_blocked :
> +        ((prev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
>           (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)),
>          now);
>      prev->last_run_time = now;
> @@ -1327,7 +1327,7 @@ void context_saved(struct vcpu *prev)
>  
>      SCHED_OP(VCPU2OP(prev), context_saved, prev);
>  
> -    if ( unlikely(test_bit(_VPF_migrating, &prev->pause_flags)) )
> +    if ( unlikely(prev->pause_flags & VPF_migrating) )
>          vcpu_migrate(prev);
>  }
>  
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits
  2015-10-05 13:24   ` George Dunlap
@ 2015-10-05 13:40     ` Jan Beulich
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2015-10-05 13:40 UTC (permalink / raw)
  To: George Dunlap
  Cc: Juergen Gross, keir, george.dunlap, andrew.cooper3,
	dario.faggioli, xen-devel

>>> On 05.10.15 at 15:24, <george.dunlap@citrix.com> wrote:
> On 02/10/15 05:40, Juergen Gross wrote:
>> Use a bit mask for testing of a set bit instead of test_bit in case no
>> atomic operation is needed, as this will lead to smaller and more
>> effective code.
>> 
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> On a side note, a handful of functions seem to access the vcpu structure
> with the *domain* lock held, rather than the *scheduler* lock; for example:
> 
> xen/arch/x86/hvm/vlapic.c:vlapic_accept_irq()
> xen/common/domain.c:vcpu_reset()

Why would the scheduler lock be needed there? In both cases it would
- at a first glance - seem as if the domain lock isn't needed there.

> Is this a bug (perhaps left over from when we used the domain lock for
> vcpus rather than the scheduler lock)?

Not completely impossible there is too little locking now, but generally
I don't think all vcpu structure accesses need to happen under one
single lock.

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 5/5] xen: use masking operation instead of test_bit for MCSF bits
  2015-10-02  4:40 [PATCH 0/5] use mask operations instead of test_bit() Juergen Gross
                   ` (3 preceding siblings ...)
  2015-10-02  4:40 ` [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits Juergen Gross
@ 2015-10-02  4:40 ` Juergen Gross
  2015-10-02  9:03 ` [PATCH 0/5] use mask operations instead of test_bit() Dario Faggioli
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2015-10-02  4:40 UTC (permalink / raw)
  To: keir, jbeulich, andrew.cooper3, dario.faggioli, george.dunlap,
	xen-devel
  Cc: Juergen Gross

Use a bit mask for testing of a set bit instead of test_bit in case no
atomic operation is needed, as this will lead to smaller and more
effective code.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/domain.c           | 8 ++++----
 xen/arch/x86/x86_64/compat/mm.c | 4 ++--
 xen/common/multicall.c          | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 4e96f6c..7ca9b93 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1750,7 +1750,7 @@ void hypercall_cancel_continuation(void)
     struct cpu_user_regs *regs = guest_cpu_user_regs();
     struct mc_state *mcs = &current->mc_state;
 
-    if ( test_bit(_MCSF_in_multicall, &mcs->flags) )
+    if ( mcs->flags & MCSF_in_multicall )
     {
         __clear_bit(_MCSF_call_preempted, &mcs->flags);
     }
@@ -1774,7 +1774,7 @@ unsigned long hypercall_create_continuation(
 
     va_start(args, format);
 
-    if ( test_bit(_MCSF_in_multicall, &mcs->flags) )
+    if ( mcs->flags & MCSF_in_multicall )
     {
         __set_bit(_MCSF_call_preempted, &mcs->flags);
 
@@ -1852,9 +1852,9 @@ int hypercall_xlat_continuation(unsigned int *id, unsigned int nr,
 
     va_start(args, mask);
 
-    if ( test_bit(_MCSF_in_multicall, &mcs->flags) )
+    if ( mcs->flags & MCSF_in_multicall )
     {
-        if ( !test_bit(_MCSF_call_preempted, &mcs->flags) )
+        if ( !(mcs->flags & MCSF_call_preempted) )
         {
             va_end(args);
             return 0;
diff --git a/xen/arch/x86/x86_64/compat/mm.c b/xen/arch/x86/x86_64/compat/mm.c
index d034bd0..178e42d 100644
--- a/xen/arch/x86/x86_64/compat/mm.c
+++ b/xen/arch/x86/x86_64/compat/mm.c
@@ -324,7 +324,7 @@ int compat_mmuext_op(XEN_GUEST_HANDLE_PARAM(mmuext_op_compat_t) cmp_uops,
             {
                 struct cpu_user_regs *regs = guest_cpu_user_regs();
                 struct mc_state *mcs = &current->mc_state;
-                unsigned int arg1 = !test_bit(_MCSF_in_multicall, &mcs->flags)
+                unsigned int arg1 = !(mcs->flags & MCSF_in_multicall)
                                     ? regs->ecx
                                     : mcs->call.args[1];
                 unsigned int left = arg1 & ~MMU_UPDATE_PREEMPTED;
@@ -338,7 +338,7 @@ int compat_mmuext_op(XEN_GUEST_HANDLE_PARAM(mmuext_op_compat_t) cmp_uops,
                 {
                     BUG_ON(!hypercall_xlat_continuation(&left, 4, 0x01, nat_ops,
                                                         cmp_uops));
-                    if ( !test_bit(_MCSF_in_multicall, &mcs->flags) )
+                    if ( !(mcs->flags & MCSF_in_multicall) )
                         regs->_ecx += count - i;
                     else
                         mcs->compat_call.args[1] += count - i;
diff --git a/xen/common/multicall.c b/xen/common/multicall.c
index fa9d910..21661ee 100644
--- a/xen/common/multicall.c
+++ b/xen/common/multicall.c
@@ -79,7 +79,7 @@ do_multicall(
 
         if ( unlikely(__copy_field_to_guest(call_list, &mcs->call, result)) )
             rc = -EFAULT;
-        else if ( test_bit(_MCSF_call_preempted, &mcs->flags) )
+        else if ( mcs->flags & MCSF_call_preempted )
         {
             /* Translate sub-call continuation to guest layout */
             xlat_multicall_entry(mcs);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/5] use mask operations instead of test_bit()
  2015-10-02  4:40 [PATCH 0/5] use mask operations instead of test_bit() Juergen Gross
                   ` (4 preceding siblings ...)
  2015-10-02  4:40 ` [PATCH 5/5] xen: use masking operation instead of test_bit for MCSF bits Juergen Gross
@ 2015-10-02  9:03 ` Dario Faggioli
  2015-10-02  9:10   ` Juergen Gross
  2015-10-02  9:44 ` Jan Beulich
  2015-10-02  9:47 ` Andrew Cooper
  7 siblings, 1 reply; 23+ messages in thread
From: Dario Faggioli @ 2015-10-02  9:03 UTC (permalink / raw)
  To: Juergen Gross; +Cc: george.dunlap, andrew.cooper3, keir, jbeulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1309 bytes --]

On Fri, 2015-10-02 at 06:40 +0200, Juergen Gross wrote:
> Instead of using test_bit() which is an atomic operation and limits
> the compiler's choices to do optimization, use logical ANDs with
> bitmasks where possible.
> 
That's a good idea, I think.

> The possible candidates have been detected by searching definitions
> of bitmasks in the form:
> 
> #define MASK  1 << _MASK
> 
Right.

> On x86 the resulting code is slightly smaller (about 2 bytes for each
> case, checked via disassembly in few examples).
> 
> I'm quite sure I didn't replace a test_bit() call required to be
> atomic, but I'd be grateful for a thorough review especially in the
> scheduler.
> 
I'll have a deep look.

One question, can we introduce a __test_bit() macro/inline function,
like Jan did with __set_bit?

I've quickly-&-dirtily tested this:

#define __test_bit(nr, addr) ({  \
    unsigned _flags = 1 << nr;   \
    addr & _flags; \
})

and the result (I've checked a couple of cases) seems the same to me.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/5] use mask operations instead of test_bit()
  2015-10-02  9:03 ` [PATCH 0/5] use mask operations instead of test_bit() Dario Faggioli
@ 2015-10-02  9:10   ` Juergen Gross
  2015-10-02  9:33     ` Dario Faggioli
  0 siblings, 1 reply; 23+ messages in thread
From: Juergen Gross @ 2015-10-02  9:10 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: george.dunlap, andrew.cooper3, keir, jbeulich, xen-devel

On 10/02/2015 11:03 AM, Dario Faggioli wrote:
> On Fri, 2015-10-02 at 06:40 +0200, Juergen Gross wrote:
>> Instead of using test_bit() which is an atomic operation and limits
>> the compiler's choices to do optimization, use logical ANDs with
>> bitmasks where possible.
>>
> That's a good idea, I think.

It's a fallout from a cleanup patch:

http://lists.xen.org/archives/html/xen-devel/2015-09/msg03184.html

>
>> The possible candidates have been detected by searching definitions
>> of bitmasks in the form:
>>
>> #define MASK  1 << _MASK
>>
> Right.
>
>> On x86 the resulting code is slightly smaller (about 2 bytes for each
>> case, checked via disassembly in few examples).
>>
>> I'm quite sure I didn't replace a test_bit() call required to be
>> atomic, but I'd be grateful for a thorough review especially in the
>> scheduler.
>>
> I'll have a deep look.

Thanks.

> One question, can we introduce a __test_bit() macro/inline function,
> like Jan did with __set_bit?

In the thread mentioned above you'll find a discussion about exactly
this idea between Jan and me.

> I've quickly-&-dirtily tested this:
>
> #define __test_bit(nr, addr) ({  \
>      unsigned _flags = 1 << nr;   \
>      addr & _flags; \
> })
>
> and the result (I've checked a couple of cases) seems the same to me.

The problem is the limited scope where this scheme is really working and
is a better solution at the same time (nr must be a constant less than
the numbers of bits of *addr).


Juergen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/5] use mask operations instead of test_bit()
  2015-10-02  9:10   ` Juergen Gross
@ 2015-10-02  9:33     ` Dario Faggioli
  0 siblings, 0 replies; 23+ messages in thread
From: Dario Faggioli @ 2015-10-02  9:33 UTC (permalink / raw)
  To: Juergen Gross; +Cc: george.dunlap, andrew.cooper3, keir, jbeulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1387 bytes --]

On Fri, 2015-10-02 at 11:10 +0200, Juergen Gross wrote:
> On 10/02/2015 11:03 AM, Dario Faggioli wrote:

> > That's a good idea, I think.
> 
> It's a fallout from a cleanup patch:
> 
> http://lists.xen.org/archives/html/xen-devel/2015-09/msg03184.html
> 

> > One question, can we introduce a __test_bit() macro/inline
> > function,
> > like Jan did with __set_bit?
> 
> In the thread mentioned above you'll find a discussion about exactly
> this idea between Jan and me.
> 
Ah, I see. Sorry, I missed it.

> > I've quickly-&-dirtily tested this:
> > 
> > #define __test_bit(nr, addr) ({  \
> >      unsigned _flags = 1 << nr;   \
> >      addr & _flags; \
> > })
> > 
> > and the result (I've checked a couple of cases) seems the same to
> > me.
> 
> The problem is the limited scope where this scheme is really working
> and
> is a better solution at the same time (nr must be a constant less
> than
> the numbers of bits of *addr).
> 
Right, so we could only use __test_bit() in a subset of the cases,
i.e., we won't gain that much more consistency... I see it now.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/5] use mask operations instead of test_bit()
  2015-10-02  4:40 [PATCH 0/5] use mask operations instead of test_bit() Juergen Gross
                   ` (5 preceding siblings ...)
  2015-10-02  9:03 ` [PATCH 0/5] use mask operations instead of test_bit() Dario Faggioli
@ 2015-10-02  9:44 ` Jan Beulich
  2015-10-02  9:47 ` Andrew Cooper
  7 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2015-10-02  9:44 UTC (permalink / raw)
  To: Juergen Gross
  Cc: george.dunlap, andrew.cooper3, dario.faggioli, keir, xen-devel

>>> Juergen Gross <jgross@suse.com> 10/02/15 6:40 AM >>>
>Juergen Gross (5):
>xen: use masking operation instead of test_bit for RTDS bits
>xen: use masking operation instead of test_bit for CSFLAG bits
>xen: use masking operation instead of test_bit for VGCF bits
>xen: use masking operation instead of test_bit for VPF bits
>xen: use masking operation instead of test_bit for MCSF bits

Reviewed-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/5] use mask operations instead of test_bit()
  2015-10-02  4:40 [PATCH 0/5] use mask operations instead of test_bit() Juergen Gross
                   ` (6 preceding siblings ...)
  2015-10-02  9:44 ` Jan Beulich
@ 2015-10-02  9:47 ` Andrew Cooper
  7 siblings, 0 replies; 23+ messages in thread
From: Andrew Cooper @ 2015-10-02  9:47 UTC (permalink / raw)
  To: Juergen Gross, keir, jbeulich, dario.faggioli, george.dunlap,
	xen-devel

On 02/10/15 05:40, Juergen Gross wrote:
> Instead of using test_bit() which is an atomic operation and limits
> the compiler's choices to do optimization, use logical ANDs with
> bitmasks where possible.
>
> The possible candidates have been detected by searching definitions
> of bitmasks in the form:
>
> #define MASK  1 << _MASK
>
> On x86 the resulting code is slightly smaller (about 2 bytes for each
> case, checked via disassembly in few examples).
>
> I'm quite sure I didn't replace a test_bit() call required to be
> atomic, but I'd be grateful for a thorough review especially in the
> scheduler.
>
>
> Juergen Gross (5):
>   xen: use masking operation instead of test_bit for RTDS bits
>   xen: use masking operation instead of test_bit for CSFLAG bits
>   xen: use masking operation instead of test_bit for VGCF bits
>   xen: use masking operation instead of test_bit for VPF bits
>   xen: use masking operation instead of test_bit for MCSF bits

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2015-10-05 14:31 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-02  4:40 [PATCH 0/5] use mask operations instead of test_bit() Juergen Gross
2015-10-02  4:40 ` [PATCH 1/5] xen: use masking operation instead of test_bit for RTDS bits Juergen Gross
2015-10-02 10:21   ` Dario Faggioli
2015-10-05 13:30   ` George Dunlap
2015-10-02  4:40 ` [PATCH 2/5] xen: use masking operation instead of test_bit for CSFLAG bits Juergen Gross
2015-10-02 10:45   ` Dario Faggioli
2015-10-05 13:30   ` George Dunlap
2015-10-02  4:40 ` [PATCH 3/5] xen: use masking operation instead of test_bit for VGCF bits Juergen Gross
2015-10-02  4:40 ` [PATCH 4/5] xen: use masking operation instead of test_bit for VPF bits Juergen Gross
2015-10-05 13:18   ` George Dunlap
2015-10-05 13:36     ` Jan Beulich
2015-10-05 13:45       ` George Dunlap
2015-10-05 14:05         ` Jan Beulich
2015-10-05 14:31           ` George Dunlap
2015-10-05 13:39     ` Juergen Gross
2015-10-05 13:24   ` George Dunlap
2015-10-05 13:40     ` Jan Beulich
2015-10-02  4:40 ` [PATCH 5/5] xen: use masking operation instead of test_bit for MCSF bits Juergen Gross
2015-10-02  9:03 ` [PATCH 0/5] use mask operations instead of test_bit() Dario Faggioli
2015-10-02  9:10   ` Juergen Gross
2015-10-02  9:33     ` Dario Faggioli
2015-10-02  9:44 ` Jan Beulich
2015-10-02  9:47 ` Andrew Cooper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).