[PATCH 0/2] xen/x86: cmpxchg cleanup

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] xen/x86: cmpxchg cleanup
@ 2024-02-22  9:05 Roger Pau Monne
  2024-02-22  9:05 ` [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop Roger Pau Monne
  2024-02-22  9:05 ` [PATCH 2/2] x86/hpet: " Roger Pau Monne
  0 siblings, 2 replies; 17+ messages in thread
From: Roger Pau Monne @ 2024-02-22  9:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Tamas K Lengyel, Jan Beulich, Andrew Cooper,
	George Dunlap, Wei Liu

Hello,

Following series replace a couple of cmpxchg loops with an atomic inc.
The usage of such loops probably dates back to 32bit support, which
didn't have an instruction to do an atomic 64bit addition.

Thanks, Roger.

Roger Pau Monne (2):
  x86/memsharing: use an atomic add instead of a cmpxchg loop
  x86/hpet: use an atomic add instead of a cmpxchg loop

 xen/arch/x86/hpet.c           | 6 +-----
 xen/arch/x86/mm/mem_sharing.c | 8 +-------
 2 files changed, 2 insertions(+), 12 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop
  2024-02-22  9:05 [PATCH 0/2] xen/x86: cmpxchg cleanup Roger Pau Monne
@ 2024-02-22  9:05 ` Roger Pau Monne
  2024-02-22 10:06   ` Jan Beulich
                     ` (2 more replies)
  2024-02-22  9:05 ` [PATCH 2/2] x86/hpet: " Roger Pau Monne
  1 sibling, 3 replies; 17+ messages in thread
From: Roger Pau Monne @ 2024-02-22  9:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Tamas K Lengyel, Jan Beulich, Andrew Cooper,
	George Dunlap, Wei Liu

The usage of a cmpxchg loop in get_next_handle() is unnecessary, as the same
can be achieved with an atomic increment, which is both simpler to read, and
avoid any need for a loop.

The cmpxchg usage is likely a remnant of 32bit support, which didn't have an
instruction to do an atomic 64bit add, and instead a cmpxchg had to be used.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/mm/mem_sharing.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 4f810706a315..fe299a2bf9aa 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -179,13 +179,7 @@ static void mem_sharing_page_unlock(struct page_info *pg)
 
 static shr_handle_t get_next_handle(void)
 {
-    /* Get the next handle get_page style */
-    uint64_t x, y = next_handle;
-    do {
-        x = y;
-    }
-    while ( (y = cmpxchg(&next_handle, x, x + 1)) != x );
-    return x + 1;
+    return arch_fetch_and_add(&next_handle, 1) + 1;
 }
 
 static atomic_t nr_saved_mfns   = ATOMIC_INIT(0);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] x86/hpet: use an atomic add instead of a cmpxchg loop
  2024-02-22  9:05 [PATCH 0/2] xen/x86: cmpxchg cleanup Roger Pau Monne
  2024-02-22  9:05 ` [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop Roger Pau Monne
@ 2024-02-22  9:05 ` Roger Pau Monne
  2024-02-22 10:10   ` Jan Beulich
                     ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Roger Pau Monne @ 2024-02-22  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

The usage of a cmpxchg loop in hpet_get_channel() is unnecessary, as the same
can be achieved with an atomic increment, which is both simpler to read, and
avoid any need for a loop.

Note there can be a small divergence in the channel returned if next_channel
overflows, but returned channel will always be in the [0, num_hpets_used)
range, and that's fine for the purpose of balancing HPET channels across CPUs.
This is also theoretical, as there's no system currently with 2^32 CPUs (as
long as next_channel is 32bit width).

Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/hpet.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index d982b0f6b2c9..4777dc859d96 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -458,11 +458,7 @@ static struct hpet_event_channel *hpet_get_channel(unsigned int cpu)
     if ( num_hpets_used >= nr_cpu_ids )
         return &hpet_events[cpu];
 
-    do {
-        next = next_channel;
-        if ( (i = next + 1) == num_hpets_used )
-            i = 0;
-    } while ( cmpxchg(&next_channel, next, i) != next );
+    next = arch_fetch_and_add(&next_channel, 1) % num_hpets_used;
 
     /* try unused channel first */
     for ( i = next; i < next + num_hpets_used; i++ )
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop
  2024-02-22  9:05 ` [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop Roger Pau Monne
@ 2024-02-22 10:06   ` Jan Beulich
  2024-02-22 18:03     ` Tamas K Lengyel
  2024-02-22 11:12   ` Julien Grall
  2024-02-29  7:37   ` Jan Beulich
  2 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2024-02-22 10:06 UTC (permalink / raw)
  To: Roger Pau Monne, Tamas K Lengyel
  Cc: Andrew Cooper, George Dunlap, Wei Liu, xen-devel

On 22.02.2024 10:05, Roger Pau Monne wrote:
> The usage of a cmpxchg loop in get_next_handle() is unnecessary, as the same
> can be achieved with an atomic increment, which is both simpler to read, and
> avoid any need for a loop.
> 
> The cmpxchg usage is likely a remnant of 32bit support, which didn't have an
> instruction to do an atomic 64bit add, and instead a cmpxchg had to be used.
> 
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit ...

> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -179,13 +179,7 @@ static void mem_sharing_page_unlock(struct page_info *pg)
>  
>  static shr_handle_t get_next_handle(void)
>  {
> -    /* Get the next handle get_page style */
> -    uint64_t x, y = next_handle;
> -    do {
> -        x = y;
> -    }
> -    while ( (y = cmpxchg(&next_handle, x, x + 1)) != x );
> -    return x + 1;
> +    return arch_fetch_and_add(&next_handle, 1) + 1;
>  }

... the adding of 1 here is a little odd when taken together with
next_handle's initializer. Tamas, you've not written that code, but do
you have any thoughts towards the possible removal of either the
initializer or the adding here? Plus that variable of course could
very well do with moving into this function.

Jan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] x86/hpet: use an atomic add instead of a cmpxchg loop
  2024-02-22  9:05 ` [PATCH 2/2] x86/hpet: " Roger Pau Monne
@ 2024-02-22 10:10   ` Jan Beulich
  2024-02-22 10:58     ` Roger Pau Monné
  2024-02-22 11:16   ` Julien Grall
  2024-02-26  9:29   ` Jan Beulich
  2 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2024-02-22 10:10 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 22.02.2024 10:05, Roger Pau Monne wrote:
> The usage of a cmpxchg loop in hpet_get_channel() is unnecessary, as the same
> can be achieved with an atomic increment, which is both simpler to read, and
> avoid any need for a loop.
> 
> Note there can be a small divergence in the channel returned if next_channel
> overflows, but returned channel will always be in the [0, num_hpets_used)
> range, and that's fine for the purpose of balancing HPET channels across CPUs.
> This is also theoretical, as there's no system currently with 2^32 CPUs (as
> long as next_channel is 32bit width).

The code change looks good to me, but I fail to see the connection to
2^32 CPUs. So it feels like I'm missing something, in which case I'd
rather not offer any R-b.

Jan

> Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/hpet.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
> index d982b0f6b2c9..4777dc859d96 100644
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -458,11 +458,7 @@ static struct hpet_event_channel *hpet_get_channel(unsigned int cpu)
>      if ( num_hpets_used >= nr_cpu_ids )
>          return &hpet_events[cpu];
>  
> -    do {
> -        next = next_channel;
> -        if ( (i = next + 1) == num_hpets_used )
> -            i = 0;
> -    } while ( cmpxchg(&next_channel, next, i) != next );
> +    next = arch_fetch_and_add(&next_channel, 1) % num_hpets_used;
>  
>      /* try unused channel first */
>      for ( i = next; i < next + num_hpets_used; i++ )



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] x86/hpet: use an atomic add instead of a cmpxchg loop
  2024-02-22 10:10   ` Jan Beulich
@ 2024-02-22 10:58     ` Roger Pau Monné
  2024-02-22 11:02       ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monné @ 2024-02-22 10:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Thu, Feb 22, 2024 at 11:10:54AM +0100, Jan Beulich wrote:
> On 22.02.2024 10:05, Roger Pau Monne wrote:
> > The usage of a cmpxchg loop in hpet_get_channel() is unnecessary, as the same
> > can be achieved with an atomic increment, which is both simpler to read, and
> > avoid any need for a loop.
> > 
> > Note there can be a small divergence in the channel returned if next_channel
> > overflows, but returned channel will always be in the [0, num_hpets_used)
> > range, and that's fine for the purpose of balancing HPET channels across CPUs.
> > This is also theoretical, as there's no system currently with 2^32 CPUs (as
> > long as next_channel is 32bit width).
> 
> The code change looks good to me, but I fail to see the connection to
> 2^32 CPUs. So it feels like I'm missing something, in which case I'd
> rather not offer any R-b.

The purpose of hpet_get_channel() is to distribute HPET channels
across CPUs, so that each CPU gets assigned an HPET channel, balancing
the number of CPUs that use each channel.

This is done by (next_channel++ % num_hpets_used), so the value of
next_channel after this change can be > num_hpets_used, which
previously wasn't.  This can lead to a different channel returned for
the 2^32 call to the function, as the counter would overflow.  Note
calls to the function are restricted to the number of CPUs in the
host, as per-CPU channel assignment is done only once (and the channel
is then stored in a per-cpu variable).

Feel free to adjust the commit message if you think all this is too
much data, or too confusing.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] x86/hpet: use an atomic add instead of a cmpxchg loop
  2024-02-22 10:58     ` Roger Pau Monné
@ 2024-02-22 11:02       ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2024-02-22 11:02 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 22.02.2024 11:58, Roger Pau Monné wrote:
> On Thu, Feb 22, 2024 at 11:10:54AM +0100, Jan Beulich wrote:
>> On 22.02.2024 10:05, Roger Pau Monne wrote:
>>> The usage of a cmpxchg loop in hpet_get_channel() is unnecessary, as the same
>>> can be achieved with an atomic increment, which is both simpler to read, and
>>> avoid any need for a loop.
>>>
>>> Note there can be a small divergence in the channel returned if next_channel
>>> overflows, but returned channel will always be in the [0, num_hpets_used)
>>> range, and that's fine for the purpose of balancing HPET channels across CPUs.
>>> This is also theoretical, as there's no system currently with 2^32 CPUs (as
>>> long as next_channel is 32bit width).
>>
>> The code change looks good to me, but I fail to see the connection to
>> 2^32 CPUs. So it feels like I'm missing something, in which case I'd
>> rather not offer any R-b.
> 
> The purpose of hpet_get_channel() is to distribute HPET channels
> across CPUs, so that each CPU gets assigned an HPET channel, balancing
> the number of CPUs that use each channel.
> 
> This is done by (next_channel++ % num_hpets_used), so the value of
> next_channel after this change can be > num_hpets_used, which
> previously wasn't.  This can lead to a different channel returned for
> the 2^32 call to the function, as the counter would overflow.  Note
> calls to the function are restricted to the number of CPUs in the
> host, as per-CPU channel assignment is done only once (and the channel
> is then stored in a per-cpu variable).

That's once per CPU being brought up, not once per CPU in the system.

> Feel free to adjust the commit message if you think all this is too
> much data, or too confusing.

I'll simply drop that last sentence then, without which
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop
  2024-02-22  9:05 ` [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop Roger Pau Monne
  2024-02-22 10:06   ` Jan Beulich
@ 2024-02-22 11:12   ` Julien Grall
  2024-02-29  7:37   ` Jan Beulich
  2 siblings, 0 replies; 17+ messages in thread
From: Julien Grall @ 2024-02-22 11:12 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel
  Cc: Tamas K Lengyel, Jan Beulich, Andrew Cooper, George Dunlap,
	Wei Liu

Hi Roger,

On 22/02/2024 09:05, Roger Pau Monne wrote:
> The usage of a cmpxchg loop in get_next_handle() is unnecessary, as the same
> can be achieved with an atomic increment, which is both simpler to read, and
> avoid any need for a loop.
> 
> The cmpxchg usage is likely a remnant of 32bit support, which didn't have an
> instruction to do an atomic 64bit add, and instead a cmpxchg had to be used.
> 
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] x86/hpet: use an atomic add instead of a cmpxchg loop
  2024-02-22  9:05 ` [PATCH 2/2] x86/hpet: " Roger Pau Monne
  2024-02-22 10:10   ` Jan Beulich
@ 2024-02-22 11:16   ` Julien Grall
  2024-02-26  9:29   ` Jan Beulich
  2 siblings, 0 replies; 17+ messages in thread
From: Julien Grall @ 2024-02-22 11:16 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich, Andrew Cooper, Wei Liu

Hi Roger,

On 22/02/2024 09:05, Roger Pau Monne wrote:
> The usage of a cmpxchg loop in hpet_get_channel() is unnecessary, as the same
> can be achieved with an atomic increment, which is both simpler to read, and
> avoid any need for a loop.
> 
> Note there can be a small divergence in the channel returned if next_channel
> overflows, but returned channel will always be in the [0, num_hpets_used)
> range, and that's fine for the purpose of balancing HPET channels across CPUs.
> This is also theoretical, as there's no system currently with 2^32 CPUs (as
> long as next_channel is 32bit width).

This would also happen if we decide to reduce the size next_channel.

Reviewed-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop
  2024-02-22 10:06   ` Jan Beulich
@ 2024-02-22 18:03     ` Tamas K Lengyel
  2024-02-23  7:43       ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Tamas K Lengyel @ 2024-02-22 18:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monne, Andrew Cooper, George Dunlap, Wei Liu, xen-devel

On Thu, Feb 22, 2024 at 5:06 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 22.02.2024 10:05, Roger Pau Monne wrote:
> > The usage of a cmpxchg loop in get_next_handle() is unnecessary, as the same
> > can be achieved with an atomic increment, which is both simpler to read, and
> > avoid any need for a loop.
> >
> > The cmpxchg usage is likely a remnant of 32bit support, which didn't have an
> > instruction to do an atomic 64bit add, and instead a cmpxchg had to be used.
> >
> > Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> albeit ...
>
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -179,13 +179,7 @@ static void mem_sharing_page_unlock(struct page_info *pg)
> >
> >  static shr_handle_t get_next_handle(void)
> >  {
> > -    /* Get the next handle get_page style */
> > -    uint64_t x, y = next_handle;
> > -    do {
> > -        x = y;
> > -    }
> > -    while ( (y = cmpxchg(&next_handle, x, x + 1)) != x );
> > -    return x + 1;
> > +    return arch_fetch_and_add(&next_handle, 1) + 1;
> >  }
>
> ... the adding of 1 here is a little odd when taken together with
> next_handle's initializer. Tamas, you've not written that code, but do
> you have any thoughts towards the possible removal of either the
> initializer or the adding here? Plus that variable of course could
> very well do with moving into this function.

I have to say I find the existing logic here hard to parse but by the
looks I don't think we need the + 1 once we switch to
arch_fetch_and_add. Also could go without initializing next_handle to
1. Moving it into the function would not really accomplish anything
other than style AFAICT?

Tamas


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop
  2024-02-22 18:03     ` Tamas K Lengyel
@ 2024-02-23  7:43       ` Jan Beulich
  2024-02-28 10:53         ` Roger Pau Monné
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2024-02-23  7:43 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Roger Pau Monne, Andrew Cooper, George Dunlap, Wei Liu, xen-devel

On 22.02.2024 19:03, Tamas K Lengyel wrote:
> On Thu, Feb 22, 2024 at 5:06 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 22.02.2024 10:05, Roger Pau Monne wrote:
>>> The usage of a cmpxchg loop in get_next_handle() is unnecessary, as the same
>>> can be achieved with an atomic increment, which is both simpler to read, and
>>> avoid any need for a loop.
>>>
>>> The cmpxchg usage is likely a remnant of 32bit support, which didn't have an
>>> instruction to do an atomic 64bit add, and instead a cmpxchg had to be used.
>>>
>>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> albeit ...
>>
>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>> @@ -179,13 +179,7 @@ static void mem_sharing_page_unlock(struct page_info *pg)
>>>
>>>  static shr_handle_t get_next_handle(void)
>>>  {
>>> -    /* Get the next handle get_page style */
>>> -    uint64_t x, y = next_handle;
>>> -    do {
>>> -        x = y;
>>> -    }
>>> -    while ( (y = cmpxchg(&next_handle, x, x + 1)) != x );
>>> -    return x + 1;
>>> +    return arch_fetch_and_add(&next_handle, 1) + 1;
>>>  }
>>
>> ... the adding of 1 here is a little odd when taken together with
>> next_handle's initializer. Tamas, you've not written that code, but do
>> you have any thoughts towards the possible removal of either the
>> initializer or the adding here? Plus that variable of course could
>> very well do with moving into this function.
> 
> I have to say I find the existing logic here hard to parse but by the
> looks I don't think we need the + 1 once we switch to
> arch_fetch_and_add. Also could go without initializing next_handle to
> 1. Moving it into the function would not really accomplish anything
> other than style AFAICT?

Well, limiting scope of things can be viewed as purely style, but I
think it's more than that: It makes intentions more clear and reduces
the chance of abuse (deliberate or unintentional).

Jan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] x86/hpet: use an atomic add instead of a cmpxchg loop
  2024-02-22  9:05 ` [PATCH 2/2] x86/hpet: " Roger Pau Monne
  2024-02-22 10:10   ` Jan Beulich
  2024-02-22 11:16   ` Julien Grall
@ 2024-02-26  9:29   ` Jan Beulich
  2 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2024-02-26  9:29 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 22.02.2024 10:05, Roger Pau Monne wrote:
> The usage of a cmpxchg loop in hpet_get_channel() is unnecessary, as the same
> can be achieved with an atomic increment, which is both simpler to read, and
> avoid any need for a loop.
> 
> Note there can be a small divergence in the channel returned if next_channel
> overflows, but returned channel will always be in the [0, num_hpets_used)
> range, and that's fine for the purpose of balancing HPET channels across CPUs.
> This is also theoretical, as there's no system currently with 2^32 CPUs (as
> long as next_channel is 32bit width).
> 
> Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>

Hmm, I'm sorry - it's now me who is recorded as the author of the patch,
for my script not finding any Signed-off-by: here (note the typo).

Jan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop
  2024-02-23  7:43       ` Jan Beulich
@ 2024-02-28 10:53         ` Roger Pau Monné
  2024-02-28 11:18           ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monné @ 2024-02-28 10:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Andrew Cooper, George Dunlap, Wei Liu, xen-devel

On Fri, Feb 23, 2024 at 08:43:24AM +0100, Jan Beulich wrote:
> On 22.02.2024 19:03, Tamas K Lengyel wrote:
> > On Thu, Feb 22, 2024 at 5:06 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> On 22.02.2024 10:05, Roger Pau Monne wrote:
> >>> The usage of a cmpxchg loop in get_next_handle() is unnecessary, as the same
> >>> can be achieved with an atomic increment, which is both simpler to read, and
> >>> avoid any need for a loop.
> >>>
> >>> The cmpxchg usage is likely a remnant of 32bit support, which didn't have an
> >>> instruction to do an atomic 64bit add, and instead a cmpxchg had to be used.
> >>>
> >>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>> Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>
> >>
> >> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >> albeit ...
> >>
> >>> --- a/xen/arch/x86/mm/mem_sharing.c
> >>> +++ b/xen/arch/x86/mm/mem_sharing.c
> >>> @@ -179,13 +179,7 @@ static void mem_sharing_page_unlock(struct page_info *pg)
> >>>
> >>>  static shr_handle_t get_next_handle(void)
> >>>  {
> >>> -    /* Get the next handle get_page style */
> >>> -    uint64_t x, y = next_handle;
> >>> -    do {
> >>> -        x = y;
> >>> -    }
> >>> -    while ( (y = cmpxchg(&next_handle, x, x + 1)) != x );
> >>> -    return x + 1;
> >>> +    return arch_fetch_and_add(&next_handle, 1) + 1;
> >>>  }
> >>
> >> ... the adding of 1 here is a little odd when taken together with
> >> next_handle's initializer. Tamas, you've not written that code, but do
> >> you have any thoughts towards the possible removal of either the
> >> initializer or the adding here? Plus that variable of course could
> >> very well do with moving into this function.
> > 
> > I have to say I find the existing logic here hard to parse but by the
> > looks I don't think we need the + 1 once we switch to
> > arch_fetch_and_add. Also could go without initializing next_handle to
> > 1. Moving it into the function would not really accomplish anything
> > other than style AFAICT?
> 
> Well, limiting scope of things can be viewed as purely style, but I
> think it's more than that: It makes intentions more clear and reduces
> the chance of abuse (deliberate or unintentional).

I'm afraid that whatever is the outcome here, I will defer it to a
further commit, since the purpose here is to be a non-functional
change.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop
  2024-02-28 10:53         ` Roger Pau Monné
@ 2024-02-28 11:18           ` Jan Beulich
  2024-02-28 13:28             ` Roger Pau Monné
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2024-02-28 11:18 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Tamas K Lengyel, Andrew Cooper, George Dunlap, Wei Liu, xen-devel

On 28.02.2024 11:53, Roger Pau Monné wrote:
> On Fri, Feb 23, 2024 at 08:43:24AM +0100, Jan Beulich wrote:
>> On 22.02.2024 19:03, Tamas K Lengyel wrote:
>>> On Thu, Feb 22, 2024 at 5:06 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>> On 22.02.2024 10:05, Roger Pau Monne wrote:
>>>>> The usage of a cmpxchg loop in get_next_handle() is unnecessary, as the same
>>>>> can be achieved with an atomic increment, which is both simpler to read, and
>>>>> avoid any need for a loop.
>>>>>
>>>>> The cmpxchg usage is likely a remnant of 32bit support, which didn't have an
>>>>> instruction to do an atomic 64bit add, and instead a cmpxchg had to be used.
>>>>>
>>>>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>
>>>>
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>> albeit ...
>>>>
>>>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>>>> @@ -179,13 +179,7 @@ static void mem_sharing_page_unlock(struct page_info *pg)
>>>>>
>>>>>  static shr_handle_t get_next_handle(void)
>>>>>  {
>>>>> -    /* Get the next handle get_page style */
>>>>> -    uint64_t x, y = next_handle;
>>>>> -    do {
>>>>> -        x = y;
>>>>> -    }
>>>>> -    while ( (y = cmpxchg(&next_handle, x, x + 1)) != x );
>>>>> -    return x + 1;
>>>>> +    return arch_fetch_and_add(&next_handle, 1) + 1;
>>>>>  }
>>>>
>>>> ... the adding of 1 here is a little odd when taken together with
>>>> next_handle's initializer. Tamas, you've not written that code, but do
>>>> you have any thoughts towards the possible removal of either the
>>>> initializer or the adding here? Plus that variable of course could
>>>> very well do with moving into this function.
>>>
>>> I have to say I find the existing logic here hard to parse but by the
>>> looks I don't think we need the + 1 once we switch to
>>> arch_fetch_and_add. Also could go without initializing next_handle to
>>> 1. Moving it into the function would not really accomplish anything
>>> other than style AFAICT?
>>
>> Well, limiting scope of things can be viewed as purely style, but I
>> think it's more than that: It makes intentions more clear and reduces
>> the chance of abuse (deliberate or unintentional).
> 
> I'm afraid that whatever is the outcome here, I will defer it to a
> further commit, since the purpose here is to be a non-functional
> change.

That's fine with me, but an ack from Tamas is still pending, unless I
missed something somewhere.

Jan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop
  2024-02-28 11:18           ` Jan Beulich
@ 2024-02-28 13:28             ` Roger Pau Monné
  2024-02-28 13:38               ` Tamas K Lengyel
  0 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monné @ 2024-02-28 13:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Andrew Cooper, George Dunlap, Wei Liu, xen-devel

On Wed, Feb 28, 2024 at 12:18:31PM +0100, Jan Beulich wrote:
> On 28.02.2024 11:53, Roger Pau Monné wrote:
> > On Fri, Feb 23, 2024 at 08:43:24AM +0100, Jan Beulich wrote:
> >> On 22.02.2024 19:03, Tamas K Lengyel wrote:
> >>> On Thu, Feb 22, 2024 at 5:06 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> On 22.02.2024 10:05, Roger Pau Monne wrote:
> >>>>> The usage of a cmpxchg loop in get_next_handle() is unnecessary, as the same
> >>>>> can be achieved with an atomic increment, which is both simpler to read, and
> >>>>> avoid any need for a loop.
> >>>>>
> >>>>> The cmpxchg usage is likely a remnant of 32bit support, which didn't have an
> >>>>> instruction to do an atomic 64bit add, and instead a cmpxchg had to be used.
> >>>>>
> >>>>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>>> Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>
> >>>>
> >>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >>>> albeit ...
> >>>>
> >>>>> --- a/xen/arch/x86/mm/mem_sharing.c
> >>>>> +++ b/xen/arch/x86/mm/mem_sharing.c
> >>>>> @@ -179,13 +179,7 @@ static void mem_sharing_page_unlock(struct page_info *pg)
> >>>>>
> >>>>>  static shr_handle_t get_next_handle(void)
> >>>>>  {
> >>>>> -    /* Get the next handle get_page style */
> >>>>> -    uint64_t x, y = next_handle;
> >>>>> -    do {
> >>>>> -        x = y;
> >>>>> -    }
> >>>>> -    while ( (y = cmpxchg(&next_handle, x, x + 1)) != x );
> >>>>> -    return x + 1;
> >>>>> +    return arch_fetch_and_add(&next_handle, 1) + 1;
> >>>>>  }
> >>>>
> >>>> ... the adding of 1 here is a little odd when taken together with
> >>>> next_handle's initializer. Tamas, you've not written that code, but do
> >>>> you have any thoughts towards the possible removal of either the
> >>>> initializer or the adding here? Plus that variable of course could
> >>>> very well do with moving into this function.
> >>>
> >>> I have to say I find the existing logic here hard to parse but by the
> >>> looks I don't think we need the + 1 once we switch to
> >>> arch_fetch_and_add. Also could go without initializing next_handle to
> >>> 1. Moving it into the function would not really accomplish anything
> >>> other than style AFAICT?
> >>
> >> Well, limiting scope of things can be viewed as purely style, but I
> >> think it's more than that: It makes intentions more clear and reduces
> >> the chance of abuse (deliberate or unintentional).
> > 
> > I'm afraid that whatever is the outcome here, I will defer it to a
> > further commit, since the purpose here is to be a non-functional
> > change.
> 
> That's fine with me, but an ack from Tamas is still pending, unless I
> missed something somewhere.

No, just wanted to clarify that I wasn't expecting to do further
changes here, FTAOD.  Not sure if Tamas was expecting me to further
adjust the code.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop
  2024-02-28 13:28             ` Roger Pau Monné
@ 2024-02-28 13:38               ` Tamas K Lengyel
  0 siblings, 0 replies; 17+ messages in thread
From: Tamas K Lengyel @ 2024-02-28 13:38 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, Andrew Cooper, George Dunlap, Wei Liu, xen-devel

On Wed, Feb 28, 2024 at 8:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Feb 28, 2024 at 12:18:31PM +0100, Jan Beulich wrote:
> > On 28.02.2024 11:53, Roger Pau Monné wrote:
> > > On Fri, Feb 23, 2024 at 08:43:24AM +0100, Jan Beulich wrote:
> > >> On 22.02.2024 19:03, Tamas K Lengyel wrote:
> > >>> On Thu, Feb 22, 2024 at 5:06 AM Jan Beulich <jbeulich@suse.com> wrote:
> > >>>> On 22.02.2024 10:05, Roger Pau Monne wrote:
> > >>>>> The usage of a cmpxchg loop in get_next_handle() is unnecessary, as the same
> > >>>>> can be achieved with an atomic increment, which is both simpler to read, and
> > >>>>> avoid any need for a loop.
> > >>>>>
> > >>>>> The cmpxchg usage is likely a remnant of 32bit support, which didn't have an
> > >>>>> instruction to do an atomic 64bit add, and instead a cmpxchg had to be used.
> > >>>>>
> > >>>>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > >>>>> Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>
> > >>>>
> > >>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > >>>> albeit ...
> > >>>>
> > >>>>> --- a/xen/arch/x86/mm/mem_sharing.c
> > >>>>> +++ b/xen/arch/x86/mm/mem_sharing.c
> > >>>>> @@ -179,13 +179,7 @@ static void mem_sharing_page_unlock(struct page_info *pg)
> > >>>>>
> > >>>>>  static shr_handle_t get_next_handle(void)
> > >>>>>  {
> > >>>>> -    /* Get the next handle get_page style */
> > >>>>> -    uint64_t x, y = next_handle;
> > >>>>> -    do {
> > >>>>> -        x = y;
> > >>>>> -    }
> > >>>>> -    while ( (y = cmpxchg(&next_handle, x, x + 1)) != x );
> > >>>>> -    return x + 1;
> > >>>>> +    return arch_fetch_and_add(&next_handle, 1) + 1;
> > >>>>>  }
> > >>>>
> > >>>> ... the adding of 1 here is a little odd when taken together with
> > >>>> next_handle's initializer. Tamas, you've not written that code, but do
> > >>>> you have any thoughts towards the possible removal of either the
> > >>>> initializer or the adding here? Plus that variable of course could
> > >>>> very well do with moving into this function.
> > >>>
> > >>> I have to say I find the existing logic here hard to parse but by the
> > >>> looks I don't think we need the + 1 once we switch to
> > >>> arch_fetch_and_add. Also could go without initializing next_handle to
> > >>> 1. Moving it into the function would not really accomplish anything
> > >>> other than style AFAICT?
> > >>
> > >> Well, limiting scope of things can be viewed as purely style, but I
> > >> think it's more than that: It makes intentions more clear and reduces
> > >> the chance of abuse (deliberate or unintentional).
> > >
> > > I'm afraid that whatever is the outcome here, I will defer it to a
> > > further commit, since the purpose here is to be a non-functional
> > > change.
> >
> > That's fine with me, but an ack from Tamas is still pending, unless I
> > missed something somewhere.
>
> No, just wanted to clarify that I wasn't expecting to do further
> changes here, FTAOD.  Not sure if Tamas was expecting me to further
> adjust the code.

Acked-by: Tamas K Lengyel <tamas@tklengyel.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop
  2024-02-22  9:05 ` [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop Roger Pau Monne
  2024-02-22 10:06   ` Jan Beulich
  2024-02-22 11:12   ` Julien Grall
@ 2024-02-29  7:37   ` Jan Beulich
  2 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2024-02-29  7:37 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Tamas K Lengyel, Andrew Cooper, George Dunlap, Wei Liu, xen-devel

On 22.02.2024 10:05, Roger Pau Monne wrote:
> The usage of a cmpxchg loop in get_next_handle() is unnecessary, as the same
> can be achieved with an atomic increment, which is both simpler to read, and
> avoid any need for a loop.
> 
> The cmpxchg usage is likely a remnant of 32bit support, which didn't have an
> instruction to do an atomic 64bit add, and instead a cmpxchg had to be used.
> 
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>

Hmm, same typo again, so my commit script again got the author wrong.
I'm sorry again. I would consider adding an explicit confirmation step
when there's no (well-formed) S-o-b, but I use this script elsewhere
as well, and in e.g. binutils and gcc S-o-b are pretty uncommon. (And
to avoid the question: I'm specifically avoiding From: itself in the
common case, as for you and other people with non-ASCII characters in
their names those typically don't properly show up there.)

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-02-29  7:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-22  9:05 [PATCH 0/2] xen/x86: cmpxchg cleanup Roger Pau Monne
2024-02-22  9:05 ` [PATCH 1/2] x86/memsharing: use an atomic add instead of a cmpxchg loop Roger Pau Monne
2024-02-22 10:06   ` Jan Beulich
2024-02-22 18:03     ` Tamas K Lengyel
2024-02-23  7:43       ` Jan Beulich
2024-02-28 10:53         ` Roger Pau Monné
2024-02-28 11:18           ` Jan Beulich
2024-02-28 13:28             ` Roger Pau Monné
2024-02-28 13:38               ` Tamas K Lengyel
2024-02-22 11:12   ` Julien Grall
2024-02-29  7:37   ` Jan Beulich
2024-02-22  9:05 ` [PATCH 2/2] x86/hpet: " Roger Pau Monne
2024-02-22 10:10   ` Jan Beulich
2024-02-22 10:58     ` Roger Pau Monné
2024-02-22 11:02       ` Jan Beulich
2024-02-22 11:16   ` Julien Grall
2024-02-26  9:29   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.