* [PATCH] x86/mm: re-implement get_page_light() using an atomic increment
From: Roger Pau Monne @ 2024-03-01 12:42 UTC
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

The current use of a cmpxchg loop to increment the page reference count is
not optimal on amd64, as there's already an instruction (LOCK XADD) to
atomically add to a 64-bit integer.
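A rough user-space model of the two approaches, using C11 `<stdatomic.h>` rather than Xen's own `cmpxchg()` and `arch_fetch_and_add()` helpers; `demo_count_t` and both function names are hypothetical. The loop may have to retry when another CPU modifies the word in between, whereas the fetch-add is a single read-modify-write that compilers typically emit as LOCK XADD on x86-64.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for page->count_info; not Xen's struct page_info. */
typedef _Atomic uint64_t demo_count_t;

/* Old style: read, compute x + 1, and retry the compare-exchange until no
 * other CPU modified the word in between. Returns the pre-increment value. */
static uint64_t inc_cmpxchg_loop(demo_count_t *cnt)
{
    uint64_t x = atomic_load(cnt);

    /* On failure, atomic_compare_exchange_strong() reloads x with the
     * current value, so the loop simply tries again with fresh data. */
    while ( !atomic_compare_exchange_strong(cnt, &x, x + 1) )
        ;

    return x;
}

/* New style: one atomic read-modify-write, no loop. Returns the
 * pre-increment value, like arch_fetch_and_add() in the patch. */
static uint64_t inc_fetch_add(demo_count_t *cnt)
{
    return atomic_fetch_add(cnt, 1);
}
```

Both return the pre-increment value, so on a counter holding 5 either call returns 5 and leaves 6 behind; the sanity checks are deliberately left out of this model.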

Switch the code in get_page_light() to use an atomic increment, as that avoids
a loop construct.  This slightly changes the order of the checks: the current
code will crash before modifying the page count_info if the conditions are not
met, while with the proposed change the crash will happen immediately after
the counter has been increased.  Since we are crashing anyway, I don't believe
the re-ordering has any meaningful impact.
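The ordering change can be sketched with the same C11 model. `DEMO_COUNT_MASK` assumes a 55-bit count field (an assumption modelled on, not copied from, the x86 `PGC_count_mask`), and the hypothetical functions return `false` where Xen would hit BUG_ON().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: reference count in the low 55 bits, flags above it. */
#define DEMO_COUNT_MASK ((UINT64_C(1) << 55) - 1)

/* Loop variant: both checks run before the compare-exchange, so a failing
 * check leaves the word untouched. */
static bool get_light_loop(_Atomic uint64_t *cnt)
{
    uint64_t x = atomic_load(cnt);

    do {
        if ( !(x & DEMO_COUNT_MASK) )         /* Not allocated? */
            return false;
        if ( !((x + 1) & DEMO_COUNT_MASK) )   /* Overflow? */
            return false;
    } while ( !atomic_compare_exchange_strong(cnt, &x, x + 1) );

    return true;
}

/* Fetch-add variant: the increment is already stored, and visible to other
 * CPUs, by the time the checks run on the returned old value. */
static bool get_light_fetch_add(_Atomic uint64_t *cnt)
{
    uint64_t old = atomic_fetch_add(cnt, 1);

    if ( !(old & DEMO_COUNT_MASK) )           /* Not allocated? */
        return false;
    if ( !((old + 1) & DEMO_COUNT_MASK) )     /* Overflow? */
        return false;

    return true;
}
```

Starting from a saturated count, the loop variant fails with the word unchanged, while the fetch-add variant fails with the overflowed value already written back.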

Note that the page must already have a non-zero reference count, which
prevents the flags from changing, and that the previous cmpxchg loop didn't
guarantee that the other fields in count_info stayed unchanged while the
reference count was updated.
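A minimal sketch, under the same assumed 55-bit layout and with a hypothetical flag `DEMO_FLAG_A`, of why an atomic add to the whole word only touches the count field as long as the count is below its maximum.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout, for illustration only: count in bits 0..54, flags in
 * bits 55..63 (mirroring the idea, not the exact values, of Xen's PGC_*). */
#define DEMO_COUNT_MASK ((UINT64_C(1) << 55) - 1)
#define DEMO_FLAG_A     (UINT64_C(1) << 63)   /* hypothetical flag bit */

static uint64_t demo_flags(uint64_t v) { return v & ~DEMO_COUNT_MASK; }
static uint64_t demo_count(uint64_t v) { return v & DEMO_COUNT_MASK; }

/* Adds one reference and returns the new value. The flag bits are left
 * alone unless the count field is already at its maximum, in which case
 * the carry spills into them. */
static uint64_t demo_get_ref(_Atomic uint64_t *cnt)
{
    return atomic_fetch_add(cnt, 1) + 1;
}
```

With `DEMO_FLAG_A | 3` in the word, the new value has a count of 4 and still exactly `DEMO_FLAG_A` set. Because the add is a single atomic read-modify-write, it also cannot lose a concurrent update to other bits, unlike a plain load/modify/store sequence.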

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/mm.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4d6d7bfe4f89..2aff6d4b5338 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2580,16 +2580,10 @@ bool get_page(struct page_info *page, const struct domain *domain)
  */
 static void get_page_light(struct page_info *page)
 {
-    unsigned long x, nx, y = page->count_info;
+    unsigned long old_pgc = arch_fetch_and_add(&page->count_info, 1);
 
-    do {
-        x  = y;
-        nx = x + 1;
-        BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */
-        BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */
-        y = cmpxchg(&page->count_info, x, nx);
-    }
-    while ( unlikely(y != x) );
+    BUG_ON(!(old_pgc & PGC_count_mask)); /* Not allocated? */
+    BUG_ON(!((old_pgc + 1) & PGC_count_mask)); /* Overflow? */
 }
 
 static int validate_page(struct page_info *page, unsigned long type,
-- 
2.44.0




* Re: [PATCH] x86/mm: re-implement get_page_light() using an atomic increment
From: Andrew Cooper @ 2024-03-01 15:06 UTC
  To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich, Wei Liu

On 01/03/2024 12:42 pm, Roger Pau Monne wrote:
> The current use of a cmpxchg loop to increment the page reference count is
> not optimal on amd64, as there's already an instruction (LOCK XADD) to
> atomically add to a 64-bit integer.
>
> Switch the code in get_page_light() to use an atomic increment, as that avoids
> a loop construct.  This slightly changes the order of the checks: the current
> code will crash before modifying the page count_info if the conditions are not
> met, while with the proposed change the crash will happen immediately after
> the counter has been increased.  Since we are crashing anyway, I don't believe
> the re-ordering has any meaningful impact.
>
> Note that the page must already have a non-zero reference count, which
> prevents the flags from changing, and that the previous cmpxchg loop didn't
> guarantee that the other fields in count_info stayed unchanged while the
> reference count was updated.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

I think the minor new corner case is well worth the simplification this
change provides.



* Re: [PATCH] x86/mm: re-implement get_page_light() using an atomic increment
From: Jan Beulich @ 2024-03-04  7:54 UTC
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 01.03.2024 13:42, Roger Pau Monne wrote:
> The current use of a cmpxchg loop to increment the page reference count is
> not optimal on amd64, as there's already an instruction (LOCK XADD) to
> atomically add to a 64-bit integer.
> 
> Switch the code in get_page_light() to use an atomic increment, as that avoids
> a loop construct.  This slightly changes the order of the checks: the current
> code will crash before modifying the page count_info if the conditions are not
> met, while with the proposed change the crash will happen immediately after
> the counter has been increased.  Since we are crashing anyway, I don't believe
> the re-ordering has any meaningful impact.

While I consider this argument fine for ...

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -2580,16 +2580,10 @@ bool get_page(struct page_info *page, const struct domain *domain)
>   */
>  static void get_page_light(struct page_info *page)
>  {
> -    unsigned long x, nx, y = page->count_info;
> +    unsigned long old_pgc = arch_fetch_and_add(&page->count_info, 1);
>  
> -    do {
> -        x  = y;
> -        nx = x + 1;
> -        BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */

... this check, I'm afraid ...

> -        BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */

... this is a problem unless we discount the possibility of an overflow
happening in practice: if an overflow was detected only after the fact,
there would be a window in time where privilege escalation was still
possible from another CPU. IOW, at the very least the description will
need extending further. Personally I wouldn't chance it and would leave
this as a loop.
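To make the window concrete, a sequential sketch (same assumed 55-bit layout, hypothetical names) of what a concurrent reader could observe between an overflowing increment and the BUG_ON() that follows it. A real race needs a second CPU; here the "observer" is just a load issued right after the add.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout (illustrative): count in bits 0..54, flags from bit 55. */
#define DEMO_COUNT_WIDTH 55
#define DEMO_COUNT_MASK  ((UINT64_C(1) << DEMO_COUNT_WIDTH) - 1)
#define DEMO_LOWEST_FLAG (UINT64_C(1) << DEMO_COUNT_WIDTH)

/* Returns what another CPU would read from the word after the
 * overflowing fetch-add but before the incrementing CPU's BUG_ON(). */
static uint64_t value_seen_in_window(_Atomic uint64_t *cnt)
{
    atomic_fetch_add(cnt, 1);  /* increment committed, not yet checked */
    return atomic_load(cnt);   /* a concurrent observer's view */
}
```

Starting from a saturated count with no flags set, the observed value has a count field of zero and the lowest flag bit set by the carry, which is the inconsistent state the loop never publishes.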

Jan



* Re: [PATCH] x86/mm: re-implement get_page_light() using an atomic increment
From: Roger Pau Monné @ 2024-03-04  8:50 UTC
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Mon, Mar 04, 2024 at 08:54:34AM +0100, Jan Beulich wrote:
> On 01.03.2024 13:42, Roger Pau Monne wrote:
> > The current use of a cmpxchg loop to increment the page reference count is
> > not optimal on amd64, as there's already an instruction (LOCK XADD) to
> > atomically add to a 64-bit integer.
> > 
> > Switch the code in get_page_light() to use an atomic increment, as that avoids
> > a loop construct.  This slightly changes the order of the checks: the current
> > code will crash before modifying the page count_info if the conditions are not
> > met, while with the proposed change the crash will happen immediately after
> > the counter has been increased.  Since we are crashing anyway, I don't believe
> > the re-ordering has any meaningful impact.
> 
> While I consider this argument fine for ...
> 
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -2580,16 +2580,10 @@ bool get_page(struct page_info *page, const struct domain *domain)
> >   */
> >  static void get_page_light(struct page_info *page)
> >  {
> > -    unsigned long x, nx, y = page->count_info;
> > +    unsigned long old_pgc = arch_fetch_and_add(&page->count_info, 1);
> >  
> > -    do {
> > -        x  = y;
> > -        nx = x + 1;
> > -        BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */
> 
> ... this check, I'm afraid ...
> 
> > -        BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */
> 
> ... this is a problem unless we discount the possibility of an overflow
> happening in practice: if an overflow was detected only after the fact,
> there would be a window in time where privilege escalation was still
> possible from another CPU. IOW, at the very least the description will
> need extending further. Personally I wouldn't chance it and would leave
> this as a loop.

So you are worried because this could potentially turn a DoS into an
information leak during the brief period of time where the page
counter has overflowed into the PGC state.

My understanding is that the BUG_ON() was merely a protection against bad
code that could mess with the counter, but that the counter overflowing is
not a real issue during normal operation.

Thanks, Roger.



* Re: [PATCH] x86/mm: re-implement get_page_light() using an atomic increment
From: Jan Beulich @ 2024-03-04  8:54 UTC
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 04.03.2024 09:50, Roger Pau Monné wrote:
> On Mon, Mar 04, 2024 at 08:54:34AM +0100, Jan Beulich wrote:
>> [...]
> 
> So you are worried because this could potentially turn a DoS into an
> information leak during the brief period of time where the page
> counter has overflowed into the PGC state.
> 
> My understanding is that the BUG_ON() was merely a protection against bad
> code that could mess with the counter, but that the counter overflowing is
> not a real issue during normal operation.

With the present counter width it should be a merely theoretical concern.
I didn't redo the earlier calculation taking LA57 into account, though,
so I'm not sure we aren't moving onto ever thinner ice as hardware (and
our support for it) advances. As to "mere protection": see how the
narrower counter was an active issue on 32-bit Xen back then.
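A back-of-the-envelope version of the kind of calculation mentioned above, with hypothetical inputs rather than the ones actually used: if every reference had to be backed by at least `bytes_per_ref` bytes of memory (for instance an 8-byte PTE), overflowing a `count_width`-bit counter would need about 2^(count_width + log2(bytes_per_ref)) bytes. The 55-bit width is an assumption based on the x86 `PGC_count_width`, and the function name is illustrative.

```c
#include <assert.h>

/* Returns how many bits of byte-addressable memory would be needed to hold
 * enough references to overflow a count_width-bit counter, assuming each
 * reference costs bytes_per_ref bytes (which must be a power of two). */
static unsigned int bits_needed_to_overflow(unsigned int count_width,
                                            unsigned int bytes_per_ref)
{
    unsigned int extra = 0;

    assert(bytes_per_ref != 0 && (bytes_per_ref & (bytes_per_ref - 1)) == 0);

    /* extra = log2(bytes_per_ref) */
    while ( (1u << extra) < bytes_per_ref )
        extra++;

    return count_width + extra;
}
```

With an assumed 55-bit count and 8-byte PTEs this gives 58 bits, above the 52-bit maximum physical address width of x86-64. Whether LA57, or other ways of taking references, change those inputs is exactly the open question raised in the reply.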

Jan


