[Qemu-devel] [PATCH v4] exec: Fix non-power-of-2 sized accesses

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] [PATCH v4] exec: Fix non-power-of-2 sized accesses
@ 2013-08-16 21:58 Alex Williamson
  2013-08-17  6:33 ` Paolo Bonzini
  2013-08-17  8:23 ` Laszlo Ersek
  0 siblings, 2 replies; 7+ messages in thread
From: Alex Williamson @ 2013-08-16 21:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: lersek, qemu-stable, rth

Since commit 23326164 we align access sizes to match the alignment of
the address, but we don't align the access size itself.  This means we
let illegal access sizes (ex. 3) slip through if the address is
sufficiently aligned (ex. 4).  This results in an abort which would be
easy for a guest to trigger.  Account for aligning the access size.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org
---

v4: KISS
v3: Highest power of 2, not lowest
v2: Remove unnecessary loop condition

 exec.c |   18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index 3ca9381..67a822c 100644
--- a/exec.c
+++ b/exec.c
@@ -1924,12 +1924,20 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
         }
     }
 
-    /* Don't attempt accesses larger than the maximum.  */
-    if (l > access_size_max) {
-        l = access_size_max;
+    /* Don't attempt accesses larger than the maximum or unsupported sizes.  */
+    if (l >= access_size_max) {
+        return access_size_max;
+    } else {
+        if (l >= 8) {
+            return 8;
+        } else if (l >= 4) {
+            return 4;
+        } else if (l >= 2) {
+            return 2;
+        } else {
+            return 1;
+        }
     }
-
-    return l;
 }
 
 bool address_space_rw(AddressSpace *as, hwaddr addr, uint8_t *buf,
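
The effect of the hunk above can be tried outside QEMU. What follows is a minimal standalone sketch, not the code in exec.c: the function names are hypothetical, access_size_max is passed in as a parameter instead of being derived from the MemoryRegion, and the address-alignment step earlier in memory_access_size() is left out.

```c
/* Hypothetical standalone sketch of the size selection in the patch.
 * access_size_max is assumed to be a power of 2 no larger than 8. */
unsigned round_access_size(unsigned l, unsigned access_size_max)
{
    if (l >= access_size_max) {
        return access_size_max;
    }
    if (l >= 8) {
        return 8;
    }
    if (l >= 4) {
        return 4;
    }
    if (l >= 2) {
        return 2;
    }
    return 1;
}

/* Pre-patch behaviour for comparison: only clamps to the maximum. */
unsigned clamp_access_size(unsigned l, unsigned access_size_max)
{
    return l > access_size_max ? access_size_max : l;
}
```

With access_size_max == 4, a request of 3 passes through clamp_access_size() unchanged, which is the illegal size described in the commit message, while round_access_size() reduces it to 2.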


* Re: [Qemu-devel] [PATCH v4] exec: Fix non-power-of-2 sized accesses
  2013-08-16 21:58 [Qemu-devel] [PATCH v4] exec: Fix non-power-of-2 sized accesses Alex Williamson
@ 2013-08-17  6:33 ` Paolo Bonzini
  2013-08-17 15:19   ` Alex Williamson
  2013-08-17  8:23 ` Laszlo Ersek
  1 sibling, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2013-08-17  6:33 UTC (permalink / raw)
  To: Alex Williamson; +Cc: rth, lersek, qemu-devel, qemu-stable

On 16/08/2013 23:58, Alex Williamson wrote:
> Since commit 23326164 we align access sizes to match the alignment of
> the address, but we don't align the access size itself.  This means we
> let illegal access sizes (ex. 3) slip through if the address is
> sufficiently aligned (ex. 4).  This results in an abort which would be
> easy for a guest to trigger.  Account for aligning the access size.

Is it the same as this?

http://lists.gnu.org/archive/html/qemu-devel/2013-07/msg05398.html

(which perhaps is buggy as your v1/v2/v3 :))?

Paolo



* Re: [Qemu-devel] [PATCH v4] exec: Fix non-power-of-2 sized accesses
  2013-08-17  6:33 ` Paolo Bonzini
@ 2013-08-17 15:19   ` Alex Williamson
  0 siblings, 0 replies; 7+ messages in thread
From: Alex Williamson @ 2013-08-17 15:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: rth, lersek, qemu-devel, qemu-stable

On Sat, 2013-08-17 at 08:33 +0200, Paolo Bonzini wrote:
> On 16/08/2013 23:58, Alex Williamson wrote:
> > Since commit 23326164 we align access sizes to match the alignment of
> > the address, but we don't align the access size itself.  This means we
> > let illegal access sizes (ex. 3) slip through if the address is
> > sufficiently aligned (ex. 4).  This results in an abort which would be
> > easy for a guest to trigger.  Account for aligning the access size.
> 
> Is it the same as this?
> 
> http://lists.gnu.org/archive/html/qemu-devel/2013-07/msg05398.html
> 
> (which perhaps is buggy as your v1/v2/v3 :))?

Too bad this didn't make 1.6.  I suspect your patch is ok because I
don't think we're going to see it called with a length greater than 8.
Maybe I don't even need that test in my version, but it's reassuring to
have it.  As I note in my reply to Laszlo, using generic power-of-2
functions is quite a bit slower than the limited case we need to handle,
so while initially tempted by fancy algorithms, I actually prefer the
version below.  Thanks,

Alex



* Re: [Qemu-devel] [PATCH v4] exec: Fix non-power-of-2 sized accesses
  2013-08-16 21:58 [Qemu-devel] [PATCH v4] exec: Fix non-power-of-2 sized accesses Alex Williamson
  2013-08-17  6:33 ` Paolo Bonzini
@ 2013-08-17  8:23 ` Laszlo Ersek
  2013-08-17  9:16   ` Laszlo Ersek
                     ` (2 more replies)
  1 sibling, 3 replies; 7+ messages in thread
From: Laszlo Ersek @ 2013-08-17  8:23 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Paolo Bonzini, rth, qemu-devel, qemu-stable

On 08/16/13 23:58, Alex Williamson wrote:
> Since commit 23326164 we align access sizes to match the alignment of
> the address, but we don't align the access size itself.  This means we
> let illegal access sizes (ex. 3) slip through if the address is
> sufficiently aligned (ex. 4).  This results in an abort which would be
> easy for a guest to trigger.  Account for aligning the access size.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Cc: qemu-stable@nongnu.org
> ---
> 
> v4: KISS
> v3: Highest power of 2, not lowest
> v2: Remove unnecessary loop condition
> 
>  exec.c |   18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 3ca9381..67a822c 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1924,12 +1924,20 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
>          }
>      }
>  
> -    /* Don't attempt accesses larger than the maximum.  */
> -    if (l > access_size_max) {
> -        l = access_size_max;
> +    /* Don't attempt accesses larger than the maximum or unsupported sizes.  */
> +    if (l >= access_size_max) {
> +        return access_size_max;
> +    } else {
> +        if (l >= 8) {
> +            return 8;
> +        } else if (l >= 4) {
> +            return 4;
> +        } else if (l >= 2) {
> +            return 2;
> +        } else {
> +            return 1;
> +        }
>      }
> -
> -    return l;
>  }
>  
>  bool address_space_rw(AddressSpace *as, hwaddr addr, uint8_t *buf,
> 

Considering that each block contains a return statement, I'd drop the
else's:

    if (l >= access_size_max) {
        return access_size_max;
    }
    if (l >= 8) {
        return 8;
    }
    if (l >= 4) {
        return 4;
    }
    if (l >= 2) {
        return 2;
    }
    return 1;

Or even

    return l >= access_size_max ? access_size_max :
           l >= 8               ? 8               :
           l >= 4               ? 4               :
           l >= 2               ? 2               :
           1;

But this is just bikeshedding, so I'm not suggesting it.

Regarding function... I can at least understand this code. So, you want
to find the most significant bit set in "l", and clear everything else.
If said leftmost bit is to the left of bit#3, then use bit#3 instead.

This idea should work if "l" is already a whole power of two.

    if (l >= access_size_max) {
        return access_size_max;
    }
    return 1 << max(3, lmb(l));

What Paolo posted seems almost identical.

clz32(l):                     leading zeros in "l"
qemu_fls(l) == 32 - clz32(l): position of leftmost bit set, 1-based
qemu_fls(l) - 1:              position of leftmost bit set, 0-based
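
The relationship between the three quantities above can be sketched as follows. The helper names are hypothetical stand-ins, QEMU's own definitions may differ, and the sketch assumes a 32-bit unsigned int and the GCC/Clang __builtin_clz builtin.

```c
/* Hypothetical stand-ins for the helpers named above.
 * Assumes unsigned int is 32 bits wide and a GCC/Clang compiler. */
int clz32_sketch(unsigned val)
{
    /* __builtin_clz is undefined for 0, so handle that case explicitly. */
    return val ? __builtin_clz(val) : 32;
}

/* 1-based position of the leftmost set bit; 0 when no bit is set. */
int fls_sketch(unsigned val)
{
    return 32 - clz32_sketch(val);
}
```

For l == 9 (binary 1001) there are 28 leading zeros, so fls_sketch() gives 4 and the 0-based position is 3.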

Not sure if the (l & (l - 1)) check is needed in Paolo's patch. clz32()
is not generally usable when l==0, so maybe that's (too) what the check
is for. OTOH maybe l==0 is not even possible when entering
memory_access_size().

Second, Paolo's patch might lack the "max(3, ...)" part. Since you
didn't call my previous example with l==9 retarded, I guess clamping
(qemu_fls(l) - 1) at 3 would be necessary.

Third, clz32() is probably very fast when gcc has a builtin for it, and
probably slower than your open-coded version otherwise.

I still don't know enough about this topic, but I like this patch
because I can understand the intent at least :)

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

(Bit-counting is a great complement to the Saturday morning espresso :))


* Re: [Qemu-devel] [PATCH v4] exec: Fix non-power-of-2 sized accesses
  2013-08-17  8:23 ` Laszlo Ersek
@ 2013-08-17  9:16   ` Laszlo Ersek
  2013-08-17 15:14   ` Alex Williamson
  2013-08-17 17:58   ` Paolo Bonzini
  2 siblings, 0 replies; 7+ messages in thread
From: Laszlo Ersek @ 2013-08-17  9:16 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Paolo Bonzini, qemu-stable, qemu-devel, rth

(side point)

On 08/17/13 10:23, Laszlo Ersek wrote:

>     if (l >= access_size_max) {
>         return access_size_max;
>     }
>     return 1 << max(3, lmb(l));

lol, of course this should have been min()...
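
A sketch of the corrected form, with min() in place of max(), might look like this. The names lmb_sketch and size_with_min_clamp are hypothetical, the size is assumed to be at least 1, and the GCC/Clang __builtin_clz builtin with a 32-bit unsigned int is assumed.

```c
/* Hypothetical helper: 0-based index of the leftmost set bit.
 * The caller must pass l != 0, because __builtin_clz(0) is undefined. */
unsigned lmb_sketch(unsigned l)
{
    return 31u - (unsigned)__builtin_clz(l);
}

/* Sketch of "1 << min(3, lmb(l))"; l is assumed to be at least 1. */
unsigned size_with_min_clamp(unsigned l, unsigned access_size_max)
{
    unsigned bit;

    if (l >= access_size_max) {
        return access_size_max;
    }
    bit = lmb_sketch(l);
    if (bit > 3) {
        bit = 3;    /* min(3, lmb(l)): never return more than 8 */
    }
    return 1u << bit;
}
```

With max() instead of min(), a request of 3 would have produced 1 << 3 == 8 rather than 2.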

Alex's patch is OK of course.

Laszlo


* Re: [Qemu-devel] [PATCH v4] exec: Fix non-power-of-2 sized accesses
  2013-08-17  8:23 ` Laszlo Ersek
  2013-08-17  9:16   ` Laszlo Ersek
@ 2013-08-17 15:14   ` Alex Williamson
  2013-08-17 17:58   ` Paolo Bonzini
  2 siblings, 0 replies; 7+ messages in thread
From: Alex Williamson @ 2013-08-17 15:14 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: Paolo Bonzini, rth, qemu-devel, qemu-stable

[-- Attachment #1: Type: text/plain, Size: 4623 bytes --]

On Sat, 2013-08-17 at 10:23 +0200, Laszlo Ersek wrote:
> On 08/16/13 23:58, Alex Williamson wrote:
> > Since commit 23326164 we align access sizes to match the alignment of
> > the address, but we don't align the access size itself.  This means we
> > let illegal access sizes (ex. 3) slip through if the address is
> > sufficiently aligned (ex. 4).  This results in an abort which would be
> > easy for a guest to trigger.  Account for aligning the access size.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > Cc: qemu-stable@nongnu.org
> > ---
> > 
> > v4: KISS
> > v3: Highest power of 2, not lowest
> > v2: Remove unnecessary loop condition
> > 
> >  exec.c |   18 +++++++++++++-----
> >  1 file changed, 13 insertions(+), 5 deletions(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index 3ca9381..67a822c 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -1924,12 +1924,20 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
> >          }
> >      }
> >  
> > -    /* Don't attempt accesses larger than the maximum.  */
> > -    if (l > access_size_max) {
> > -        l = access_size_max;
> > +    /* Don't attempt accesses larger than the maximum or unsupported sizes.  */
> > +    if (l >= access_size_max) {
> > +        return access_size_max;
> > +    } else {
> > +        if (l >= 8) {
> > +            return 8;
> > +        } else if (l >= 4) {
> > +            return 4;
> > +        } else if (l >= 2) {
> > +            return 2;
> > +        } else {
> > +            return 1;
> > +        }
> >      }
> > -
> > -    return l;
> >  }
> >  
> >  bool address_space_rw(AddressSpace *as, hwaddr addr, uint8_t *buf,
> > 
> 
> Considering that each block contains a return statement, I'd drop the
> else's:
> 
>     if (l >= access_size_max) {
>         return access_size_max;
>     }
>     if (l >= 8) {
>         return 8;
>     }
>     if (l >= 4) {
>         return 4;
>     }
>     if (l >= 2) {
>         return 2;
>     }
>     return 1;
> 
> Or even
> 
>     return l >= access_size_max ? access_size_max :
>            l >= 8               ? 8               :
>            l >= 4               ? 4               :
>            l >= 2               ? 2               :
>            1;
> 
> But this is just bikeshedding, so I'm not suggesting it.
> 
> Regarding function... I can at least understand this code. So, you want
> to find the most significant bit set in "l", and clear everything else.
> If said leftmost bit is to the left of bit#3, then use bit#3 instead.
> 
> This idea should work if "l" is already a whole power of two.
> 
>     if (l >= access_size_max) {
>         return access_size_max;
>     }
>     return 1 << max(3, lmb(l));
> 
> What Paolo posted seems almost identical.
> 
> clz32(l):                     leading zeros in "l"
> qemu_fls(l) == 32 - clz32(l): position of leftmost bit set, 1-based
> qemu_fls(l) - 1:              position of leftmost bit set, 0-based
> 
> Not sure if the (l & (l - 1)) check is needed in Paolo's patch. clz32()
> is not generally usable when l==0, so maybe that's (too) what the check
> is for. OTOH maybe l==0 is not even possible when entering
> memory_access_size().
> 
> Second, Paolo's patch might lack the "max(3, ...)" part. Since you
> didn't call my previous example with l==9 retarded, I guess clamping
> (qemu_fls(l) - 1) at 3 would be necessary.

Whether we need to clamp on 3 really depends on the caller.  I'm
actually doubtful that this function ever gets called with l > 8.  So I
think Paolo's code works ok.  It's possible your example of l == 9 was a
red herring for my code, but I didn't have enough faith in it anyway.

> Third, clz32() is probably very fast when gcc has a builtin for it, and
> probably slower than your open-coded version otherwise.

Nope, the open coded version in v4 is significantly faster.  See the
attached test programs.  On my laptop I get these results (compiled with
-O):

$ time ./test-open

real	0m7.442s
user	0m7.412s
sys	0m0.005s

$ time ./test-fls

real	0m9.202s
user	0m9.117s
sys	0m0.024s

$ time ./test-pow2floor

real	0m13.884s
user	0m13.796s
sys	0m0.013s


At higher optimization levels the race gets a lot closer, but the open
coded version still seems to have an advantage (assuming the test code
even remains relevant at higher levels).  So, I conclude that it's
faster to open code for the very limited range of a power-of-2 function
we need here.
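
The timings above compare speed only. As a complementary sanity check, the sketch below compares the results of the open-coded and fls-based variants over the same range the test programs exercise. It is not one of the original attachments; the function names are hypothetical and the GCC/Clang __builtin_clz builtin with a 32-bit unsigned int is assumed.

```c
/* Open-coded variant, following test-open.c. */
unsigned size_open(unsigned max, unsigned size)
{
    if (size >= max)
        return max;
    if (size >= 8)
        return 8;
    if (size >= 4)
        return 4;
    if (size >= 2)
        return 2;
    return 1;
}

/* fls-based variant, following test-fls.c. */
unsigned size_fls(unsigned max, unsigned size)
{
    if (size > max)
        size = max;
    /* A non-zero result here implies size != 0, so the builtin is safe. */
    if (size & (size - 1))
        size = 1u << (31 - __builtin_clz(size));
    return size;
}

/* Returns 1 if both variants agree for max in {1,2,4,8}, size in 1..8. */
int variants_agree(void)
{
    unsigned max, l;

    for (max = 1; max <= 8; max <<= 1) {
        for (l = 1; l <= 8; l++) {
            if (size_open(max, l) != size_fls(max, l))
                return 0;
        }
    }
    return 1;
}
```

The agreement holds only for the range checked. Outside it the variants can differ: for a size of 0, size_open() returns 1 while size_fls() returns 0.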

> I still don't know enough about this topic, but I like this patch
> because I can understand the intent at least :)
> 
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>

Thanks!
Alex


[-- Attachment #2: test-open.c --]
[-- Type: text/x-csrc, Size: 409 bytes --]

unsigned foo(unsigned max, unsigned size)
{
	if (size >= max)
		return max;
	else {
		if (size >= 8)
			return 8;
		else if (size >= 4)
			return 4;
		else if (size >= 2)
			return 2;
		else
			return 1;
	}
}

int main(void)
{
	unsigned i, l, max;
	volatile unsigned val;

	for (i = 0; i < 100000000; i++) {
		for (max = 1; max <= 8; max <<= 1) {
			for (l = 1; l <= 8; l++)
				val = foo(max, l);
		}
	}
	return 0;
}

[-- Attachment #3: test-fls.c --]
[-- Type: text/x-csrc, Size: 484 bytes --]

static inline int clz32(unsigned long val)
{
	return val ? __builtin_clz(val) : 32;
}

int fls(int value)
{
	return 32 - clz32(value);
}

unsigned foo(unsigned max, unsigned size)
{
	if (size > max)
		size = max;
	if (size & (size - 1))
		size = 1 << (fls(size) - 1);
	return size;
}

int main(void)
{
	unsigned i, l, max;
	volatile unsigned val;

	for (i = 0; i < 100000000; i++) {
		for (max = 1; max <= 8; max <<= 1) {
			for (l = 1; l <= 8; l++)
				val = foo(max, l);
		}
	}
	return 0;
}

[-- Attachment #4: test-pow2floor.c --]
[-- Type: text/x-csrc, Size: 646 bytes --]

static inline int clz64(unsigned long val)
{
	return val ? __builtin_clzll(val) : 64;
}

static inline int is_power_of_2(unsigned long value)
{
	if (!value)
		return 0;

	return !(value & (value - 1));
}

long pow2floor(long value)
{
	if (!is_power_of_2(value))
		value = 0x8000000000000000ULL >> clz64(value);

	return value;
}

unsigned foo(unsigned max, unsigned size)
{
	if (size > max)
		size = max;
	size = pow2floor(size);
	return size;
}

int main(void)
{
	unsigned i, l, max;
	volatile unsigned val;

	for (i = 0; i < 100000000; i++) {
		for (max = 1; max <= 8; max <<= 1) {
			for (l = 1; l <= 8; l++)
				val = foo(max, l);
		}
	}
	return 0;
}


* Re: [Qemu-devel] [PATCH v4] exec: Fix non-power-of-2 sized accesses
  2013-08-17  8:23 ` Laszlo Ersek
  2013-08-17  9:16   ` Laszlo Ersek
  2013-08-17 15:14   ` Alex Williamson
@ 2013-08-17 17:58   ` Paolo Bonzini
  2 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2013-08-17 17:58 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: Alex Williamson, qemu-stable, qemu-devel, rth

On 17/08/2013 10:23, Laszlo Ersek wrote:
> What Paolo posted seems almost identical.
> 
> clz32(l):                     leading zeros in "l"
> qemu_fls(l) == 32 - clz32(l): position of leftmost bit set, 1-based
> qemu_fls(l) - 1:              position of leftmost bit set, 0-based
> 
> Not sure if the (l & (l - 1)) check is needed in Paolo's patch. clz32()
> is not generally usable when l==0, so maybe that's (too) what the check
> is for. OTOH maybe l==0 is not even possible when entering
> memory_access_size().

The check was an attempt at placating complaints about possible
performance problems. :)

> Second, Paolo's patch might lack the "max(3, ...)" part. Since you
> didn't call my previous example with l==9 retarded, I guess clamping
> (qemu_fls(l) - 1) at 3 would be necessary.

That shouldn't happen, since a uint64_t is all you have for the datum.
access_size_max should never exceed 8.

I don't really care which patch goes in, Alex's is fine as well.

Paolo

