* [Qemu-devel] [RFC] find_next_bit optimizations
From: Peter Lieven @ 2013-03-11 13:44 UTC
To: qemu-devel@nongnu.org
Cc: Orit Wasserman, Corentin Chary, Paolo Bonzini

Hi,

I have always had a few VMs which are very hard to migrate because of a
lot of memory I/O. I found that finding the next dirty bit seemed to be
one of the culprits (apart from the locking removal, which Paolo is
working on).

I have the following proposal, which seems to help a lot in my case.
I just wanted to get some feedback.
I applied the same unrolling idea as in buffer_is_zero().

Peter

--- a/util/bitops.c
+++ b/util/bitops.c
@@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
     const unsigned long *p = addr + BITOP_WORD(offset);
     unsigned long result = offset & ~(BITS_PER_LONG-1);
     unsigned long tmp;
+    unsigned long d0,d1,d2,d3;

     if (offset >= size) {
         return size;
     }
     size -= result;
-    offset %= BITS_PER_LONG;
+    offset &= (BITS_PER_LONG-1);
     if (offset) {
         tmp = *(p++);
         tmp &= (~0UL << offset);
@@ -43,6 +44,18 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
         result += BITS_PER_LONG;
     }
     while (size & ~(BITS_PER_LONG-1)) {
+        while (!(size & (4*BITS_PER_LONG-1))) {
+            d0 = *p;
+            d1 = *(p+1);
+            d2 = *(p+2);
+            d3 = *(p+3);
+            if (d0 || d1 || d2 || d3) {
+                break;
+            }
+            p+=4;
+            result += 4*BITS_PER_LONG;
+            size -= 4*BITS_PER_LONG;
+        }
         if ((tmp = *(p++))) {
             goto found_middle;
         }
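For context: find_next_bit(addr, size, offset) returns the index of the first set bit at or above offset, or size if the rest of the bitmap is clear. A minimal sketch of that contract (illustrative only, not part of the patch; assumes qemu's bitops declarations and a 64-bit long):

    /* sketch: locating the first dirty page in a two-word bitmap */
    unsigned long map[2] = { 0, 1UL << 3 };             /* bit 67 set on LP64 */
    unsigned long nbits = 2 * BITS_PER_LONG;
    unsigned long first = find_next_bit(map, nbits, 0); /* -> 67 */
    if (first == nbits) {
        /* no bit set in [0, nbits) */
    }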
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Peter Maydell @ 2013-03-11 14:04 UTC
To: Peter Lieven
Cc: Orit Wasserman, qemu-devel@nongnu.org, Corentin Chary, Paolo Bonzini

On 11 March 2013 13:44, Peter Lieven <pl@dlhnet.de> wrote:
> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>      const unsigned long *p = addr + BITOP_WORD(offset);
>      unsigned long result = offset & ~(BITS_PER_LONG-1);
>      unsigned long tmp;
> +    unsigned long d0,d1,d2,d3;
>
>      if (offset >= size) {
>          return size;
>      }
>      size -= result;
> -    offset %= BITS_PER_LONG;
> +    offset &= (BITS_PER_LONG-1);

This change at least is unnecessary -- I just checked, and gcc is
already smart enough to turn the % operation into a logical and.
The generated object files are identical.

-- PMM
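The equivalence gcc relies on holds for any unsigned operand when the divisor is a power of two, which is why the two source forms compile identically. A standalone check, in case the claim is not obvious (a sketch, not from the thread):

    /* for unsigned x and power-of-two N: x % N == x & (N-1),
     * so gcc emits the same AND instruction for both forms */
    #include <assert.h>

    int main(void)
    {
        unsigned long x;
        for (x = 0; x < 100000; x++) {
            assert(x % 64 == (x & 63));   /* 64 == BITS_PER_LONG on LP64 */
        }
        return 0;
    }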
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Paolo Bonzini @ 2013-03-11 14:14 UTC
To: Peter Lieven
Cc: Orit Wasserman, qemu-devel@nongnu.org, Corentin Chary

On 11/03/2013 14:44, Peter Lieven wrote:
> Hi,
>
> I have always had a few VMs which are very hard to migrate because of a
> lot of memory I/O. I found that finding the next dirty bit seemed to be
> one of the culprits (apart from the locking removal, which Paolo is
> working on).
>
> I have the following proposal, which seems to help a lot in my case.
> I just wanted to get some feedback.
> I applied the same unrolling idea as in buffer_is_zero().
>
> Peter
>
> --- a/util/bitops.c
> +++ b/util/bitops.c
> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>      const unsigned long *p = addr + BITOP_WORD(offset);
>      unsigned long result = offset & ~(BITS_PER_LONG-1);
>      unsigned long tmp;
> +    unsigned long d0,d1,d2,d3;
>
>      if (offset >= size) {
>          return size;
>      }
>      size -= result;
> -    offset %= BITS_PER_LONG;
> +    offset &= (BITS_PER_LONG-1);
>      if (offset) {
>          tmp = *(p++);
>          tmp &= (~0UL << offset);
> @@ -43,6 +44,18 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>          result += BITS_PER_LONG;
>      }
>      while (size & ~(BITS_PER_LONG-1)) {
> +        while (!(size & (4*BITS_PER_LONG-1))) {

This really means

    if (!(size & (4*BITS_PER_LONG-1))) {
        while (1) {
            ...
        }
    }

because the subtraction will not change the result of the "while" loop
condition.

What you want is probably "while (size & ~(4*BITS_PER_LONG-1))", which
in turn means "while (size >= 4*BITS_PER_LONG)".

Please change both while loops to use a ">=" condition, it's easier to read.

Paolo

> +            d0 = *p;
> +            d1 = *(p+1);
> +            d2 = *(p+2);
> +            d3 = *(p+3);
> +            if (d0 || d1 || d2 || d3) {
> +                break;
> +            }
> +            p+=4;
> +            result += 4*BITS_PER_LONG;
> +            size -= 4*BITS_PER_LONG;
> +        }
>          if ((tmp = *(p++))) {
>              goto found_middle;
>          }
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Peter Lieven @ 2013-03-11 14:22 UTC
To: Paolo Bonzini
Cc: Orit Wasserman, qemu-devel@nongnu.org, Corentin Chary

On 11.03.2013 at 15:14, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 11/03/2013 14:44, Peter Lieven wrote:
>> Hi,
>>
>> I have always had a few VMs which are very hard to migrate because of a
>> lot of memory I/O. I found that finding the next dirty bit seemed to be
>> one of the culprits (apart from the locking removal, which Paolo is
>> working on).
>>
>> I have the following proposal, which seems to help a lot in my case.
>> I just wanted to get some feedback.
>> I applied the same unrolling idea as in buffer_is_zero().
>>
>> Peter
>>
>> --- a/util/bitops.c
>> +++ b/util/bitops.c
>> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>      const unsigned long *p = addr + BITOP_WORD(offset);
>>      unsigned long result = offset & ~(BITS_PER_LONG-1);
>>      unsigned long tmp;
>> +    unsigned long d0,d1,d2,d3;
>>
>>      if (offset >= size) {
>>          return size;
>>      }
>>      size -= result;
>> -    offset %= BITS_PER_LONG;
>> +    offset &= (BITS_PER_LONG-1);
>>      if (offset) {
>>          tmp = *(p++);
>>          tmp &= (~0UL << offset);
>> @@ -43,6 +44,18 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>          result += BITS_PER_LONG;
>>      }
>>      while (size & ~(BITS_PER_LONG-1)) {
>> +        while (!(size & (4*BITS_PER_LONG-1))) {
>
> This really means
>
>     if (!(size & (4*BITS_PER_LONG-1))) {
>         while (1) {
>             ...
>         }
>     }
>
> because the subtraction will not change the result of the "while" loop
> condition.

Are you sure? The above is working nicely for me (wondering why ;-))
I think !(size & (4*BITS_PER_LONG-1)) is the same as what you propose.
If size & (4*BITS_PER_LONG-1) is not zero, it's not divisible by
4*BITS_PER_LONG. But I see it might be a problem for size == 0.

> What you want is probably "while (size & ~(4*BITS_PER_LONG-1))", which
> in turn means "while (size >= 4*BITS_PER_LONG)".
>
> Please change both while loops to use a ">=" condition, it's easier to read.

Good idea, it's easier to understand.

Does anyone have evidence whether unrolling 4 longs is optimal on
today's processors? I just chose 4 longs because it was the same in
buffer_is_zero().

Peter

> Paolo
>
>> +            d0 = *p;
>> +            d1 = *(p+1);
>> +            d2 = *(p+2);
>> +            d3 = *(p+3);
>> +            if (d0 || d1 || d2 || d3) {
>> +                break;
>> +            }
>> +            p+=4;
>> +            result += 4*BITS_PER_LONG;
>> +            size -= 4*BITS_PER_LONG;
>> +        }
>>          if ((tmp = *(p++))) {
>>              goto found_middle;
>>          }
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Peter Lieven @ 2013-03-11 14:29 UTC
To: Paolo Bonzini
Cc: Orit Wasserman, qemu-devel@nongnu.org, Corentin Chary

On 11.03.2013 at 15:22, Peter Lieven <pl@dlhnet.de> wrote:

>
> On 11.03.2013 at 15:14, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> On 11/03/2013 14:44, Peter Lieven wrote:
>>> Hi,
>>>
>>> I have always had a few VMs which are very hard to migrate because of a
>>> lot of memory I/O. I found that finding the next dirty bit seemed to be
>>> one of the culprits (apart from the locking removal, which Paolo is
>>> working on).
>>>
>>> I have the following proposal, which seems to help a lot in my case.
>>> I just wanted to get some feedback.
>>> I applied the same unrolling idea as in buffer_is_zero().
>>>
>>> Peter
>>>
>>> --- a/util/bitops.c
>>> +++ b/util/bitops.c
>>> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>>      const unsigned long *p = addr + BITOP_WORD(offset);
>>>      unsigned long result = offset & ~(BITS_PER_LONG-1);
>>>      unsigned long tmp;
>>> +    unsigned long d0,d1,d2,d3;
>>>
>>>      if (offset >= size) {
>>>          return size;
>>>      }
>>>      size -= result;
>>> -    offset %= BITS_PER_LONG;
>>> +    offset &= (BITS_PER_LONG-1);
>>>      if (offset) {
>>>          tmp = *(p++);
>>>          tmp &= (~0UL << offset);
>>> @@ -43,6 +44,18 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>>          result += BITS_PER_LONG;
>>>      }
>>>      while (size & ~(BITS_PER_LONG-1)) {
>>> +        while (!(size & (4*BITS_PER_LONG-1))) {
>>
>> This really means
>>
>>     if (!(size & (4*BITS_PER_LONG-1))) {
>>         while (1) {
>>             ...
>>         }
>>     }
>>
>> because the subtraction will not change the result of the "while" loop
>> condition.
>
> Are you sure? The above is working nicely for me (wondering why ;-))
> I think !(size & (4*BITS_PER_LONG-1)) is the same as what you propose.
> If size & (4*BITS_PER_LONG-1) is not zero, it's not divisible by
> 4*BITS_PER_LONG. But I see it might be a problem for size == 0.
>
>> What you want is probably "while (size & ~(4*BITS_PER_LONG-1))", which
>> in turn means "while (size >= 4*BITS_PER_LONG)".
>>
>> Please change both while loops to use a ">=" condition, it's easier to read.

Thinking again: in case a bit is found, this might lead to unnecessary
iterations in the while loop if the bit is in d1, d2 or d3.

> Good idea, it's easier to understand.
>
> Does anyone have evidence whether unrolling 4 longs is optimal on
> today's processors? I just chose 4 longs because it was the same in
> buffer_is_zero().
>
> Peter
>
>> Paolo
>>
>>> +            d0 = *p;
>>> +            d1 = *(p+1);
>>> +            d2 = *(p+2);
>>> +            d3 = *(p+3);
>>> +            if (d0 || d1 || d2 || d3) {
>>> +                break;
>>> +            }
>>> +            p+=4;
>>> +            result += 4*BITS_PER_LONG;
>>> +            size -= 4*BITS_PER_LONG;
>>> +        }
>>>          if ((tmp = *(p++))) {
>>>              goto found_middle;
>>>          }
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Paolo Bonzini @ 2013-03-11 14:35 UTC
To: Peter Lieven
Cc: Orit Wasserman, qemu-devel@nongnu.org, Corentin Chary

On 11/03/2013 15:22, Peter Lieven wrote:
>
> On 11.03.2013 at 15:14, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> On 11/03/2013 14:44, Peter Lieven wrote:
>>> Hi,
>>>
>>> I have always had a few VMs which are very hard to migrate because of a
>>> lot of memory I/O. I found that finding the next dirty bit seemed to be
>>> one of the culprits (apart from the locking removal, which Paolo is
>>> working on).
>>>
>>> I have the following proposal, which seems to help a lot in my case.
>>> I just wanted to get some feedback.
>>> I applied the same unrolling idea as in buffer_is_zero().
>>>
>>> Peter
>>>
>>> --- a/util/bitops.c
>>> +++ b/util/bitops.c
>>> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>>      const unsigned long *p = addr + BITOP_WORD(offset);
>>>      unsigned long result = offset & ~(BITS_PER_LONG-1);
>>>      unsigned long tmp;
>>> +    unsigned long d0,d1,d2,d3;
>>>
>>>      if (offset >= size) {
>>>          return size;
>>>      }
>>>      size -= result;
>>> -    offset %= BITS_PER_LONG;
>>> +    offset &= (BITS_PER_LONG-1);
>>>      if (offset) {
>>>          tmp = *(p++);
>>>          tmp &= (~0UL << offset);
>>> @@ -43,6 +44,18 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>>          result += BITS_PER_LONG;
>>>      }
>>>      while (size & ~(BITS_PER_LONG-1)) {
>>> +        while (!(size & (4*BITS_PER_LONG-1))) {
>>
>> This really means
>>
>>     if (!(size & (4*BITS_PER_LONG-1))) {
>>         while (1) {
>>             ...
>>         }
>>     }
>>
>> because the subtraction will not change the result of the "while" loop
>> condition.
>
> Are you sure? The above is working nicely for me (wondering why ;-))

while (!(size & (4*BITS_PER_LONG-1)))
=> while (!(size % (4*BITS_PER_LONG)))
=> while ((size % (4*BITS_PER_LONG)) == 0)

Subtracting 4*BITS_PER_LONG doesn't change the modulus.

> I think !(size & (4*BITS_PER_LONG-1)) is the same as what you propose.
> If size & (4*BITS_PER_LONG-1) is not zero, it's not divisible by
> 4*BITS_PER_LONG. But I see it might be a problem for size == 0.

In fact I'm not really sure why it works for you. :)

>> What you want is probably "while (size & ~(4*BITS_PER_LONG-1))", which
>> in turn means "while (size >= 4*BITS_PER_LONG)".
>>
>> Please change both while loops to use a ">=" condition, it's easier to read.
>
> Good idea, it's easier to understand.
>
>>> Please change both while loops to use a ">=" condition, it's easier to read.
>
> Thinking again: in case a bit is found, this might lead to unnecessary
> iterations in the while loop if the bit is in d1, d2 or d3.

How would that be different in your patch? But you can solve it by
making two >= loops, one checking for 4*BITS_PER_LONG and one checking
BITS_PER_LONG.

Paolo

>
> Peter
>
>> Paolo
>>
>>> +            d0 = *p;
>>> +            d1 = *(p+1);
>>> +            d2 = *(p+2);
>>> +            d3 = *(p+3);
>>> +            if (d0 || d1 || d2 || d3) {
>>> +                break;
>>> +            }
>>> +            p+=4;
>>> +            result += 4*BITS_PER_LONG;
>>> +            size -= 4*BITS_PER_LONG;
>>> +        }
>>>          if ((tmp = *(p++))) {
>>>              goto found_middle;
>>>          }
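The invariance Bonzini describes can be checked in one line; a standalone sketch (256 stands in for 4*BITS_PER_LONG on a 64-bit build):

    #include <assert.h>

    int main(void)
    {
        unsigned long size = 1024;   /* any multiple of 256 */
        /* subtracting the stride never changes the modulus, so once the
         * inner loop is entered its condition can never become false */
        assert((size % 256) == ((size - 256) % 256));
        return 0;
    }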
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Peter Lieven @ 2013-03-11 15:24 UTC
To: Paolo Bonzini
Cc: Orit Wasserman, qemu-devel@nongnu.org, Corentin Chary

> How would that be different in your patch? But you can solve it by
> making two >= loops, one checking for 4*BITS_PER_LONG and one checking
> BITS_PER_LONG.

This is what I have now:

diff --git a/util/bitops.c b/util/bitops.c
index e72237a..b0dc93f 100644
--- a/util/bitops.c
+++ b/util/bitops.c
@@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
     const unsigned long *p = addr + BITOP_WORD(offset);
     unsigned long result = offset & ~(BITS_PER_LONG-1);
     unsigned long tmp;
+    unsigned long d0,d1,d2,d3;

     if (offset >= size) {
         return size;
     }
     size -= result;
-    offset %= BITS_PER_LONG;
+    offset &= (BITS_PER_LONG-1);
     if (offset) {
         tmp = *(p++);
         tmp &= (~0UL << offset);
@@ -42,7 +43,19 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
         size -= BITS_PER_LONG;
         result += BITS_PER_LONG;
     }
-    while (size & ~(BITS_PER_LONG-1)) {
+    while (size >= 4*BITS_PER_LONG) {
+        d0 = *p;
+        d1 = *(p+1);
+        d2 = *(p+2);
+        d3 = *(p+3);
+        if (d0 || d1 || d2 || d3) {
+            break;
+        }
+        p+=4;
+        result += 4*BITS_PER_LONG;
+        size -= 4*BITS_PER_LONG;
+    }
+    while (size >= BITS_PER_LONG) {
         if ((tmp = *(p++))) {
             goto found_middle;
         }
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Peter Maydell @ 2013-03-11 15:25 UTC
To: Peter Lieven
Cc: Paolo Bonzini, qemu-devel@nongnu.org, Corentin Chary, Orit Wasserman

On 11 March 2013 15:24, Peter Lieven <pl@dlhnet.de> wrote:
> -    offset %= BITS_PER_LONG;
> +    offset &= (BITS_PER_LONG-1);

Still pointless.

-- PMM
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Paolo Bonzini @ 2013-03-11 15:29 UTC
To: Peter Lieven
Cc: Orit Wasserman, qemu-devel@nongnu.org, Corentin Chary

On 11/03/2013 16:24, Peter Lieven wrote:
>
>> How would that be different in your patch? But you can solve it by
>> making two >= loops, one checking for 4*BITS_PER_LONG and one checking
>> BITS_PER_LONG.
>
> This is what I have now:
>
> diff --git a/util/bitops.c b/util/bitops.c
> index e72237a..b0dc93f 100644
> --- a/util/bitops.c
> +++ b/util/bitops.c
> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>      const unsigned long *p = addr + BITOP_WORD(offset);
>      unsigned long result = offset & ~(BITS_PER_LONG-1);
>      unsigned long tmp;
> +    unsigned long d0,d1,d2,d3;
>
>      if (offset >= size) {
>          return size;
>      }
>      size -= result;
> -    offset %= BITS_PER_LONG;
> +    offset &= (BITS_PER_LONG-1);
>      if (offset) {
>          tmp = *(p++);
>          tmp &= (~0UL << offset);
> @@ -42,7 +43,19 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>          size -= BITS_PER_LONG;
>          result += BITS_PER_LONG;
>      }
> -    while (size & ~(BITS_PER_LONG-1)) {
> +    while (size >= 4*BITS_PER_LONG) {
> +        d0 = *p;
> +        d1 = *(p+1);
> +        d2 = *(p+2);
> +        d3 = *(p+3);
> +        if (d0 || d1 || d2 || d3) {
> +            break;
> +        }
> +        p+=4;
> +        result += 4*BITS_PER_LONG;
> +        size -= 4*BITS_PER_LONG;
> +    }
> +    while (size >= BITS_PER_LONG) {
>          if ((tmp = *(p++))) {
>              goto found_middle;
>          }

Minus the %= vs. &=,

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Perhaps:

    tmp = *p;
    d1 = *(p+1);
    d2 = *(p+2);
    d3 = *(p+3);
    if (tmp) {
        goto found_middle;
    }
    if (d1 || d2 || d3) {
        break;
    }

Paolo
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Peter Lieven @ 2013-03-11 15:37 UTC
To: Paolo Bonzini
Cc: Orit Wasserman, qemu-devel@nongnu.org, Corentin Chary

On 11.03.2013 at 16:29, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 11/03/2013 16:24, Peter Lieven wrote:
>>
>>> How would that be different in your patch? But you can solve it by
>>> making two >= loops, one checking for 4*BITS_PER_LONG and one checking
>>> BITS_PER_LONG.
>>
>> This is what I have now:
>>
>> diff --git a/util/bitops.c b/util/bitops.c
>> index e72237a..b0dc93f 100644
>> --- a/util/bitops.c
>> +++ b/util/bitops.c
>> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>      const unsigned long *p = addr + BITOP_WORD(offset);
>>      unsigned long result = offset & ~(BITS_PER_LONG-1);
>>      unsigned long tmp;
>> +    unsigned long d0,d1,d2,d3;
>>
>>      if (offset >= size) {
>>          return size;
>>      }
>>      size -= result;
>> -    offset %= BITS_PER_LONG;
>> +    offset &= (BITS_PER_LONG-1);
>>      if (offset) {
>>          tmp = *(p++);
>>          tmp &= (~0UL << offset);
>> @@ -42,7 +43,19 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>          size -= BITS_PER_LONG;
>>          result += BITS_PER_LONG;
>>      }
>> -    while (size & ~(BITS_PER_LONG-1)) {
>> +    while (size >= 4*BITS_PER_LONG) {
>> +        d0 = *p;
>> +        d1 = *(p+1);
>> +        d2 = *(p+2);
>> +        d3 = *(p+3);
>> +        if (d0 || d1 || d2 || d3) {
>> +            break;
>> +        }
>> +        p+=4;
>> +        result += 4*BITS_PER_LONG;
>> +        size -= 4*BITS_PER_LONG;
>> +    }
>> +    while (size >= BITS_PER_LONG) {
>>          if ((tmp = *(p++))) {
>>              goto found_middle;
>>          }
>
> Minus the %= vs. &=,
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Perhaps:
>
>     tmp = *p;
>     d1 = *(p+1);
>     d2 = *(p+2);
>     d3 = *(p+3);
>     if (tmp) {
>         goto found_middle;
>     }
>     if (d1 || d2 || d3) {
>         break;
>     }

I do not know what gcc internally makes of the d0 || d1 || d2 || d3.
I would guess it's something like one addition w/ carry and 1 test?
Your proposed change would introduce 2 tests (maybe)?

What about this, to be sure?

    tmp = *p;
    d1 = *(p+1);
    d2 = *(p+2);
    d3 = *(p+3);
    if (tmp || d1 || d2 || d3) {
        if (tmp) {
            goto found_middle;
        }
        break;
    }

Peter
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Paolo Bonzini @ 2013-03-11 15:58 UTC
To: Peter Lieven
Cc: Orit Wasserman, qemu-devel@nongnu.org, Corentin Chary

On 11/03/2013 16:37, Peter Lieven wrote:
>
> On 11.03.2013 at 16:29, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> On 11/03/2013 16:24, Peter Lieven wrote:
>>>
>>>> How would that be different in your patch? But you can solve it by
>>>> making two >= loops, one checking for 4*BITS_PER_LONG and one checking
>>>> BITS_PER_LONG.
>>>
>>> This is what I have now:
>>>
>>> diff --git a/util/bitops.c b/util/bitops.c
>>> index e72237a..b0dc93f 100644
>>> --- a/util/bitops.c
>>> +++ b/util/bitops.c
>>> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>>      const unsigned long *p = addr + BITOP_WORD(offset);
>>>      unsigned long result = offset & ~(BITS_PER_LONG-1);
>>>      unsigned long tmp;
>>> +    unsigned long d0,d1,d2,d3;
>>>
>>>      if (offset >= size) {
>>>          return size;
>>>      }
>>>      size -= result;
>>> -    offset %= BITS_PER_LONG;
>>> +    offset &= (BITS_PER_LONG-1);
>>>      if (offset) {
>>>          tmp = *(p++);
>>>          tmp &= (~0UL << offset);
>>> @@ -42,7 +43,19 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>>          size -= BITS_PER_LONG;
>>>          result += BITS_PER_LONG;
>>>      }
>>> -    while (size & ~(BITS_PER_LONG-1)) {
>>> +    while (size >= 4*BITS_PER_LONG) {
>>> +        d0 = *p;
>>> +        d1 = *(p+1);
>>> +        d2 = *(p+2);
>>> +        d3 = *(p+3);
>>> +        if (d0 || d1 || d2 || d3) {
>>> +            break;
>>> +        }
>>> +        p+=4;
>>> +        result += 4*BITS_PER_LONG;
>>> +        size -= 4*BITS_PER_LONG;
>>> +    }
>>> +    while (size >= BITS_PER_LONG) {
>>>          if ((tmp = *(p++))) {
>>>              goto found_middle;
>>>          }
>>
>> Minus the %= vs. &=,
>>
>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> Perhaps:
>>
>>     tmp = *p;
>>     d1 = *(p+1);
>>     d2 = *(p+2);
>>     d3 = *(p+3);
>>     if (tmp) {
>>         goto found_middle;
>>     }
>>     if (d1 || d2 || d3) {
>>         break;
>>     }
>
> I do not know what gcc internally makes of the d0 || d1 || d2 || d3.

It depends on the target and how expensive branches are.

> I would guess it's something like one addition w/ carry and 1 test?

It could be either 4 compare-and-jump sequences, or 3 bitwise ORs
followed by a compare-and-jump.

That is, either:

    test %r8, %r8
    jnz second_loop
    test %r9, %r9
    jnz second_loop
    test %r10, %r10
    jnz second_loop
    test %r11, %r11
    jnz second_loop

or

    or %r9, %r8
    or %r11, %r10
    or %r8, %r10
    jnz second_loop

Don't let the length of the code fool you. The processor knows how to
optimize all of these, and GCC knows too.

> Your proposed change would introduce 2 tests (maybe)?

Yes, but I expect them to be fairly well predicted.

> What about this, to be sure?
>
>     tmp = *p;
>     d1 = *(p+1);
>     d2 = *(p+2);
>     d3 = *(p+3);
>     if (tmp || d1 || d2 || d3) {
>         if (tmp) {
>             goto found_middle;

I suspect that GCC would rewrite it to my version (definitely if it
produces 4 compare-and-jumps; possibly even if it goes for bitwise ORs,
I haven't checked).

Regarding your other question ("one last thought: would it make sense to
update only `size` in the while loops and compute the `result` at the end
as `orgsize` - `size`?"), again the compiler knows better and might even
do this for you. It will likely drop the p increases and use p[result],
so if you do that change you may even get the same code, only this time
p is increased and you get an extra subtraction at the end. :)

Bottom line: don't try to outsmart an optimizing C compiler on
micro-optimization, unless you have benchmarked it and it shows there is
a problem.

Paolo

>         }
>         break;
>     }
>
> Peter
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: ronnie sahlberg @ 2013-03-11 17:06 UTC
To: Paolo Bonzini
Cc: Orit Wasserman, Peter Lieven, qemu-devel@nongnu.org, Corentin Chary

Even more efficient might be to do a bitwise instead of a logical OR:

> if (tmp | d1 | d2 | d3) {

That should remove 3 of the 4 conditional jumps and should become
3 bitwise ORs and one conditional jump.

On Mon, Mar 11, 2013 at 8:58 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 11/03/2013 16:37, Peter Lieven wrote:
>>
>> On 11.03.2013 at 16:29, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>> On 11/03/2013 16:24, Peter Lieven wrote:
>>>>
>>>>> How would that be different in your patch? But you can solve it by
>>>>> making two >= loops, one checking for 4*BITS_PER_LONG and one checking
>>>>> BITS_PER_LONG.
>>>>
>>>> This is what I have now:
>>>>
>>>> diff --git a/util/bitops.c b/util/bitops.c
>>>> index e72237a..b0dc93f 100644
>>>> --- a/util/bitops.c
>>>> +++ b/util/bitops.c
>>>> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>>>      const unsigned long *p = addr + BITOP_WORD(offset);
>>>>      unsigned long result = offset & ~(BITS_PER_LONG-1);
>>>>      unsigned long tmp;
>>>> +    unsigned long d0,d1,d2,d3;
>>>>
>>>>      if (offset >= size) {
>>>>          return size;
>>>>      }
>>>>      size -= result;
>>>> -    offset %= BITS_PER_LONG;
>>>> +    offset &= (BITS_PER_LONG-1);
>>>>      if (offset) {
>>>>          tmp = *(p++);
>>>>          tmp &= (~0UL << offset);
>>>> @@ -42,7 +43,19 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>>>>          size -= BITS_PER_LONG;
>>>>          result += BITS_PER_LONG;
>>>>      }
>>>> -    while (size & ~(BITS_PER_LONG-1)) {
>>>> +    while (size >= 4*BITS_PER_LONG) {
>>>> +        d0 = *p;
>>>> +        d1 = *(p+1);
>>>> +        d2 = *(p+2);
>>>> +        d3 = *(p+3);
>>>> +        if (d0 || d1 || d2 || d3) {
>>>> +            break;
>>>> +        }
>>>> +        p+=4;
>>>> +        result += 4*BITS_PER_LONG;
>>>> +        size -= 4*BITS_PER_LONG;
>>>> +    }
>>>> +    while (size >= BITS_PER_LONG) {
>>>>          if ((tmp = *(p++))) {
>>>>              goto found_middle;
>>>>          }
>>>
>>> Minus the %= vs. &=,
>>>
>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>>
>>> Perhaps:
>>>
>>>     tmp = *p;
>>>     d1 = *(p+1);
>>>     d2 = *(p+2);
>>>     d3 = *(p+3);
>>>     if (tmp) {
>>>         goto found_middle;
>>>     }
>>>     if (d1 || d2 || d3) {
>>>         break;
>>>     }
>>
>> I do not know what gcc internally makes of the d0 || d1 || d2 || d3.
>
> It depends on the target and how expensive branches are.
>
>> I would guess it's something like one addition w/ carry and 1 test?
>
> It could be either 4 compare-and-jump sequences, or 3 bitwise ORs
> followed by a compare-and-jump.
>
> That is, either:
>
>     test %r8, %r8
>     jnz second_loop
>     test %r9, %r9
>     jnz second_loop
>     test %r10, %r10
>     jnz second_loop
>     test %r11, %r11
>     jnz second_loop
>
> or
>
>     or %r9, %r8
>     or %r11, %r10
>     or %r8, %r10
>     jnz second_loop
>
> Don't let the length of the code fool you. The processor knows how to
> optimize all of these, and GCC knows too.
>
>> Your proposed change would introduce 2 tests (maybe)?
>
> Yes, but I expect them to be fairly well predicted.
>
>> What about this, to be sure?
>>
>>     tmp = *p;
>>     d1 = *(p+1);
>>     d2 = *(p+2);
>>     d3 = *(p+3);
>>     if (tmp || d1 || d2 || d3) {
>>         if (tmp) {
>>             goto found_middle;
>
> I suspect that GCC would rewrite it to my version (definitely if it
> produces 4 compare-and-jumps; possibly even if it goes for bitwise ORs,
> I haven't checked).
>
> Regarding your other question ("one last thought: would it make sense to
> update only `size` in the while loops and compute the `result` at the end
> as `orgsize` - `size`?"), again the compiler knows better and might even
> do this for you. It will likely drop the p increases and use p[result],
> so if you do that change you may even get the same code, only this time
> p is increased and you get an extra subtraction at the end. :)
>
> Bottom line: don't try to outsmart an optimizing C compiler on
> micro-optimization, unless you have benchmarked it and it shows there is
> a problem.
>
> Paolo
>
>>         }
>>         break;
>>     }
>>
>> Peter
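The two forms under discussion differ only in whether evaluation may short-circuit; a standalone sketch of both (illustrative only, and as Bonzini notes below, gcc may well compile them the same way):

    #include <stdio.h>

    /* both predicates mean "any of the four words is nonzero" */
    static int any_set_logical(unsigned long d0, unsigned long d1,
                               unsigned long d2, unsigned long d3)
    {
        return d0 || d1 || d2 || d3;      /* may short-circuit: up to 4 branches */
    }

    static int any_set_bitwise(unsigned long d0, unsigned long d1,
                               unsigned long d2, unsigned long d3)
    {
        return (d0 | d1 | d2 | d3) != 0;  /* always reads all four: 3 ORs, 1 branch */
    }

    int main(void)
    {
        printf("%d %d\n", any_set_logical(0, 0, 1, 0),
               any_set_bitwise(0, 0, 1, 0));   /* prints "1 1" */
        return 0;
    }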
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Paolo Bonzini @ 2013-03-11 17:07 UTC
To: ronnie sahlberg
Cc: Orit Wasserman, Peter Lieven, qemu-devel@nongnu.org, Corentin Chary

On 11/03/2013 18:06, ronnie sahlberg wrote:
> Even more efficient might be to do a bitwise instead of a logical OR:
>
>> if (tmp | d1 | d2 | d3) {
>
> That should remove 3 of the 4 conditional jumps and should become
> 3 bitwise ORs and one conditional jump.

Without any serious profiling, please let the compiler do that.

Paolo
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Peter Lieven @ 2013-03-11 18:20 UTC
To: Paolo Bonzini
Cc: Orit Wasserman, qemu-devel@nongnu.org, ronnie sahlberg, Corentin Chary

On 11.03.2013 18:07, Paolo Bonzini wrote:
> On 11/03/2013 18:06, ronnie sahlberg wrote:
>> Even more efficient might be to do a bitwise instead of a logical OR:
>>
>>> if (tmp | d1 | d2 | d3) {
>>
>> That should remove 3 of the 4 conditional jumps and should become
>> 3 bitwise ORs and one conditional jump.
>
> Without any serious profiling, please let the compiler do that.

Paolo is right. I ran some tests with gcc 4.6.3 on x86_64 (with -O3)
and tried the various ideas. They all made no significant difference.
Even unrolling to 8 unsigned longs didn't change anything.

What I tried was running 1^20 iterations of

    find_next_bit(bitfield, 4194304, 0);

I chose the bitfield to be 4 MByte, which equals a 16GB VM. The
bitfield was completely zeroed, so find_next_bit had to run through the
whole bitfield.

The original version took 1 minute and 10 seconds, whereas all the
others took approx. 37-38 seconds, which is almost a 100% boost ;-)

So I think this is the final version:

    while (size >= 4*BITS_PER_LONG) {
        unsigned long d1, d2, d3;
        tmp = *p;
        d1 = *(p+1);
        d2 = *(p+2);
        d3 = *(p+3);
        if (tmp) {
            goto found_middle;
        }
        if (d1 || d2 || d3) {
            break;
        }
        p += 4;
        result += 4*BITS_PER_LONG;
        size -= 4*BITS_PER_LONG;
    }

Peter
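A minimal harness along the lines of the test described above (a sketch, not Lieven's actual benchmark code; the iteration count and the names are assumptions):

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define NBITS 4194304UL   /* bitmap size in bits, as in the test above */

    /* implemented in qemu's util/bitops.c */
    unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
                                unsigned long offset);

    static unsigned long bitfield[NBITS / (8 * sizeof(unsigned long))];

    int main(void)
    {
        unsigned long i, sink = 0;
        clock_t t0, t1;

        memset(bitfield, 0, sizeof(bitfield));  /* all zero: worst-case scan */
        t0 = clock();
        for (i = 0; i < 100000; i++) {          /* assumed iteration count */
            sink += find_next_bit(bitfield, NBITS, 0);
        }
        t1 = clock();
        printf("%.2f s (sink %lu)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, sink);
        return 0;
    }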
* [Qemu-devel] [PATCH] bitops: unroll while loop in find_next_bit().
From: Peter Lieven @ 2013-03-12  7:32 UTC
To: qemu-devel@nongnu.org
Cc: peter.maydell, Paolo Bonzini, Corentin Chary, ronnie sahlberg, Orit Wasserman

This patch adopts the loop unrolling idea of buffer_is_zero() to speed
up the skipping of large areas with zeros in find_next_bit(). This
routine is extensively used to find dirty pages in live migration.

Testing only the find_next_bit performance on a zeroed bitfield, the
loop unrolling decreased execution time by approx. 50% on x86_64.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 util/bitops.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/util/bitops.c b/util/bitops.c
index e72237a..227c38b 100644
--- a/util/bitops.c
+++ b/util/bitops.c
@@ -42,7 +42,23 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
         size -= BITS_PER_LONG;
         result += BITS_PER_LONG;
     }
-    while (size & ~(BITS_PER_LONG-1)) {
+    while (size >= 4*BITS_PER_LONG) {
+        unsigned long d1, d2, d3;
+        tmp = *p;
+        d1 = *(p+1);
+        d2 = *(p+2);
+        d3 = *(p+3);
+        if (tmp) {
+            goto found_middle;
+        }
+        if (d1 | d2 | d3) {
+            break;
+        }
+        p += 4;
+        result += 4*BITS_PER_LONG;
+        size -= 4*BITS_PER_LONG;
+    }
+    while (size >= BITS_PER_LONG) {
         if ((tmp = *(p++))) {
             goto found_middle;
         }
--
1.7.9.5
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Peter Maydell @ 2013-03-11 15:37 UTC
To: Peter Lieven
Cc: Paolo Bonzini, qemu-devel@nongnu.org, Corentin Chary, Orit Wasserman

On 11 March 2013 15:24, Peter Lieven <pl@dlhnet.de> wrote:
> +    unsigned long d0,d1,d2,d3;

These commas should have spaces after them. Also, since the variables
are only used inside the scope of your newly added while loop:

> -    while (size & ~(BITS_PER_LONG-1)) {
> +    while (size >= 4*BITS_PER_LONG) {

it would be better to declare them here.

> +        d0 = *p;
> +        d1 = *(p+1);
> +        d2 = *(p+2);
> +        d3 = *(p+3);
> +        if (d0 || d1 || d2 || d3) {
> +            break;
> +        }
> +        p+=4;
> +        result += 4*BITS_PER_LONG;
> +        size -= 4*BITS_PER_LONG;
> +    }
> +    while (size >= BITS_PER_LONG) {
>          if ((tmp = *(p++))) {
>              goto found_middle;
>          }

thanks
-- PMM
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Peter Lieven @ 2013-03-11 15:41 UTC
To: Peter Maydell
Cc: Paolo Bonzini, qemu-devel@nongnu.org, Corentin Chary, Orit Wasserman

On 11.03.2013 at 16:37, Peter Maydell <peter.maydell@linaro.org> wrote:

> On 11 March 2013 15:24, Peter Lieven <pl@dlhnet.de> wrote:
>> +    unsigned long d0,d1,d2,d3;
>
> These commas should have spaces after them. Also, since the variables
> are only used inside the scope of your newly added while loop:
>
>> -    while (size & ~(BITS_PER_LONG-1)) {
>> +    while (size >= 4*BITS_PER_LONG) {
>
> it would be better to declare them here.

Can you verify that this does not make a difference in the generated
object code? In buffer_is_zero() it's outside the loop.

Thanks,
Peter
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Paolo Bonzini @ 2013-03-11 15:42 UTC
To: Peter Lieven
Cc: Peter Maydell, qemu-devel@nongnu.org, Corentin Chary, Orit Wasserman

On 11/03/2013 16:41, Peter Lieven wrote:
> Can you verify that this does not make a difference in the generated
> object code? In buffer_is_zero() it's outside the loop.

No, it doesn't.

Paolo
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Peter Lieven @ 2013-03-11 15:48 UTC
To: Paolo Bonzini
Cc: Peter Maydell, qemu-devel@nongnu.org, Corentin Chary, Orit Wasserman

On 11.03.2013 at 16:42, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 11/03/2013 16:41, Peter Lieven wrote:
>> Can you verify that this does not make a difference in the generated
>> object code? In buffer_is_zero() it's outside the loop.
>
> No, it doesn't.

OK, I will send the final patch tomorrow.

One last thought: would it make sense to update only `size` in the while
loops and compute the `result` at the end as `orgsize` - `size`?

Peter
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Stefan Hajnoczi @ 2013-03-12  8:35 UTC
To: Peter Lieven
Cc: Orit Wasserman, qemu-devel@nongnu.org, Corentin Chary, Paolo Bonzini

On Mon, Mar 11, 2013 at 02:44:03PM +0100, Peter Lieven wrote:
> I have always had a few VMs which are very hard to migrate because of a
> lot of memory I/O. I found that finding the next dirty bit seemed to be
> one of the culprits (apart from the locking removal, which Paolo is
> working on).
>
> I have the following proposal, which seems to help a lot in my case.
> I just wanted to get some feedback.

Hi Peter,
Do you have any performance numbers for this patch? I'm just curious
how big the win is.

Stefan
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Peter Lieven @ 2013-03-12  8:41 UTC
To: Stefan Hajnoczi
Cc: Orit Wasserman, qemu-devel@nongnu.org, Corentin Chary, Paolo Bonzini

On 12.03.2013 at 09:35, Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Mon, Mar 11, 2013 at 02:44:03PM +0100, Peter Lieven wrote:
>> I have always had a few VMs which are very hard to migrate because of a
>> lot of memory I/O. I found that finding the next dirty bit seemed to be
>> one of the culprits (apart from the locking removal, which Paolo is
>> working on).
>>
>> I have the following proposal, which seems to help a lot in my case.
>> I just wanted to get some feedback.
>
> Hi Peter,
> Do you have any performance numbers for this patch? I'm just curious
> how big the win is.

Hi Stefan,

Please see my recent email to the list with the final patch.
The win is up to 100%. Worst-case execution time (whole array is zero)
is halved on x86_64.

Peter
* Re: [Qemu-devel] [RFC] find_next_bit optimizations
From: Stefan Hajnoczi @ 2013-03-12 15:12 UTC
To: Peter Lieven
Cc: Orit Wasserman, qemu-devel@nongnu.org, Corentin Chary, Paolo Bonzini

On Tue, Mar 12, 2013 at 09:41:04AM +0100, Peter Lieven wrote:
>
> On 12.03.2013 at 09:35, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>
>> On Mon, Mar 11, 2013 at 02:44:03PM +0100, Peter Lieven wrote:
>>> I have always had a few VMs which are very hard to migrate because of a
>>> lot of memory I/O. I found that finding the next dirty bit seemed to be
>>> one of the culprits (apart from the locking removal, which Paolo is
>>> working on).
>>>
>>> I have the following proposal, which seems to help a lot in my case.
>>> I just wanted to get some feedback.
>>
>> Hi Peter,
>> Do you have any performance numbers for this patch? I'm just curious
>> how big the win is.
>
> Hi Stefan,
>
> Please see my recent email to the list with the final patch.
> The win is up to 100%. Worst-case execution time (whole array is zero)
> is halved on x86_64.

Thanks!

Stefan