* [Qemu-devel] [PATCH v3 0/1] ARM64: Live migration optimization
@ 2016-06-29 8:47 vijayak
2016-06-29 8:47 ` [Qemu-devel] [PATCH v3 1/1] target-arm: Use Neon for zero checking vijayak
0 siblings, 1 reply; 8+ messages in thread
From: vijayak @ 2016-06-29 8:47 UTC (permalink / raw)
To: qemu-arm, peter.maydell, pbonzini
Cc: qemu-devel, Prasun.Kapoor, vijay.kilari, Vijaya Kumar K
From: Vijaya Kumar K <vijayak@caviumnetworks.com>
To reduce live migration time on ARM64 machines,
Neon instructions are used for zero-page checking.
With this change, total migration time drops
from 3.5 seconds to 2.9 seconds.
These patches were tested on top of the GICv3 live migration support series:
https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05284.html
However, there is no direct dependency on that series.
v2 -> v3 changes:
- Dropped ThunderX-specific patches (2) from this series. They will
be added once the kernel exposes the MIDR register to userspace.
- Used the generic zero-page checking function; only the macros
are updated.
v1 -> v2 changes:
----------------
- Dropped the 'target-arm: Update page size for aarch64' patch.
- Reduced each loop in the zero buffer check function from 32
to 16.
- Replaced vorrq_u64 with '|' in the Neon macros.
- Renamed the local variable to reflect its 128-bit width.
- Introduced a new file, cpuinfo.c, to parse /proc/cpuinfo.
- Added ThunderX-specific patches to add prefetch in the
zero buffer check function.
Vijay (1):
target-arm: Use Neon for zero checking
util/cutils.c | 7 +++++++
1 file changed, 7 insertions(+)
--
1.7.9.5
* [Qemu-devel] [PATCH v3 1/1] target-arm: Use Neon for zero checking
2016-06-29 8:47 [Qemu-devel] [PATCH v3 0/1] ARM64: Live migration optimization vijayak
@ 2016-06-29 8:47 ` vijayak
2016-06-29 12:53 ` Paolo Bonzini
2016-06-30 13:45 ` Peter Maydell
0 siblings, 2 replies; 8+ messages in thread
From: vijayak @ 2016-06-29 8:47 UTC (permalink / raw)
To: qemu-arm, peter.maydell, pbonzini
Cc: qemu-devel, Prasun.Kapoor, vijay.kilari, Vijay, Suresh
From: Vijay <vijayak@cavium.com>
Use Neon instructions to perform zero checking of a
buffer. This helps reduce total migration time.
Use case: live migration of an idle VM with 4 vCPUs and 8 GB of RAM
running CentOS 7.
Without Neon, the total migration time is 3.5 sec
Migration status: completed
total time: 3560 milliseconds
downtime: 33 milliseconds
setup: 5 milliseconds
transferred ram: 297907 kbytes
throughput: 685.76 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2062760 pages
skipped: 0 pages
normal: 69808 pages
normal bytes: 279232 kbytes
dirty sync count: 3
With Neon, the total migration time is 2.9 sec
Migration status: completed
total time: 2960 milliseconds
downtime: 65 milliseconds
setup: 4 milliseconds
transferred ram: 299869 kbytes
throughput: 830.19 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2064313 pages
skipped: 0 pages
normal: 70294 pages
normal bytes: 281176 kbytes
dirty sync count: 3
Signed-off-by: Vijaya Kumar K <vijayak@cavium.com>
Signed-off-by: Suresh <ksuresh@cavium.com>
---
util/cutils.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/util/cutils.c b/util/cutils.c
index 5830a68..4779403 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -184,6 +184,13 @@ int qemu_fdatasync(int fd)
#define SPLAT(p) _mm_set1_epi8(*(p))
#define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
#define VEC_OR(v1, v2) (_mm_or_si128(v1, v2))
+#elif __aarch64__
+#include "arm_neon.h"
+#define VECTYPE uint64x2_t
+#define ALL_EQ(v1, v2) \
+ ((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
+ (vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
+#define VEC_OR(v1, v2) ((v1) | (v2))
#else
#define VECTYPE unsigned long
#define SPLAT(p) (*(p) * (~0UL / 255))
--
1.7.9.5
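For illustration, the macros in this patch can be exercised by a loop like the one below. This is a minimal sketch, not the function in util/cutils.c: the name buffer_is_zero_sketch, the scalar fallback definition of ALL_EQ, and the requirement that the length be a multiple of the vector size are all assumptions made here to keep the example short and portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
/* Macros as proposed in the patch (128-bit Neon vector of two u64 lanes). */
#define VECTYPE        uint64x2_t
#define ALL_EQ(v1, v2) \
    ((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
     (vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
#define VEC_OR(v1, v2) ((v1) | (v2))
#else
/* Assumed scalar fallback so the sketch builds on other hosts. */
#define VECTYPE        unsigned long
#define ALL_EQ(v1, v2) ((v1) == (v2))
#define VEC_OR(v1, v2) ((v1) | (v2))
#endif

/*
 * Hypothetical helper: returns 1 if every byte of buf is zero, else 0.
 * It ORs all vectors together and compares the result with zero once.
 */
static int buffer_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    VECTYPE zero, acc, v;
    size_t i;

    /* Assumption of this sketch: no tail handling. */
    assert(len % sizeof(VECTYPE) == 0);

    memset(&zero, 0, sizeof(zero));
    acc = zero;
    for (i = 0; i < len; i += sizeof(VECTYPE)) {
        memcpy(&v, p + i, sizeof(v));   /* load without alignment assumptions */
        acc = VEC_OR(acc, v);
    }
    return ALL_EQ(acc, zero);
}
```

A real implementation would normally stop at the first non-zero chunk instead of scanning the whole buffer; the sketch omits that to keep the role of VEC_OR and ALL_EQ visible.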
* Re: [Qemu-devel] [PATCH v3 1/1] target-arm: Use Neon for zero checking
2016-06-29 8:47 ` [Qemu-devel] [PATCH v3 1/1] target-arm: Use Neon for zero checking vijayak
@ 2016-06-29 12:53 ` Paolo Bonzini
2016-06-30 13:45 ` Peter Maydell
1 sibling, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2016-06-29 12:53 UTC (permalink / raw)
To: vijayak, qemu-arm, peter.maydell
Cc: qemu-devel, Prasun.Kapoor, vijay.kilari, Suresh
On 29/06/2016 10:47, vijayak@cavium.com wrote:
> From: Vijay <vijayak@cavium.com>
>
> Use Neon instructions to perform zero checking of
> buffer. This helps in reducing total migration time.
>
> Use case: Idle VM live migration with 4 VCPUS and 8GB ram
> running CentOS 7.
>
> Without Neon, the Total migration time is 3.5 Sec
>
> Migration status: completed
> total time: 3560 milliseconds
> downtime: 33 milliseconds
> setup: 5 milliseconds
> transferred ram: 297907 kbytes
> throughput: 685.76 mbps
> remaining ram: 0 kbytes
> total ram: 8519872 kbytes
> duplicate: 2062760 pages
> skipped: 0 pages
> normal: 69808 pages
> normal bytes: 279232 kbytes
> dirty sync count: 3
>
> With Neon, the total migration time is 2.9 Sec
>
> Migration status: completed
> total time: 2960 milliseconds
> downtime: 65 milliseconds
> setup: 4 milliseconds
> transferred ram: 299869 kbytes
> throughput: 830.19 mbps
> remaining ram: 0 kbytes
> total ram: 8519872 kbytes
> duplicate: 2064313 pages
> skipped: 0 pages
> normal: 70294 pages
> normal bytes: 281176 kbytes
> dirty sync count: 3
>
> Signed-off-by: Vijaya Kumar K <vijayak@cavium.com>
> Signed-off-by: Suresh <ksuresh@cavium.com>
> ---
> util/cutils.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/util/cutils.c b/util/cutils.c
> index 5830a68..4779403 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -184,6 +184,13 @@ int qemu_fdatasync(int fd)
> #define SPLAT(p) _mm_set1_epi8(*(p))
> #define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
> #define VEC_OR(v1, v2) (_mm_or_si128(v1, v2))
> +#elif __aarch64__
> +#include "arm_neon.h"
> +#define VECTYPE uint64x2_t
> +#define ALL_EQ(v1, v2) \
> + ((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
> + (vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
> +#define VEC_OR(v1, v2) ((v1) | (v2))
> #else
> #define VECTYPE unsigned long
> #define SPLAT(p) (*(p) * (~0UL / 255))
>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
* Re: [Qemu-devel] [PATCH v3 1/1] target-arm: Use Neon for zero checking
2016-06-29 8:47 ` [Qemu-devel] [PATCH v3 1/1] target-arm: Use Neon for zero checking vijayak
2016-06-29 12:53 ` Paolo Bonzini
@ 2016-06-30 13:45 ` Peter Maydell
2016-07-01 22:07 ` Richard Henderson
1 sibling, 1 reply; 8+ messages in thread
From: Peter Maydell @ 2016-06-30 13:45 UTC (permalink / raw)
To: Vijay
Cc: qemu-arm, Paolo Bonzini, QEMU Developers, Prasun.Kapoor,
Vijay Kilari, Suresh
On 29 June 2016 at 09:47, <vijayak@cavium.com> wrote:
> From: Vijay <vijayak@cavium.com>
>
> Use Neon instructions to perform zero checking of
> buffer. This helps in reducing total migration time.
> diff --git a/util/cutils.c b/util/cutils.c
> index 5830a68..4779403 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -184,6 +184,13 @@ int qemu_fdatasync(int fd)
> #define SPLAT(p) _mm_set1_epi8(*(p))
> #define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
> #define VEC_OR(v1, v2) (_mm_or_si128(v1, v2))
> +#elif __aarch64__
> +#include "arm_neon.h"
> +#define VECTYPE uint64x2_t
> +#define ALL_EQ(v1, v2) \
> + ((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
> + (vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
> +#define VEC_OR(v1, v2) ((v1) | (v2))
Should be '#elif defined(__aarch64__)'. I have made this
tweak and put this patch in target-arm.next.
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v3 1/1] target-arm: Use Neon for zero checking
2016-06-30 13:45 ` Peter Maydell
@ 2016-07-01 22:07 ` Richard Henderson
2016-07-02 9:42 ` Peter Maydell
2016-07-05 12:24 ` Vijay Kilari
0 siblings, 2 replies; 8+ messages in thread
From: Richard Henderson @ 2016-07-01 22:07 UTC (permalink / raw)
To: Peter Maydell, Vijay
Cc: Prasun.Kapoor, Vijay Kilari, Suresh, QEMU Developers, qemu-arm,
Paolo Bonzini
On 06/30/2016 06:45 AM, Peter Maydell wrote:
> On 29 June 2016 at 09:47, <vijayak@cavium.com> wrote:
>> From: Vijay <vijayak@cavium.com>
>>
>> Use Neon instructions to perform zero checking of
>> buffer. This helps in reducing total migration time.
>
>> diff --git a/util/cutils.c b/util/cutils.c
>> index 5830a68..4779403 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -184,6 +184,13 @@ int qemu_fdatasync(int fd)
>> #define SPLAT(p) _mm_set1_epi8(*(p))
>> #define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
>> #define VEC_OR(v1, v2) (_mm_or_si128(v1, v2))
>> +#elif __aarch64__
>> +#include "arm_neon.h"
>> +#define VECTYPE uint64x2_t
>> +#define ALL_EQ(v1, v2) \
>> + ((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
>> + (vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
>> +#define VEC_OR(v1, v2) ((v1) | (v2))
>
> Should be '#elif defined(__aarch64__)'. I have made this
> tweak and put this patch in target-arm.next.
Consider
#define VECTYPE uint32x4_t
#define ALL_EQ(v1, v2) (vmaxvq_u32((v1) ^ (v2)) == 0)
which compiles down to
1c: 6e211c00 eor v0.16b, v0.16b, v1.16b
20: 6eb0a800 umaxv s0, v0.4s
24: 1e260000 fmov w0, s0
28: 6b1f001f cmp w0, wzr
2c: 1a9f17e0 cset w0, eq
30: d65f03c0 ret
vs
34: 4e083c20 mov x0, v1.d[0]
38: 4e083c01 mov x1, v0.d[0]
3c: eb00003f cmp x1, x0
40: 52800000 mov w0, #0
44: 54000040 b.eq 4c <f0+0x18>
48: d65f03c0 ret
4c: 4e183c20 mov x0, v1.d[1]
50: 4e183c01 mov x1, v0.d[1]
54: eb00003f cmp x1, x0
58: 1a9f17e0 cset w0, eq
5c: d65f03c0 ret
r~
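The two ALL_EQ formulations compared above can be written as ordinary functions and checked against each other. The following is a sketch only: the function names, the vec128 typedef, and the non-AArch64 fallback are assumptions introduced for illustration and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
typedef uint32x4_t vec128;

static vec128 vec128_load(const uint32_t w[4])
{
    return vld1q_u32(w);
}

/* Patch variant: compare the two 64-bit lanes one after the other. */
static int all_eq_lanewise(vec128 a, vec128 b)
{
    uint64x2_t a64 = vreinterpretq_u64_u32(a);
    uint64x2_t b64 = vreinterpretq_u64_u32(b);

    return vgetq_lane_u64(a64, 0) == vgetq_lane_u64(b64, 0) &&
           vgetq_lane_u64(a64, 1) == vgetq_lane_u64(b64, 1);
}

/* Suggested variant: XOR the inputs, then take the maximum across lanes. */
static int all_eq_xor_reduce(vec128 a, vec128 b)
{
    return vmaxvq_u32(veorq_u32(a, b)) == 0;
}
#else
/* Assumed portable stand-in so the sketch also builds on other hosts. */
typedef struct { uint32_t w[4]; } vec128;

static vec128 vec128_load(const uint32_t w[4])
{
    vec128 v;

    memcpy(v.w, w, sizeof(v.w));
    return v;
}

static int all_eq_lanewise(vec128 a, vec128 b)
{
    /* Words 0-1 stand for 64-bit lane 0, words 2-3 for lane 1. */
    return (a.w[0] == b.w[0] && a.w[1] == b.w[1]) &&
           (a.w[2] == b.w[2] && a.w[3] == b.w[3]);
}

static int all_eq_xor_reduce(vec128 a, vec128 b)
{
    uint32_t max = 0;
    int i;

    for (i = 0; i < 4; i++) {
        uint32_t x = a.w[i] ^ b.w[i];
        if (x > max) {
            max = x;
        }
    }
    return max == 0;
}
#endif

/* Both variants must give the same answer for equal and unequal inputs. */
static void self_test(void)
{
    const uint32_t x[4] = { 1, 2, 3, 4 };
    const uint32_t y[4] = { 1, 2, 3, 5 };
    vec128 a = vec128_load(x), b = vec128_load(y);

    assert(all_eq_lanewise(a, a) && all_eq_xor_reduce(a, a));
    assert(!all_eq_lanewise(a, b) && !all_eq_xor_reduce(a, b));
}
```

Both functions compute the same predicate, so which one is faster depends on the compiler's code generation and the target core, and is best settled by measurement.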
* Re: [Qemu-devel] [PATCH v3 1/1] target-arm: Use Neon for zero checking
2016-07-01 22:07 ` Richard Henderson
@ 2016-07-02 9:42 ` Peter Maydell
2016-07-05 12:24 ` Vijay Kilari
1 sibling, 0 replies; 8+ messages in thread
From: Peter Maydell @ 2016-07-02 9:42 UTC (permalink / raw)
To: Richard Henderson
Cc: Vijay, Prasun.Kapoor, Vijay Kilari, Suresh, QEMU Developers,
qemu-arm, Paolo Bonzini
On 1 July 2016 at 23:07, Richard Henderson <rth@twiddle.net> wrote:
> On 06/30/2016 06:45 AM, Peter Maydell wrote:
>>
>> On 29 June 2016 at 09:47, <vijayak@cavium.com> wrote:
>>>
>>> From: Vijay <vijayak@cavium.com>
>>>
>>> Use Neon instructions to perform zero checking of
>>> buffer. This helps in reducing total migration time.
>>
>>
>>> diff --git a/util/cutils.c b/util/cutils.c
>>> index 5830a68..4779403 100644
>>> --- a/util/cutils.c
>>> +++ b/util/cutils.c
>>> @@ -184,6 +184,13 @@ int qemu_fdatasync(int fd)
>>> #define SPLAT(p) _mm_set1_epi8(*(p))
>>> #define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) ==
>>> 0xFFFF)
>>> #define VEC_OR(v1, v2) (_mm_or_si128(v1, v2))
>>> +#elif __aarch64__
>>> +#include "arm_neon.h"
>>> +#define VECTYPE uint64x2_t
>>> +#define ALL_EQ(v1, v2) \
>>> + ((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
>>> + (vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
>>> +#define VEC_OR(v1, v2) ((v1) | (v2))
>>
>>
>> Should be '#elif defined(__aarch64__)'. I have made this
>> tweak and put this patch in target-arm.next.
>
>
> Consider
>
> #define VECTYPE uint32x4_t
> #define ALL_EQ(v1, v2) (vmaxvq_u32((v1) ^ (v2)) == 0)
Sounds good. Vijay, could you benchmark that variant, please?
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v3 1/1] target-arm: Use Neon for zero checking
2016-07-01 22:07 ` Richard Henderson
2016-07-02 9:42 ` Peter Maydell
@ 2016-07-05 12:24 ` Vijay Kilari
2016-07-11 17:55 ` Peter Maydell
1 sibling, 1 reply; 8+ messages in thread
From: Vijay Kilari @ 2016-07-05 12:24 UTC (permalink / raw)
To: Richard Henderson
Cc: Peter Maydell, Vijay, prasun.kapoor, Suresh, QEMU Developers,
qemu-arm, Paolo Bonzini
On Sat, Jul 2, 2016 at 3:37 AM, Richard Henderson <rth@twiddle.net> wrote:
> On 06/30/2016 06:45 AM, Peter Maydell wrote:
>>
>> On 29 June 2016 at 09:47, <vijayak@cavium.com> wrote:
>>>
>>> From: Vijay <vijayak@cavium.com>
>>>
>>> Use Neon instructions to perform zero checking of
>>> buffer. This helps in reducing total migration time.
>>
>>
>>> diff --git a/util/cutils.c b/util/cutils.c
>>> index 5830a68..4779403 100644
>>> --- a/util/cutils.c
>>> +++ b/util/cutils.c
>>> @@ -184,6 +184,13 @@ int qemu_fdatasync(int fd)
>>> #define SPLAT(p) _mm_set1_epi8(*(p))
>>> #define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) ==
>>> 0xFFFF)
>>> #define VEC_OR(v1, v2) (_mm_or_si128(v1, v2))
>>> +#elif __aarch64__
>>> +#include "arm_neon.h"
>>> +#define VECTYPE uint64x2_t
>>> +#define ALL_EQ(v1, v2) \
>>> + ((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
>>> + (vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
>>> +#define VEC_OR(v1, v2) ((v1) | (v2))
>>
>>
>> Should be '#elif defined(__aarch64__)'. I have made this
>> tweak and put this patch in target-arm.next.
>
>
> Consider
>
> #define VECTYPE uint32x4_t
> #define ALL_EQ(v1, v2) (vmaxvq_u32((v1) ^ (v2)) == 0)
>
>
> which compiles down to
>
> 1c: 6e211c00 eor v0.16b, v0.16b, v1.16b
> 20: 6eb0a800 umaxv s0, v0.4s
> 24: 1e260000 fmov w0, s0
> 28: 6b1f001f cmp w0, wzr
> 2c: 1a9f17e0 cset w0, eq
> 30: d65f03c0 ret
For me this code compiles as shown below, and migration time is ~100 ms longer.
Results from three migration trials follow.
7039cc: 6eb0a800 umaxv s0, v0.4s
7039d0: 0e043c02 mov w2, v0.s[0]
7039d4: 350000c2 cbnz w2, 7039ec <f0+0xf4>
7039d8: 91002084 add x4, x4, #0x8
7039dc: 91020063 add x3, x3, #0x80
7039e0: eb01009f cmp x4, x1
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: completed
total time: 3070 milliseconds
downtime: 55 milliseconds
setup: 4 milliseconds
transferred ram: 300637 kbytes
throughput: 802.49 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2062834 pages
skipped: 0 pages
normal: 70489 pages
normal bytes: 281956 kbytes
dirty sync count: 3
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: completed
total time: 3067 milliseconds
downtime: 47 milliseconds
setup: 5 milliseconds
transferred ram: 290277 kbytes
throughput: 775.61 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2064185 pages
skipped: 0 pages
normal: 67901 pages
normal bytes: 271604 kbytes
dirty sync count: 3
(qemu)
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: completed
total time: 3067 milliseconds
downtime: 34 milliseconds
setup: 5 milliseconds
transferred ram: 294614 kbytes
throughput: 787.19 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2063365 pages
skipped: 0 pages
normal: 68985 pages
normal bytes: 275940 kbytes
dirty sync count: 3
>
> vs
>
> 34: 4e083c20 mov x0, v1.d[0]
> 38: 4e083c01 mov x1, v0.d[0]
> 3c: eb00003f cmp x1, x0
> 40: 52800000 mov w0, #0
> 44: 54000040 b.eq 4c <f0+0x18>
> 48: d65f03c0 ret
> 4c: 4e183c20 mov x0, v1.d[1]
> 50: 4e183c01 mov x1, v0.d[1]
> 54: eb00003f cmp x1, x0
> 58: 1a9f17e0 cset w0, eq
> 5c: d65f03c0 ret
>
My patch compiles to the code below and takes ~100 ms less time:
#define VECTYPE uint64x2_t
#define ALL_EQ(v1, v2) \
((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
(vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
7039d0: 4e083c02 mov x2, v0.d[0]
7039d4: b5000102 cbnz x2, 7039f4 <f0+0xfc>
7039d8: 4e183c02 mov x2, v0.d[1]
7039dc: b50000c2 cbnz x2, 7039f4 <f0+0xfc>
7039e0: 91002084 add x4, x4, #0x8
7039e4: 91020063 add x3, x3, #0x80
7039e8: eb04003f cmp x1, x4
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: completed
total time: 2973 milliseconds
downtime: 67 milliseconds
setup: 5 milliseconds
transferred ram: 293659 kbytes
throughput: 809.45 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2062791 pages
skipped: 0 pages
normal: 68748 pages
normal bytes: 274992 kbytes
dirty sync count: 3
(qemu)
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: completed
total time: 2972 milliseconds
downtime: 47 milliseconds
setup: 5 milliseconds
transferred ram: 295972 kbytes
throughput: 816.10 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2062861 pages
skipped: 0 pages
normal: 69325 pages
normal bytes: 277300 kbytes
dirty sync count: 3
(qemu)
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: completed
total time: 2982 milliseconds
downtime: 40 milliseconds
setup: 5 milliseconds
transferred ram: 293386 kbytes
throughput: 806.26 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2063199 pages
skipped: 0 pages
normal: 68679 pages
normal bytes: 274716 kbytes
dirty sync count: 4
(qemu)
Regards
Vijay
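The "~100 ms" figure can be checked against the six "total time" values reported in this message. A small sketch of that arithmetic follows; the array and function names are made up for illustration, and only the millisecond values come from the runs above.

```c
#include <assert.h>
#include <stddef.h>

/* "total time" values, in milliseconds, copied from the runs above. */
static const int xor_reduce_ms[] = { 3070, 3067, 3067 };  /* vmaxvq_u32 variant */
static const int lanewise_ms[]   = { 2973, 2972, 2982 };  /* original patch */

/* Arithmetic mean of n samples. */
static double mean_ms(const int *samples, size_t n)
{
    long sum = 0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        sum += samples[i];
    }
    return (double)sum / (double)n;
}

/* Mean extra time taken by the XOR-and-reduce variant. */
static double mean_gap_ms(void)
{
    return mean_ms(xor_reduce_ms, 3) - mean_ms(lanewise_ms, 3);
}
```

The means are 3068 ms and about 2975.7 ms, a gap of roughly 92 ms, which is consistent with the "~100 ms" quoted above. Three runs per variant is a small sample, so the gap should be read as indicative rather than precise.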
* Re: [Qemu-devel] [PATCH v3 1/1] target-arm: Use Neon for zero checking
2016-07-05 12:24 ` Vijay Kilari
@ 2016-07-11 17:55 ` Peter Maydell
0 siblings, 0 replies; 8+ messages in thread
From: Peter Maydell @ 2016-07-11 17:55 UTC (permalink / raw)
To: Vijay Kilari
Cc: Richard Henderson, Vijay, prasun.kapoor, Suresh, QEMU Developers,
qemu-arm, Paolo Bonzini
On 5 July 2016 at 13:24, Vijay Kilari <vijay.kilari@gmail.com> wrote:
> On Sat, Jul 2, 2016 at 3:37 AM, Richard Henderson <rth@twiddle.net> wrote:
>> Consider
>>
>> #define VECTYPE uint32x4_t
>> #define ALL_EQ(v1, v2) (vmaxvq_u32((v1) ^ (v2)) == 0)
>>
>>
>> which compiles down to
>>
>> 1c: 6e211c00 eor v0.16b, v0.16b, v1.16b
>> 20: 6eb0a800 umaxv s0, v0.4s
>> 24: 1e260000 fmov w0, s0
>> 28: 6b1f001f cmp w0, wzr
>> 2c: 1a9f17e0 cset w0, eq
>> 30: d65f03c0 ret
>
> For me this code compiles as below and migration time is ~100ms more.
Thanks for benchmarking this. I'll take your original patch into
target-arm.next.
-- PMM