[Qemu-devel] [RFC PATCH v1 0/2] Live migration optimization for Thunderx platform

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [RFC PATCH v1 0/2] Live migration optimization for Thunderx platform
@ 2016-08-02 10:20 vijay.kilari
  2016-08-02 10:20 ` [Qemu-devel] [RFC PATCH v1 1/2] utils: Add helper to read arm MIDR_EL1 register vijay.kilari
  2016-08-02 10:20 ` [Qemu-devel] [RFC PATCH v1 2/2] utils: Add prefetch for Thunderx platform vijay.kilari
  0 siblings, 2 replies; 10+ messages in thread
From: vijay.kilari @ 2016-08-02 10:20 UTC (permalink / raw)
  To: qemu-arm, peter.maydell, pbonzini
  Cc: qemu-devel, Prasun.Kapoor, vijay.kilari, Vijaya Kumar K

From: Vijaya Kumar K <Vijaya.Kumar@cavium.com>

The CPU MIDR_EL1 register is exposed to userspace for arm64
with the below patch.
https://lkml.org/lkml/2016/7/8/467

Thunderx platform requires explicit prefetch instruction to
provide prefetch hint. Using MIDR_EL1 information, provided
by above kernel patch, prefetch is executed if the platform
is Thunderx.

The results of live migration time improvement is provided
in commit message of patch 2.

Vijaya Kumar K (2):
  utils: Add helper to read arm MIDR_EL1 register
  utils: Add prefetch for Thunderx platform

 include/qemu-common.h |    2 ++
 util/Makefile.objs    |    1 +
 util/cpuinfo.c        |   90 +++++++++++++++++++++++++++++++++++++++++++++++++
 util/cutils.c         |   22 ++++++++++++
 4 files changed, 115 insertions(+)
 create mode 100644 util/cpuinfo.c

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] [RFC PATCH v1 1/2] utils: Add helper to read arm MIDR_EL1 register
  2016-08-02 10:20 [Qemu-devel] [RFC PATCH v1 0/2] Live migration optimization for Thunderx platform vijay.kilari
@ 2016-08-02 10:20 ` vijay.kilari
  2016-08-02 10:48   ` Paolo Bonzini
  2016-08-02 11:10   ` Peter Maydell
  2016-08-02 10:20 ` [Qemu-devel] [RFC PATCH v1 2/2] utils: Add prefetch for Thunderx platform vijay.kilari
  1 sibling, 2 replies; 10+ messages in thread
From: vijay.kilari @ 2016-08-02 10:20 UTC (permalink / raw)
  To: qemu-arm, peter.maydell, pbonzini
  Cc: qemu-devel, Prasun.Kapoor, vijay.kilari, Vijaya Kumar K

From: Vijaya Kumar K <Vijaya.Kumar@cavium.com>

Add helper API to read MIDR_EL1 registers to fetch
cpu identification information. This helps in
adding errata's and architecture specific features.

This is implemented only for arm architecture.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
---
 include/qemu-common.h |    1 +
 util/Makefile.objs    |    1 +
 util/cpuinfo.c        |   52 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 1f2cb94..62ad674 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -134,4 +134,5 @@ void page_size_init(void);
  * returned. */
 bool dump_in_progress(void);
 
+long int qemu_read_cpuid_info(void);
 #endif
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 96cb1e0..9d25a72 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -35,3 +35,4 @@ util-obj-y += log.o
 util-obj-y += qdist.o
 util-obj-y += qht.o
 util-obj-y += range.o
+util-obj-y += cpuinfo.o
diff --git a/util/cpuinfo.c b/util/cpuinfo.c
new file mode 100644
index 0000000..3ba7194
--- /dev/null
+++ b/util/cpuinfo.c
@@ -0,0 +1,52 @@
+/*
+ * Dealing with arm cpu identification information.
+ *
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * Authors:
+ *  Vijaya Kumar K <Vijaya.Kumar@cavium.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1
+ * or later.  See the COPYING.LIB file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/cutils.h"
+
+#if defined(__aarch64__)
+
+long int qemu_read_cpuid_info(void)
+{
+    FILE *fp;
+    char *buf;
+    long int midr = 0;
+#define BUF_SIZE 32
+
+    fp = fopen("/sys/devices/system/cpu/cpu0/regs/identification/midr_el1",
+               "r");
+    if (!fp) {
+        return 0;
+    }
+
+    buf = g_malloc0(BUF_SIZE);
+    if (!buf) {
+        fclose(fp);
+        return 0;
+    }
+
+    if (buf != fgets(buf, BUF_SIZE - 1, fp)) {
+        goto out;
+    }
+
+    if (qemu_strtol(buf, NULL, 0, &midr) < 0) {
+        goto out;
+    }
+
+out:
+    g_free(buf);
+    fclose(fp);
+
+    return midr;
+}
+#endif
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v1 1/2] utils: Add helper to read arm MIDR_EL1 register
  2016-08-02 10:20 ` [Qemu-devel] [RFC PATCH v1 1/2] utils: Add helper to read arm MIDR_EL1 register vijay.kilari
@ 2016-08-02 10:48   ` Paolo Bonzini
  2016-08-04  9:16     ` Vijay Kilari
  2016-08-02 11:10   ` Peter Maydell
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2016-08-02 10:48 UTC (permalink / raw)
  To: vijay kilari
  Cc: qemu-arm, peter maydell, qemu-devel, Prasun Kapoor,
	Vijaya Kumar K

----- Original Message -----
> From: "vijay kilari" <vijay.kilari@gmail.com>
> To: qemu-arm@nongnu.org, "peter maydell" <peter.maydell@linaro.org>, pbonzini@redhat.com
> Cc: qemu-devel@nongnu.org, "Prasun Kapoor" <Prasun.Kapoor@cavium.com>, "vijay kilari" <vijay.kilari@gmail.com>,
> "Vijaya Kumar K" <Vijaya.Kumar@cavium.com>
> Sent: Tuesday, August 2, 2016 12:20:15 PM
> Subject: [RFC PATCH v1 1/2] utils: Add helper to read arm MIDR_EL1 register
> 
> From: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
> 
> Add helper API to read MIDR_EL1 registers to fetch
> cpu identification information. This helps in
> adding errata's and architecture specific features.
> 
> This is implemented only for arm architecture.
> 
> Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
> ---
>  include/qemu-common.h |    1 +
>  util/Makefile.objs    |    1 +
>  util/cpuinfo.c        |   52
>  +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 54 insertions(+)
> 
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index 1f2cb94..62ad674 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -134,4 +134,5 @@ void page_size_init(void);
>   * returned. */
>  bool dump_in_progress(void);
>  
> +long int qemu_read_cpuid_info(void);

First, please avoid adding to include/qemu-common.h (it really should
go away).

Second, this is too generic a name.  Please call it something like
qemu_read_aarch64_midr_el1.

Third, it's probably a bad idea to call this function from generic code, so
make it static and add the detection function from patch 2/2 already here.
By making it static, it's also possible to define it only if CONFIG_LINUX
is defined; the ThunderX detection will then return false if !CONFIG_LINUX.

Thanks,

Paolo

>  #endif
> diff --git a/util/Makefile.objs b/util/Makefile.objs
> index 96cb1e0..9d25a72 100644
> --- a/util/Makefile.objs
> +++ b/util/Makefile.objs
> @@ -35,3 +35,4 @@ util-obj-y += log.o
>  util-obj-y += qdist.o
>  util-obj-y += qht.o
>  util-obj-y += range.o
> +util-obj-y += cpuinfo.o
> diff --git a/util/cpuinfo.c b/util/cpuinfo.c
> new file mode 100644
> index 0000000..3ba7194
> --- /dev/null
> +++ b/util/cpuinfo.c
> @@ -0,0 +1,52 @@
> +/*
> + * Dealing with arm cpu identification information.
> + *
> + * Copyright (C) 2016 Cavium, Inc.
> + *
> + * Authors:
> + *  Vijaya Kumar K <Vijaya.Kumar@cavium.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2.1
> + * or later.  See the COPYING.LIB file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "qemu/cutils.h"
> +
> +#if defined(__aarch64__)
> +
> +long int qemu_read_cpuid_info(void)
> +{
> +    FILE *fp;
> +    char *buf;
> +    long int midr = 0;
> +#define BUF_SIZE 32
> +
> +    fp = fopen("/sys/devices/system/cpu/cpu0/regs/identification/midr_el1",
> +               "r");
> +    if (!fp) {
> +        return 0;
> +    }
> +
> +    buf = g_malloc0(BUF_SIZE);
> +    if (!buf) {
> +        fclose(fp);
> +        return 0;
> +    }
> +
> +    if (buf != fgets(buf, BUF_SIZE - 1, fp)) {
> +        goto out;
> +    }
> +
> +    if (qemu_strtol(buf, NULL, 0, &midr) < 0) {
> +        goto out;
> +    }
> +
> +out:
> +    g_free(buf);
> +    fclose(fp);
> +
> +    return midr;
> +}
> +#endif
> --
> 1.7.9.5
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v1 1/2] utils: Add helper to read arm MIDR_EL1 register
  2016-08-02 10:48   ` Paolo Bonzini
@ 2016-08-04  9:16     ` Vijay Kilari
  2016-08-04 12:16       ` Paolo Bonzini
  0 siblings, 1 reply; 10+ messages in thread
From: Vijay Kilari @ 2016-08-04  9:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-arm, peter maydell, QEMU Developers, Prasun Kapoor,
	Vijaya Kumar K

Hi Paolo,

On Tue, Aug 2, 2016 at 4:18 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> ----- Original Message -----
>> From: "vijay kilari" <vijay.kilari@gmail.com>
>> To: qemu-arm@nongnu.org, "peter maydell" <peter.maydell@linaro.org>, pbonzini@redhat.com
>> Cc: qemu-devel@nongnu.org, "Prasun Kapoor" <Prasun.Kapoor@cavium.com>, "vijay kilari" <vijay.kilari@gmail.com>,
>> "Vijaya Kumar K" <Vijaya.Kumar@cavium.com>
>> Sent: Tuesday, August 2, 2016 12:20:15 PM
>> Subject: [RFC PATCH v1 1/2] utils: Add helper to read arm MIDR_EL1 register
>>
>> From: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
>>
>> Add helper API to read MIDR_EL1 registers to fetch
>> cpu identification information. This helps in
>> adding errata's and architecture specific features.
>>
>> This is implemented only for arm architecture.
>>
>> Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
>> ---
>>  include/qemu-common.h |    1 +
>>  util/Makefile.objs    |    1 +
>>  util/cpuinfo.c        |   52
>>  +++++++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 54 insertions(+)
>>
>> diff --git a/include/qemu-common.h b/include/qemu-common.h
>> index 1f2cb94..62ad674 100644
>> --- a/include/qemu-common.h
>> +++ b/include/qemu-common.h
>> @@ -134,4 +134,5 @@ void page_size_init(void);
>>   * returned. */
>>  bool dump_in_progress(void);
>>
>> +long int qemu_read_cpuid_info(void);
>
> First, please avoid adding to include/qemu-common.h (it really should
> go away).
>
> Second, this is too generic a name.  Please call it something like
> qemu_read_aarch64_midr_el1.

OK
>
> Third, it's probably a bad idea to call this function from generic code, so
> make it static and add the detection function from patch 2/2 already here.
> By making it static, it's also possible to define it only if CONFIG_LINUX
> is defined; the ThunderX detection will then return false if !CONFIG_LINUX.
>

  You mean to say, move contents of this patch to util/cutils.c and make it
static and define under __aarch64__ and CONFIG_LINUX?.

> Thanks,
>
> Paolo
>
>>  #endif
>> diff --git a/util/Makefile.objs b/util/Makefile.objs
>> index 96cb1e0..9d25a72 100644
>> --- a/util/Makefile.objs
>> +++ b/util/Makefile.objs
>> @@ -35,3 +35,4 @@ util-obj-y += log.o
>>  util-obj-y += qdist.o
>>  util-obj-y += qht.o
>>  util-obj-y += range.o
>> +util-obj-y += cpuinfo.o
>> diff --git a/util/cpuinfo.c b/util/cpuinfo.c
>> new file mode 100644
>> index 0000000..3ba7194
>> --- /dev/null
>> +++ b/util/cpuinfo.c
>> @@ -0,0 +1,52 @@
>> +/*
>> + * Dealing with arm cpu identification information.
>> + *
>> + * Copyright (C) 2016 Cavium, Inc.
>> + *
>> + * Authors:
>> + *  Vijaya Kumar K <Vijaya.Kumar@cavium.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2.1
>> + * or later.  See the COPYING.LIB file in the top-level directory.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu-common.h"
>> +#include "qemu/cutils.h"
>> +
>> +#if defined(__aarch64__)
>> +
>> +long int qemu_read_cpuid_info(void)
>> +{
>> +    FILE *fp;
>> +    char *buf;
>> +    long int midr = 0;
>> +#define BUF_SIZE 32
>> +
>> +    fp = fopen("/sys/devices/system/cpu/cpu0/regs/identification/midr_el1",
>> +               "r");
>> +    if (!fp) {
>> +        return 0;
>> +    }
>> +
>> +    buf = g_malloc0(BUF_SIZE);
>> +    if (!buf) {
>> +        fclose(fp);
>> +        return 0;
>> +    }
>> +
>> +    if (buf != fgets(buf, BUF_SIZE - 1, fp)) {
>> +        goto out;
>> +    }
>> +
>> +    if (qemu_strtol(buf, NULL, 0, &midr) < 0) {
>> +        goto out;
>> +    }
>> +
>> +out:
>> +    g_free(buf);
>> +    fclose(fp);
>> +
>> +    return midr;
>> +}
>> +#endif
>> --
>> 1.7.9.5
>>
>>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v1 1/2] utils: Add helper to read arm MIDR_EL1 register
  2016-08-04  9:16     ` Vijay Kilari
@ 2016-08-04 12:16       ` Paolo Bonzini
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2016-08-04 12:16 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: qemu-arm, peter maydell, QEMU Developers, Prasun Kapoor,
	Vijaya Kumar K


> > Third, it's probably a bad idea to call this function from generic code, so
> > make it static and add the detection function from patch 2/2 already here.
> > By making it static, it's also possible to define it only if CONFIG_LINUX
> > is defined; the ThunderX detection will then return false if !CONFIG_LINUX.
> >
> 
>   You mean to say, move contents of this patch to util/cutils.c and make it
> static and define under __aarch64__ and CONFIG_LINUX?.

I don't think util/cutils.c is the right file.  It should be a new file, something
like util/aarch64-cpuid.c.

If CONFIG_LINUX is not defined, the ThunderX detection function should return
zero.

If __aarch64__ is not defined, the function should not be defined at all.

Paolo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v1 1/2] utils: Add helper to read arm MIDR_EL1 register
  2016-08-02 10:20 ` [Qemu-devel] [RFC PATCH v1 1/2] utils: Add helper to read arm MIDR_EL1 register vijay.kilari
  2016-08-02 10:48   ` Paolo Bonzini
@ 2016-08-02 11:10   ` Peter Maydell
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Maydell @ 2016-08-02 11:10 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: qemu-arm, Paolo Bonzini, QEMU Developers, prasun.kapoor,
	Vijaya Kumar K

On 2 August 2016 at 11:20,  <vijay.kilari@gmail.com> wrote:
> +long int qemu_read_cpuid_info(void)

Don't use "long" here, it might be 32 or 64 bits.
The kernel ABI for the /sys/ file we're reading says it
is a 64-bit value, so uint64_t is what you want.

> +{
> +    FILE *fp;
> +    char *buf;
> +    long int midr = 0;
> +#define BUF_SIZE 32
> +
> +    fp = fopen("/sys/devices/system/cpu/cpu0/regs/identification/midr_el1",
> +               "r");
> +    if (!fp) {
> +        return 0;
> +    }
> +
> +    buf = g_malloc0(BUF_SIZE);
> +    if (!buf) {
> +        fclose(fp);
> +        return 0;
> +    }
> +
> +    if (buf != fgets(buf, BUF_SIZE - 1, fp)) {
> +        goto out;
> +    }

g_file_get_contents() is probably easier than manually
opening the file and reading it into an allocated buffer.

> +
> +    if (qemu_strtol(buf, NULL, 0, &midr) < 0) {

qemu_strtoull().

> +        goto out;
> +    }
> +
> +out:
> +    g_free(buf);
> +    fclose(fp);
> +
> +    return midr;
> +}
> +#endif
> --
> 1.7.9.5
>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] [RFC PATCH v1 2/2] utils: Add prefetch for Thunderx platform
  2016-08-02 10:20 [Qemu-devel] [RFC PATCH v1 0/2] Live migration optimization for Thunderx platform vijay.kilari
  2016-08-02 10:20 ` [Qemu-devel] [RFC PATCH v1 1/2] utils: Add helper to read arm MIDR_EL1 register vijay.kilari
@ 2016-08-02 10:20 ` vijay.kilari
  2016-08-06 10:17   ` Richard Henderson
  1 sibling, 1 reply; 10+ messages in thread
From: vijay.kilari @ 2016-08-02 10:20 UTC (permalink / raw)
  To: qemu-arm, peter.maydell, pbonzini
  Cc: qemu-devel, Prasun.Kapoor, vijay.kilari, Vijaya Kumar K

From: Vijaya Kumar K <Vijaya.Kumar@cavium.com>

Thunderx pass2 chip requires explicit prefetch
instruction to give prefetch hint.

To speed up live migration on Thunderx platform,
prefetch instruction is added in zero buffer check
function.

The below results show live migration time improvement
with prefetch instruction with 1K and 4K page size.
VM with 4 VCPUs, 8GB RAM is migrated.

1K page size, no prefetch
=========================
Migration status: completed
total time: 13012 milliseconds
downtime: 10 milliseconds
setup: 15 milliseconds
transferred ram: 268131 kbytes
throughput: 168.84 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 8338072 pages
skipped: 0 pages
normal: 193335 pages
normal bytes: 193335 kbytes
dirty sync count: 4

1K page size with prefetch
=========================
Migration status: completed
total time: 7493 milliseconds
downtime: 71 milliseconds
setup: 16 milliseconds
transferred ram: 269666 kbytes
throughput: 294.88 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 8340596 pages
skipped: 0 pages
normal: 194837 pages
normal bytes: 194837 kbytes
dirty sync count: 3

4K page size with no prefetch
=============================
Migration status: completed
total time: 10456 milliseconds
downtime: 49 milliseconds
setup: 5 milliseconds
transferred ram: 231726 kbytes
throughput: 181.59 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2079914 pages
skipped: 0 pages
normal: 53257 pages
normal bytes: 213028 kbytes
dirty sync count: 3

4K page size with prefetch
==========================
Migration status: completed
total time: 3937 milliseconds
downtime: 23 milliseconds
setup: 5 milliseconds
transferred ram: 229283 kbytes
throughput: 477.19 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2079775 pages
skipped: 0 pages
normal: 52648 pages
normal bytes: 210592 kbytes
dirty sync count: 3

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
---
 include/qemu-common.h |    1 +
 util/cpuinfo.c        |   38 ++++++++++++++++++++++++++++++++++++++
 util/cutils.c         |   22 ++++++++++++++++++++++
 3 files changed, 61 insertions(+)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 62ad674..3d8a32c 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -135,4 +135,5 @@ void page_size_init(void);
 bool dump_in_progress(void);
 
 long int qemu_read_cpuid_info(void);
+bool is_thunder_pass2_cpu(void);
 #endif
diff --git a/util/cpuinfo.c b/util/cpuinfo.c
index 3ba7194..0e72a34 100644
--- a/util/cpuinfo.c
+++ b/util/cpuinfo.c
@@ -16,6 +16,26 @@
 
 #if defined(__aarch64__)
 
+#define MIDR_IMPLEMENTER_SHIFT  24
+#define MIDR_IMPLEMENTER_MASK   (0xffULL << MIDR_IMPLEMENTER_SHIFT)
+#define MIDR_ARCHITECTURE_SHIFT 16
+#define MIDR_ARCHITECTURE_MASK  (0xf << MIDR_ARCHITECTURE_SHIFT)
+#define MIDR_PARTNUM_SHIFT      4
+#define MIDR_PARTNUM_MASK       (0xfff << MIDR_PARTNUM_SHIFT)
+
+#define MIDR_CPU_PART(imp, partnum) \
+        (((imp)                 << MIDR_IMPLEMENTER_SHIFT)  | \
+        (0xf                    << MIDR_ARCHITECTURE_SHIFT) | \
+        ((partnum)              << MIDR_PARTNUM_SHIFT))
+
+#define ARM_CPU_IMP_CAVIUM        0x43
+#define CAVIUM_CPU_PART_THUNDERX  0x0A1
+
+#define MIDR_THUNDERX  \
+               MIDR_CPU_PART(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
+#define CPU_MODEL_MASK  (MIDR_IMPLEMENTER_MASK | MIDR_ARCHITECTURE_MASK | \
+                         MIDR_PARTNUM_MASK)
+
 long int qemu_read_cpuid_info(void)
 {
     FILE *fp;
@@ -49,4 +69,22 @@ out:
 
     return midr;
 }
+
+bool is_thunder_pass2_cpu(void)
+{
+    static bool cpu_info_read;
+    static long int midr_thunder_val;
+
+    if (!cpu_info_read) {
+        midr_thunder_val = qemu_read_cpuid_info();
+        midr_thunder_val &= CPU_MODEL_MASK;
+        cpu_info_read = 1;
+    }
+
+    if (midr_thunder_val == MIDR_THUNDERX) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
 #endif
diff --git a/util/cutils.c b/util/cutils.c
index 7505fda..66c816b 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -191,6 +191,8 @@ int qemu_fdatasync(int fd)
         ((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
          (vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
 #define VEC_OR(v1, v2) ((v1) | (v2))
+#define VEC_PREFETCH(base, index) \
+        asm volatile ("prfm pldl1strm, [%x[a]]\n" : : [a]"r"(&base[(index)]))
 #else
 #define VECTYPE        unsigned long
 #define SPLAT(p)       (*(p) * (~0UL / 255))
@@ -233,6 +235,9 @@ static size_t buffer_find_nonzero_offset_inner(const void *buf, size_t len)
     const VECTYPE *p = buf;
     const VECTYPE zero = (VECTYPE){0};
     size_t i;
+#if defined (__aarch64__)
+    bool do_prefetch;
+#endif
 
     assert(can_use_buffer_find_nonzero_offset_inner(buf, len));
 
@@ -246,9 +251,26 @@ static size_t buffer_find_nonzero_offset_inner(const void *buf, size_t len)
         }
     }
 
+#if defined (__aarch64__)
+    do_prefetch = is_thunder_pass2_cpu();
+    if (do_prefetch) {
+        VEC_PREFETCH(p, 8);
+        VEC_PREFETCH(p, 16);
+        VEC_PREFETCH(p, 24);
+    }
+#endif
+
     for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
          i < len / sizeof(VECTYPE);
          i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
+
+#if defined (__aarch64__)
+        if (do_prefetch) {
+            VEC_PREFETCH(p, i+32);
+            VEC_PREFETCH(p, i+40);
+        }
+#endif
+
         VECTYPE tmp0 = VEC_OR(p[i + 0], p[i + 1]);
         VECTYPE tmp1 = VEC_OR(p[i + 2], p[i + 3]);
         VECTYPE tmp2 = VEC_OR(p[i + 4], p[i + 5]);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v1 2/2] utils: Add prefetch for Thunderx platform
  2016-08-02 10:20 ` [Qemu-devel] [RFC PATCH v1 2/2] utils: Add prefetch for Thunderx platform vijay.kilari
@ 2016-08-06 10:17   ` Richard Henderson
  2016-08-12 11:32     ` Vijay Kilari
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Henderson @ 2016-08-06 10:17 UTC (permalink / raw)
  To: vijay.kilari, qemu-arm, peter.maydell, pbonzini
  Cc: Prasun.Kapoor, qemu-devel, Vijaya Kumar K

On 08/02/2016 03:50 PM, vijay.kilari@gmail.com wrote:
> +#define VEC_PREFETCH(base, index) \
> +        asm volatile ("prfm pldl1strm, [%x[a]]\n" : : [a]"r"(&base[(index)]))

Is this not __builtin_prefetch(base + index) ?

I.e. you can defined this generically for all targets.

> +#if defined (__aarch64__)
> +    do_prefetch = is_thunder_pass2_cpu();
> +    if (do_prefetch) {
> +        VEC_PREFETCH(p, 8);
> +        VEC_PREFETCH(p, 16);
> +        VEC_PREFETCH(p, 24);
> +    }
> +#endif
> +
>      for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
>           i < len / sizeof(VECTYPE);
>           i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
> +
> +#if defined (__aarch64__)
> +        if (do_prefetch) {
> +            VEC_PREFETCH(p, i+32);
> +            VEC_PREFETCH(p, i+40);
> +        }
> +#endif
> +

Surely we shouldn't be adding a conditional around a prefetch inside of the 
inner loop.  Does it really hurt to perform this prefetch for all aarch64 cpus?

I'll note that you're also prefetching too much, off the end of the block, and 
that you're probably not prefetching far enough.  You'd need to break off the 
last iteration(s) of the loop.

I'll note that you're also prefetching too close.  The loop operates on 
8*vecsize units.  In the case of aarch64, 128 byte units.  Both i+32 and i+40 
are within the current loop.  I believe you want to prefetch at

   i + BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * N

where N is the number of iterations in advance to be fetched.  Probably N is 1 
or 2, unless the memory controller is really slow.

r~

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v1 2/2] utils: Add prefetch for Thunderx platform
  2016-08-06 10:17   ` Richard Henderson
@ 2016-08-12 11:32     ` Vijay Kilari
  2016-08-12 13:20       ` Richard Henderson
  0 siblings, 1 reply; 10+ messages in thread
From: Vijay Kilari @ 2016-08-12 11:32 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-arm, Peter Maydell, Paolo Bonzini, prasun.kapoor,
	QEMU Developers, Vijaya Kumar K

On Sat, Aug 6, 2016 at 3:47 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 08/02/2016 03:50 PM, vijay.kilari@gmail.com wrote:
>>
>> +#define VEC_PREFETCH(base, index) \
>> +        asm volatile ("prfm pldl1strm, [%x[a]]\n" : :
>> [a]"r"(&base[(index)]))
>
>
> Is this not __builtin_prefetch(base + index) ?
>
> I.e. you can defined this generically for all targets.

__builtin_prefetch() is available only in gcc 5.3 for arm64.

>
>> +#if defined (__aarch64__)
>> +    do_prefetch = is_thunder_pass2_cpu();
>> +    if (do_prefetch) {
>> +        VEC_PREFETCH(p, 8);
>> +        VEC_PREFETCH(p, 16);
>> +        VEC_PREFETCH(p, 24);
>> +    }
>> +#endif
>> +
>>      for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
>>           i < len / sizeof(VECTYPE);
>>           i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
>> +
>> +#if defined (__aarch64__)
>> +        if (do_prefetch) {
>> +            VEC_PREFETCH(p, i+32);
>> +            VEC_PREFETCH(p, i+40);
>> +        }
>> +#endif
>> +
>
>
> Surely we shouldn't be adding a conditional around a prefetch inside of the
> inner loop.  Does it really hurt to perform this prefetch for all aarch64
> cpus?

prefetch is only required for thunderx pass2 platform. I will remove this
condition check.

>
> I'll note that you're also prefetching too much, off the end of the block,
> and that you're probably not prefetching far enough.  You'd need to break
> off the last iteration(s) of the loop.
>
> I'll note that you're also prefetching too close.  The loop operates on
> 8*vecsize units.  In the case of aarch64, 128 byte units.  Both i+32 and

128 unit is specific to thunder. I will move this to thunder
specific function

> i+40 are within the current loop.  I believe you want to prefetch at
>

I am dropping i+40

>   i + BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * N
>
> where N is the number of iterations in advance to be fetched.  Probably N is
> 1 or 2, unless the memory controller is really slow.
>
>
> r~

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v1 2/2] utils: Add prefetch for Thunderx platform
  2016-08-12 11:32     ` Vijay Kilari
@ 2016-08-12 13:20       ` Richard Henderson
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Henderson @ 2016-08-12 13:20 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: qemu-arm, Peter Maydell, Paolo Bonzini, prasun.kapoor,
	QEMU Developers, Vijaya Kumar K

On 08/12/2016 12:32 PM, Vijay Kilari wrote:
> On Sat, Aug 6, 2016 at 3:47 PM, Richard Henderson <rth@twiddle.net> wrote:
>> On 08/02/2016 03:50 PM, vijay.kilari@gmail.com wrote:
>>>
>>> +#define VEC_PREFETCH(base, index) \
>>> +        asm volatile ("prfm pldl1strm, [%x[a]]\n" : :
>>> [a]"r"(&base[(index)]))
>>
>>
>> Is this not __builtin_prefetch(base + index) ?
>>
>> I.e. you can defined this generically for all targets.
>
> __builtin_prefetch() is available only in gcc 5.3 for arm64.

So?  You can't really defend the position that you care about aarch64 code 
quality if you're using gcc 4.x.  Essentially all of the performance work has 
been done for later versions.

>> I'll note that you're also prefetching too much, off the end of the block,
>> and that you're probably not prefetching far enough.  You'd need to break
>> off the last iteration(s) of the loop.
>>
>> I'll note that you're also prefetching too close.  The loop operates on
>> 8*vecsize units.  In the case of aarch64, 128 byte units.  Both i+32 and
>
> 128 unit is specific to thunder. I will move this to thunder
> specific function

No, you misunderstand.

While it's true that thunderx is unique within other aarch64 implementations in 
having a 128-byte cacheline size, the "128" I mention above has nothing to do 
with that.

The loop is operating on BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR bytes, which 
is defined above as 8 * sizeof(vector), which happens to be 128.


r~

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-08-12 14:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-02 10:20 [Qemu-devel] [RFC PATCH v1 0/2] Live migration optimization for Thunderx platform vijay.kilari
2016-08-02 10:20 ` [Qemu-devel] [RFC PATCH v1 1/2] utils: Add helper to read arm MIDR_EL1 register vijay.kilari
2016-08-02 10:48   ` Paolo Bonzini
2016-08-04  9:16     ` Vijay Kilari
2016-08-04 12:16       ` Paolo Bonzini
2016-08-02 11:10   ` Peter Maydell
2016-08-02 10:20 ` [Qemu-devel] [RFC PATCH v1 2/2] utils: Add prefetch for Thunderx platform vijay.kilari
2016-08-06 10:17   ` Richard Henderson
2016-08-12 11:32     ` Vijay Kilari
2016-08-12 13:20       ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).