qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC PATCH v2 0/3] ARM64: Live migration optimization
@ 2016-04-07  9:58 vijayak
  2016-04-07  9:58 ` [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking vijayak
  2016-04-07  9:58 ` [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo vijayak
  0 siblings, 2 replies; 23+ messages in thread
From: vijayak @ 2016-04-07  9:58 UTC (permalink / raw)
  To: qemu-arm, peter.maydell, pbonzini
  Cc: Prasun.Kapoor, knv.suresh2009, Vijaya Kumar K, qemu-devel,
	vijay.kilari

From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>

To optimize Live migration time on ARM64 machine following
changes are made.
 - Neon instructions are used for Zero page checking.
 - Added prefetch for Thunderx platform

With these changes, total migration time comes down
from 10 seconds to 2.5 seconds.

These patches are tested on top of (GICv3 live migration support)
https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05284.html
However there is no direct dependency on these patches.

v1 -> v2 changes:
----------------
  - Dropped 'target-arm: Update page size for aarch64' patch.
  - Each loop in zero buffer check function is reduced to
    16 from 32.
  - Replaced vorrq_u64 with '|' in Neon macros
  - Renamed local variable to reflect 128 bit.
  - Introduced new file cpuinfo.c to parse /proc/cpuinfo
  - Added Thunderx specific patches to add prefetch in
    zero buffer check function.

Vijay (1):
  target-arm: Use Neon for zero checking

Vijaya Kumar K (2):
  utils: Add cpuinfo helper to fetch /proc/cpuinfo
  utils: Add prefetch for Thunderx platform

 include/qemu-common.h |   12 ++++++
 util/Makefile.objs    |    1 +
 util/cpuinfo.c        |  115 +++++++++++++++++++++++++++++++++++++++++++++++++
 util/cutils.c         |   87 +++++++++++++++++++++++++++++++++++++
 4 files changed, 215 insertions(+)
 create mode 100644 util/cpuinfo.c

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking
  2016-04-07  9:58 [Qemu-devel] [RFC PATCH v2 0/3] ARM64: Live migration optimization vijayak
@ 2016-04-07  9:58 ` vijayak
  2016-04-07 10:30   ` Paolo Bonzini
                     ` (2 more replies)
  2016-04-07  9:58 ` [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo vijayak
  1 sibling, 3 replies; 23+ messages in thread
From: vijayak @ 2016-04-07  9:58 UTC (permalink / raw)
  To: qemu-arm, peter.maydell, pbonzini
  Cc: vijay.kilari, Prasun.Kapoor, knv.suresh2009, qemu-devel,
	Vijaya Kumar K, Suresh, Vijay

From: Vijay <vijayak@cavium.com>

Use Neon instructions to perform zero checking of
buffer. This is helps in reducing downtime during
live migration.

Signed-off-by: Vijaya Kumar K <vijayak@caviumnetworks.com>
Signed-off-by: Suresh <ksuresh@caviumnetworks.com>
---
 util/cutils.c |   74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/util/cutils.c b/util/cutils.c
index 43d1afb..bb61c91 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -352,6 +352,80 @@ static void *can_use_buffer_find_nonzero_offset_ifunc(void)
     return func;
 }
 #pragma GCC pop_options
+
+#elif defined __aarch64__
+#include "arm_neon.h"
+
+#define NEON_VECTYPE               uint64x2_t
+#define NEON_LOAD_N_ORR(v1, v2)    (vld1q_u64(&v1) | vld1q_u64(&v2))
+#define NEON_ORR(v1, v2)           ((v1) | (v2))
+#define NEON_NOT_EQ_ZERO(v1) \
+        ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0))
+
+#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON 16
+
+/*
+ * Zero page/buffer checking using SIMD(Neon)
+ */
+
+static bool
+can_use_buffer_find_nonzero_offset_neon(const void *buf, size_t len)
+{
+    return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON
+                   * sizeof(NEON_VECTYPE)) == 0
+            && ((uintptr_t) buf) % sizeof(NEON_VECTYPE) == 0);
+}
+
+static size_t buffer_find_nonzero_offset_neon(const void *buf, size_t len)
+{
+    size_t i;
+    NEON_VECTYPE qword0, qword1, qword2, qword3, qword4, qword5, qword6;
+    uint64_t const *data = buf;
+
+    if (!len) {
+        return 0;
+    }
+
+    assert(can_use_buffer_find_nonzero_offset_neon(buf, len));
+    len /= sizeof(unsigned long);
+
+    for (i = 0; i < len; i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON) {
+        qword0 = NEON_LOAD_N_ORR(data[i], data[i + 2]);
+        qword1 = NEON_LOAD_N_ORR(data[i + 4], data[i + 6]);
+        qword2 = NEON_LOAD_N_ORR(data[i + 8], data[i + 10]);
+        qword3 = NEON_LOAD_N_ORR(data[i + 12], data[i + 14]);
+        qword4 = NEON_ORR(qword0, qword1);
+        qword5 = NEON_ORR(qword2, qword3);
+        qword6 = NEON_ORR(qword4, qword5);
+
+        if (NEON_NOT_EQ_ZERO(qword6)) {
+            break;
+        }
+    }
+
+    return i * sizeof(unsigned long);
+}
+
+static inline bool neon_support(void)
+{
+    /*
+     * Check if neon feature is supported.
+     * By default neon is supported for aarch64.
+     */
+    return true;
+}
+
+bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
+{
+    return neon_support() ? can_use_buffer_find_nonzero_offset_neon(buf, len) :
+           can_use_buffer_find_nonzero_offset_inner(buf, len);
+}
+
+size_t buffer_find_nonzero_offset(const void *buf, size_t len)
+{
+    return neon_support() ? buffer_find_nonzero_offset_neon(buf, len) :
+           buffer_find_nonzero_offset_inner(buf, len);
+}
 #else
 bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
 {
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-04-07  9:58 [Qemu-devel] [RFC PATCH v2 0/3] ARM64: Live migration optimization vijayak
  2016-04-07  9:58 ` [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking vijayak
@ 2016-04-07  9:58 ` vijayak
  2016-04-07 10:11   ` Peter Maydell
  1 sibling, 1 reply; 23+ messages in thread
From: vijayak @ 2016-04-07  9:58 UTC (permalink / raw)
  To: qemu-arm, peter.maydell, pbonzini
  Cc: vijay.kilari, Prasun.Kapoor, knv.suresh2009, Vijaya Kumar K,
	qemu-devel, Vijaya Kumar K, Suresh

From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>

utils cannot read target cpu information to
fetch cpu information to implement cpu specific
features or erratas. For this parse /proc/cpuinfo
and fetch cpu information.

For now this helper only fetches cpu information
for arm architectures.

Signed-off-by: Vijaya Kumar K <vijayak@caviumnetworks.com>
Signed-off-by: Suresh <ksuresh@caviumnetworks.com>
---
 include/qemu-common.h |   11 ++++++
 util/Makefile.objs    |    1 +
 util/cpuinfo.c        |   94 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 106 insertions(+)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 163bcbb..364aa0a 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -170,4 +170,15 @@ void page_size_init(void);
  * returned. */
 bool dump_in_progress(void);
 
+/*
+ * cpu info structure read from /proc/cpuinfo
+ */
+
+struct cpu_info {
+    uint32_t imp;
+    uint32_t arch;
+    uint32_t part;
+};
+
+void qemu_read_cpu_info(struct cpu_info *cinf);
 #endif
diff --git a/util/Makefile.objs b/util/Makefile.objs
index a8a777e..59cde64 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -32,3 +32,4 @@ util-obj-y += buffer.o
 util-obj-y += timed-average.o
 util-obj-y += base64.o
 util-obj-y += log.o
+util-obj-y += cpuinfo.o
diff --git a/util/cpuinfo.c b/util/cpuinfo.c
new file mode 100644
index 0000000..e049672
--- /dev/null
+++ b/util/cpuinfo.c
@@ -0,0 +1,94 @@
+/*
+ * Dealing with /proc/cpuinfo
+ *
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * Authors:
+ *  Vijaya Kumar K <vijayak@caviumnetworks.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1
+ * or later.  See the COPYING.LIB file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include <string.h>
+
+#if defined(__arm__) || defined(__aarch64__)
+static uint32_t read_arm_cpu_implementer(char *str)
+{
+    char *match;
+    uint32_t imp = 0;
+
+    match = strstr(str, "CPU implementer");
+    if (match != NULL) {
+        sscanf(match, "CPU implementer : 0x%x", &imp);
+    }
+
+    return imp;
+}
+
+static uint32_t read_arm_cpu_architecture(char *str)
+{
+    char *match;
+    uint32_t arch = 0;
+
+    match = strstr(str, "CPU architecture");
+    if (match != NULL) {
+        sscanf(match, "CPU architecture: %d", &arch);
+    }
+
+    return arch;
+}
+
+static uint32_t read_arm_cpu_part(char *str)
+{
+    char *match;
+    uint32_t part = 0;
+
+    match = strstr(str, "CPU part");
+    if (match != NULL) {
+        sscanf(match, "CPU part : 0x%x", &part);
+    }
+
+    return part;
+}
+#endif
+
+void qemu_read_cpu_info(struct cpu_info *cinf)
+{
+    FILE *fp;
+    char *buf;
+#define BUF_SIZE 1024
+    size_t bytes_read;
+
+    cinf->imp = cinf->arch = cinf->part = 0;
+    fp = fopen("/proc/cpuinfo", "r");
+    if (!fp) {
+        return;
+    }
+
+    buf = g_malloc0(BUF_SIZE);
+    if (!buf) {
+        fclose(fp);
+        return;
+    }
+
+    /* Read the contents of /proc/cpuinfo into the buffer. */
+    bytes_read = fread(buf, 1, BUF_SIZE, fp);
+    fclose(fp);
+
+    if (bytes_read == 0) {
+        g_free(buf);
+        return;
+    }
+
+    buf[bytes_read] = '\0';
+
+#if defined(__arm__) || defined(__aarch64__)
+    cinf->imp = read_arm_cpu_implementer(buf);
+    cinf->arch = read_arm_cpu_architecture(buf);
+    cinf->part = read_arm_cpu_part(buf);
+#endif
+    g_free(buf);
+}
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-04-07  9:58 ` [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo vijayak
@ 2016-04-07 10:11   ` Peter Maydell
  2016-04-07 10:56     ` Vijay Kilari
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2016-04-07 10:11 UTC (permalink / raw)
  To: Vijaya Kumar K
  Cc: Vijay Kilari, Prasun Kapoor, knv.suresh2009, Vijaya Kumar K,
	QEMU Developers, qemu-arm, Suresh, Paolo Bonzini

On 7 April 2016 at 10:58,  <vijayak@caviumnetworks.com> wrote:
> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
>
> utils cannot read target cpu information to
> fetch cpu information to implement cpu specific
> features or erratas. For this parse /proc/cpuinfo
> and fetch cpu information.
>
> For now this helper only fetches cpu information
> for arm architectures.

As I understand it /proc/cpuinfo is intended only for
humans to read. Please don't write code to parse it;
find a different way to get this information instead
if you really need it.

(I'm not really happy about such specific-to-a-particular-vendor
patches in QEMU anyway; we should have migration code that
works acceptably for any implementation.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking
  2016-04-07  9:58 ` [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking vijayak
@ 2016-04-07 10:30   ` Paolo Bonzini
  2016-04-07 10:44     ` Peter Maydell
  2016-04-07 10:44   ` Peter Maydell
  2016-04-09 22:45   ` Richard Henderson
  2 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2016-04-07 10:30 UTC (permalink / raw)
  To: vijayak
  Cc: peter maydell, vijay kilari, Prasun Kapoor, knv suresh2009,
	qemu-devel, qemu-arm, Suresh, Vijay


> +#elif defined __aarch64__
> +#include "arm_neon.h"
> +
> +#define NEON_VECTYPE               uint64x2_t
> +#define NEON_LOAD_N_ORR(v1, v2)    (vld1q_u64(&v1) | vld1q_u64(&v2))

Why is the load and orr necessary?  Is ((v1) | (v2)) enough?

> +#define NEON_ORR(v1, v2)           ((v1) | (v2))
> +#define NEON_NOT_EQ_ZERO(v1) \
> +        ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0))
> +
> +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON 16

Unless you have numbers saying that a 16-unroll is better than an 8-unroll
(and then you should put those in the commit message), you do not need to
duplicate code, just add aarch64 definitions for the existing code.

---

I've now read the rest of the patches, and you're adding prefetch code
that is ARM-specific.  Please provide numbers separately for each
patch, not just in the cover letter.  The cover letter is lost when the
patch is committed, while the commit messages remain.

On top of this, "With these changes, total migration time comes down
from 10 seconds to 2.5 seconds" is not a reproducible experiment.
What was the RAM size?  Was the guest just booted and idle, or was
there a workload?  What was the host?

Thanks,

Paolo

> +/*
> + * Zero page/buffer checking using SIMD(Neon)
> + */
> +
> +static bool
> +can_use_buffer_find_nonzero_offset_neon(const void *buf, size_t len)
> +{
> +    return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON
> +                   * sizeof(NEON_VECTYPE)) == 0
> +            && ((uintptr_t) buf) % sizeof(NEON_VECTYPE) == 0);
> +}
> +
> +static size_t buffer_find_nonzero_offset_neon(const void *buf, size_t len)
> +{
> +    size_t i;
> +    NEON_VECTYPE qword0, qword1, qword2, qword3, qword4, qword5, qword6;
> +    uint64_t const *data = buf;
> +
> +    if (!len) {
> +        return 0;
> +    }
> +
> +    assert(can_use_buffer_find_nonzero_offset_neon(buf, len));
> +    len /= sizeof(unsigned long);
> +
> +    for (i = 0; i < len; i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON)
> {
> +        qword0 = NEON_LOAD_N_ORR(data[i], data[i + 2]);
> +        qword1 = NEON_LOAD_N_ORR(data[i + 4], data[i + 6]);
> +        qword2 = NEON_LOAD_N_ORR(data[i + 8], data[i + 10]);
> +        qword3 = NEON_LOAD_N_ORR(data[i + 12], data[i + 14]);
> +        qword4 = NEON_ORR(qword0, qword1);
> +        qword5 = NEON_ORR(qword2, qword3);
> +        qword6 = NEON_ORR(qword4, qword5);
> +
> +        if (NEON_NOT_EQ_ZERO(qword6)) {
> +            break;
> +        }
> +    }
> +
> +    return i * sizeof(unsigned long);
> +}
> +
> +static inline bool neon_support(void)
> +{
> +    /*
> +     * Check if neon feature is supported.
> +     * By default neon is supported for aarch64.
> +     */
> +    return true;

Then everything below this function is not necessary.

Paolo

> +}
> +
> +bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
> +{
> +    return neon_support() ? can_use_buffer_find_nonzero_offset_neon(buf,
> len) :
> +           can_use_buffer_find_nonzero_offset_inner(buf, len);
> +}
> +
> +size_t buffer_find_nonzero_offset(const void *buf, size_t len)
> +{
> +    return neon_support() ? buffer_find_nonzero_offset_neon(buf, len) :
> +           buffer_find_nonzero_offset_inner(buf, len);
> +}
>  #else
>  bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
>  {
> --
> 1.7.9.5
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking
  2016-04-07 10:30   ` Paolo Bonzini
@ 2016-04-07 10:44     ` Peter Maydell
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Maydell @ 2016-04-07 10:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: vijay kilari, Vijaya Kumar K, knv suresh2009, QEMU Developers,
	Prasun Kapoor, qemu-arm, Suresh, Vijay

On 7 April 2016 at 11:30, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> +#elif defined __aarch64__
>> +#include "arm_neon.h"
>> +
>> +#define NEON_VECTYPE               uint64x2_t
>> +#define NEON_LOAD_N_ORR(v1, v2)    (vld1q_u64(&v1) | vld1q_u64(&v2))
>
> Why is the load and orr necessary?  Is ((v1) | (v2)) enough?
>
>> +#define NEON_ORR(v1, v2)           ((v1) | (v2))
>> +#define NEON_NOT_EQ_ZERO(v1) \
>> +        ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0))
>> +
>> +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON 16
>
> Unless you have numbers saying that a 16-unroll is better than an 8-unroll
> (and then you should put those in the commit message), you do not need to
> duplicate code, just add aarch64 definitions for the existing code.

This pure-neon code is also not doing the initial short-loop to
test for non-zero buffers, which means it's not an apples-to-apples
comparison. It seems unlikely that workload balances are going
to be different on ARM vs x86 such that it's worth doing the
small loop on one but not the other. (This is also why it's helpful
to explain your benchmarking method -- the short loop will slow
things down for some cases like "large and untouched RAM", but be
faster again for cases like "large RAM of which most pages have
been dirtied".)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking
  2016-04-07  9:58 ` [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking vijayak
  2016-04-07 10:30   ` Paolo Bonzini
@ 2016-04-07 10:44   ` Peter Maydell
  2016-04-09 22:45   ` Richard Henderson
  2 siblings, 0 replies; 23+ messages in thread
From: Peter Maydell @ 2016-04-07 10:44 UTC (permalink / raw)
  To: Vijaya Kumar K
  Cc: Vijay Kilari, Prasun Kapoor, knv.suresh2009, QEMU Developers,
	qemu-arm, Vijay, Suresh, Paolo Bonzini

On 7 April 2016 at 10:58,  <vijayak@caviumnetworks.com> wrote:
> From: Vijay <vijayak@cavium.com>
>
> Use Neon instructions to perform zero checking of
> buffer. This is helps in reducing downtime during
> live migration.
>
> Signed-off-by: Vijaya Kumar K <vijayak@caviumnetworks.com>
> Signed-off-by: Suresh <ksuresh@caviumnetworks.com>
> ---
>  util/cutils.c |   74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 74 insertions(+)
>
> diff --git a/util/cutils.c b/util/cutils.c
> index 43d1afb..bb61c91 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -352,6 +352,80 @@ static void *can_use_buffer_find_nonzero_offset_ifunc(void)
>      return func;
>  }
>  #pragma GCC pop_options
> +
> +#elif defined __aarch64__
> +#include "arm_neon.h"
> +
> +#define NEON_VECTYPE               uint64x2_t
> +#define NEON_LOAD_N_ORR(v1, v2)    (vld1q_u64(&v1) | vld1q_u64(&v2))
> +#define NEON_ORR(v1, v2)           ((v1) | (v2))
> +#define NEON_NOT_EQ_ZERO(v1) \
> +        ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0))
> +
> +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON 16

This says 16 lots of loads of uint64x2_t...

> +    for (i = 0; i < len; i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON) {
> +        qword0 = NEON_LOAD_N_ORR(data[i], data[i + 2]);
> +        qword1 = NEON_LOAD_N_ORR(data[i + 4], data[i + 6]);
> +        qword2 = NEON_LOAD_N_ORR(data[i + 8], data[i + 10]);
> +        qword3 = NEON_LOAD_N_ORR(data[i + 12], data[i + 14]);
> +        qword4 = NEON_ORR(qword0, qword1);
> +        qword5 = NEON_ORR(qword2, qword3);
> +        qword6 = NEON_ORR(qword4, qword5);

...but the loop is only loading 8 lots of uint64x2_t.


thanks
-- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-04-07 10:11   ` Peter Maydell
@ 2016-04-07 10:56     ` Vijay Kilari
  2016-04-07 11:45       ` Peter Maydell
  0 siblings, 1 reply; 23+ messages in thread
From: Vijay Kilari @ 2016-04-07 10:56 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Vijaya Kumar K, suresh knv, Vijaya Kumar K, QEMU Developers,
	Prasun Kapoor, qemu-arm, Suresh, Paolo Bonzini

On Thu, Apr 7, 2016 at 3:41 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 7 April 2016 at 10:58,  <vijayak@caviumnetworks.com> wrote:
>> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
>>
>> utils cannot read target cpu information to
>> fetch cpu information to implement cpu specific
>> features or erratas. For this parse /proc/cpuinfo
>> and fetch cpu information.
>>
>> For now this helper only fetches cpu information
>> for arm architectures.
>
> As I understand it /proc/cpuinfo is intended only for
> humans to read. Please don't write code to parse it;
> find a different way to get this information instead
> if you really need it.

  The utils code does not accept any dependency with target specific code.
The libqemuutil.a is compiled and linked before target specific
code is compiled.

  Also, utils functions neither have any cpu object to fetch
cpu identification information (ex: midr in case of arm) to identify the
cpu information nor utils cannot make any ioctl to read cpu information
from qemu.

Also unlike x86 there is no cpuid.h where we can get cpu identification
information for arm64.

So, I think userspace process can rely on /proc/cpuinfo for
fetching cpu information.

>
> (I'm not really happy about such specific-to-a-particular-vendor
> patches in QEMU anyway; we should have migration code that
> works acceptably for any implementation.)
>
> thanks
> -- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-04-07 10:56     ` Vijay Kilari
@ 2016-04-07 11:45       ` Peter Maydell
  2016-04-08  6:21         ` Vijay Kilari
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2016-04-07 11:45 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Vijaya Kumar K, suresh knv, Vijaya Kumar K, QEMU Developers,
	Prasun Kapoor, qemu-arm, Suresh, Paolo Bonzini

On 7 April 2016 at 11:56, Vijay Kilari <vijay.kilari@gmail.com> wrote:
> On Thu, Apr 7, 2016 at 3:41 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 7 April 2016 at 10:58,  <vijayak@caviumnetworks.com> wrote:
>>> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
>>>
>>> utils cannot read target cpu information to
>>> fetch cpu information to implement cpu specific
>>> features or erratas. For this parse /proc/cpuinfo
>>> and fetch cpu information.
>>>
>>> For now this helper only fetches cpu information
>>> for arm architectures.
>>
>> As I understand it /proc/cpuinfo is intended only for
>> humans to read. Please don't write code to parse it;
>> find a different way to get this information instead
>> if you really need it.

> Also unlike x86 there is no cpuid.h where we can get cpu identification
> information for arm64.

I'm told there are kernel patches in progress to get this sort
of information in a maintainable way to userspace, which are
currently somewhat stalled due to lack of anybody who wants to
consume it. If you have a use case then you should probably
flag it up with the kernel devs.

That said, I think we should probably hold off on this
discussion until we have clearer benchmarking info that
demonstrates that doing these prefetches really does make
a significant difference. I would much prefer to have a
single aarch64 routine that works for everybody, rather
than a thunderx-only special case.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-04-07 11:45       ` Peter Maydell
@ 2016-04-08  6:21         ` Vijay Kilari
  2016-04-08  9:43           ` Peter Maydell
  0 siblings, 1 reply; 23+ messages in thread
From: Vijay Kilari @ 2016-04-08  6:21 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers,
	Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh

Hi Peter,

On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 7 April 2016 at 11:56, Vijay Kilari <vijay.kilari@gmail.com> wrote:
>> On Thu, Apr 7, 2016 at 3:41 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> On 7 April 2016 at 10:58,  <vijayak@caviumnetworks.com> wrote:
>>>> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
>>>>
>>>> utils cannot read target cpu information to
>>>> fetch cpu information to implement cpu specific
>>>> features or erratas. For this parse /proc/cpuinfo
>>>> and fetch cpu information.
>>>>
>>>> For now this helper only fetches cpu information
>>>> for arm architectures.
>>>
>>> As I understand it /proc/cpuinfo is intended only for
>>> humans to read. Please don't write code to parse it;
>>> find a different way to get this information instead
>>> if you really need it.
>
>> Also unlike x86 there is no cpuid.h where we can get cpu identification
>> information for arm64.
>
> I'm told there are kernel patches in progress to get this sort
> of information in a maintainable way to userspace, which are
> currently somewhat stalled due to lack of anybody who wants to
> consume it. If you have a use case then you should probably
> flag it up with the kernel devs.

Can you please give references to those patches/discussion?

>
> That said, I think we should probably hold off on this
> discussion until we have clearer benchmarking info that
> demonstrates that doing these prefetches really does make
> a significant difference. I would much prefer to have a


Thunderx pass2 board does not have hardware prefetch. So
explicit sw prefetch instructions is required for this platform.
Here is the benchmarking result with and without prefetch.
of an idle VM with 4 VCPUS, 8GB RAM.

Without prefech, total migration time is 8.2 seconds
With prefetch total migration time is 2.7 seconds.

Without prefetch:
------------------------

(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: completed
total time: 8217 milliseconds
downtime: 86 milliseconds
setup: 4 milliseconds
transferred ram: 212624 kbytes
throughput: 212.08 mbps
remaining ram: 0 kbytes
total ram: 8520128 kbytes
duplicate: 2085805 pages
skipped: 0 pages
normal: 48478 pages
normal bytes: 193912 kbytes
dirty sync count: 3

With prefetch:
--------------------
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: completed
total time: 2744 milliseconds
downtime: 48 milliseconds
setup: 5 milliseconds
transferred ram: 213526 kbytes
throughput: 637.76 mbps
remaining ram: 0 kbytes
total ram: 8520128 kbytes
duplicate: 2085014 pages
skipped: 0 pages
normal: 48705 pages
normal bytes: 194820 kbytes
dirty sync count: 3

> single aarch64 routine that works for everybody, rather
> than a thunderx-only special case.

Now, I found that the generic existings function by name
buffer_find_nonzero_offset_inner()
 can be made to work with neon. So no need of special function by name
buffer_find_nonzero_offset_neon() for arm64 creating in this patch series.
However, adding prefetch code needs to be added for performance
reason.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-04-08  6:21         ` Vijay Kilari
@ 2016-04-08  9:43           ` Peter Maydell
  2016-04-11  6:52             ` Vijay Kilari
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2016-04-08  9:43 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers,
	Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh

On 8 April 2016 at 07:21, Vijay Kilari <vijay.kilari@gmail.com> wrote:
> On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> I'm told there are kernel patches in progress to get this sort
>> of information in a maintainable way to userspace, which are
>> currently somewhat stalled due to lack of anybody who wants to
>> consume it. If you have a use case then you should probably
>> flag it up with the kernel devs.
>
> Can you please give references to those patches/discussion?

I'm told the most recent thread is https://lkml.org/lkml/2015/10/5/517
(and that most of the patches in that series have gone in, except
for the last 4 or 5 which implement the ABI).

thanks
-- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking
  2016-04-07  9:58 ` [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking vijayak
  2016-04-07 10:30   ` Paolo Bonzini
  2016-04-07 10:44   ` Peter Maydell
@ 2016-04-09 22:45   ` Richard Henderson
  2016-04-11 10:40     ` Peter Maydell
  2 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2016-04-09 22:45 UTC (permalink / raw)
  To: vijayak, qemu-arm, peter.maydell, pbonzini
  Cc: vijay.kilari, Prasun.Kapoor, knv.suresh2009, qemu-devel, Suresh,
	Vijay

On 04/07/2016 02:58 AM, vijayak@caviumnetworks.com wrote:
> +#elif defined __aarch64__
> +#include "arm_neon.h"

A better test is __NEON__, which asserts that neon is available at compile time 
(which will be true basically always for aarch64), and then you don't need a 
runime test for neon.

You also get support for armv7 with neon.

> +#define NEON_VECTYPE               uint64x2_t
> +#define NEON_LOAD_N_ORR(v1, v2)    (vld1q_u64(&v1) | vld1q_u64(&v2))
> +#define NEON_ORR(v1, v2)           ((v1) | (v2))
> +#define NEON_NOT_EQ_ZERO(v1) \
> +        ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0))

FWIW, I think that vmaxvq_u32 would be a better reduction for aarch64. 
Extracting the individual lanes isn't as efficient as one would like.

For armv7, folding via vget_lane_u64(vget_high_u64(v1) | vget_low_u64(v1), 0) 
is probably best.


r~

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-04-08  9:43           ` Peter Maydell
@ 2016-04-11  6:52             ` Vijay Kilari
  2016-04-11  9:37               ` Suzuki K Poulose
  0 siblings, 1 reply; 23+ messages in thread
From: Vijay Kilari @ 2016-04-11  6:52 UTC (permalink / raw)
  To: Peter Maydell, suzuki.poulose
  Cc: Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers,
	Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh

Adding Suzuki Poulose.

Hi Suzuki,

On Fri, Apr 8, 2016 at 3:13 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 8 April 2016 at 07:21, Vijay Kilari <vijay.kilari@gmail.com> wrote:
>> On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> I'm told there are kernel patches in progress to get this sort
>>> of information in a maintainable way to userspace, which are
>>> currently somewhat stalled due to lack of anybody who wants to
>>> consume it. If you have a use case then you should probably
>>> flag it up with the kernel devs.
>>
>> Can you please give references to those patches/discussion?
>
> I'm told the most recent thread is https://lkml.org/lkml/2015/10/5/517
> (and that most of the patches in that series have gone in, except
> for the last 4 or 5 which implement the ABI).

Can you please throw some light on what is the status of ABI to
read cpu information in user space.
I wanted to know cpu implementer, part number in QEMU utils
to add prefetches to speed up live migration for Thunderx platform.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-04-11  6:52             ` Vijay Kilari
@ 2016-04-11  9:37               ` Suzuki K Poulose
  2016-04-13  9:54                 ` Vijay Kilari
  0 siblings, 1 reply; 23+ messages in thread
From: Suzuki K Poulose @ 2016-04-11  9:37 UTC (permalink / raw)
  To: Vijay Kilari, Peter Maydell
  Cc: Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers,
	Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh,
	Catalin Marinas, Will Deacon

On 11/04/16 07:52, Vijay Kilari wrote:
> Adding Suzuki Poulose.
>
> Hi Suzuki,
>
> On Fri, Apr 8, 2016 at 3:13 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 8 April 2016 at 07:21, Vijay Kilari <vijay.kilari@gmail.com> wrote:
>>> On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>> I'm told there are kernel patches in progress to get this sort
>>>> of information in a maintainable way to userspace, which are
>>>> currently somewhat stalled due to lack of anybody who wants to
>>>> consume it. If you have a use case then you should probably
>>>> flag it up with the kernel devs.
>>>
>>> Can you please give references to those patches/discussion?
>>
>> I'm told the most recent thread is https://lkml.org/lkml/2015/10/5/517
>> (and that most of the patches in that series have gone in, except
>> for the last 4 or 5 which implement the ABI).
>
> Can you please throw some light on what is the status of ABI to
> read cpu information in user space.
> I wanted to know cpu implementer, part number in QEMU utils
> to add prefetches to speed up live migration for Thunderx platform.
>

As for the patch series, except for that last 5 patches (which actually implements
the ABI), the infrastructure patches have been merged in v4.4.

We are awaiting feedback from possible consumers like toolchain (gcc, glibc).
If you think this will be suitable for you, thats good to know. There is
documentation available in the last patch in the above series. Could you please
try the series (on v4.4, which would be easier, by simply picking up the last
5 patches) and let us know if that works for you ?

Cheers
Suzuki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking
  2016-04-09 22:45   ` Richard Henderson
@ 2016-04-11 10:40     ` Peter Maydell
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Maydell @ 2016-04-11 10:40 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Vijaya Kumar K, qemu-arm, Paolo Bonzini, Vijay Kilari,
	Prasun Kapoor, suresh knv, QEMU Developers, Suresh, Vijay

On 9 April 2016 at 23:45, Richard Henderson <rth@twiddle.net> wrote:
> On 04/07/2016 02:58 AM, vijayak@caviumnetworks.com wrote:
>>
>> +#elif defined __aarch64__
>> +#include "arm_neon.h"
>
>
> A better test is __NEON__, which asserts that neon is available at compile
> time (which will be true basically always for aarch64), and then you don't
> need a runime test for neon.

You don't need a runtime test for neon on aarch64 anyway, because
it will always be present.

> You also get support for armv7 with neon.

But if you do care about armv7 then you do need a runtime test,
because the defacto standard compile options are for armhf which
has FP but doesn't assume Neon.

Personally I think we should not worry about armv7 here, because
it's not actually a likely virtualization server platform, and
we shouldn't include code in QEMU we're not even compile testing.
So I think __aarch64__ here is fine.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-04-11  9:37               ` Suzuki K Poulose
@ 2016-04-13  9:54                 ` Vijay Kilari
  2016-04-13  9:59                   ` Suzuki K Poulose
  0 siblings, 1 reply; 23+ messages in thread
From: Vijay Kilari @ 2016-04-13  9:54 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Peter Maydell, Vijaya Kumar K, qemu-arm, Paolo Bonzini,
	QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K,
	Suresh, Catalin Marinas, Will Deacon

On Mon, Apr 11, 2016 at 3:07 PM, Suzuki K Poulose
<Suzuki.Poulose@arm.com> wrote:
> On 11/04/16 07:52, Vijay Kilari wrote:
>>
>> Adding Suzuki Poulose.
>>
>> Hi Suzuki,
>>
>> On Fri, Apr 8, 2016 at 3:13 PM, Peter Maydell <peter.maydell@linaro.org>
>> wrote:
>>>
>>> On 8 April 2016 at 07:21, Vijay Kilari <vijay.kilari@gmail.com> wrote:
>>>>
>>>> On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell <peter.maydell@linaro.org>
>>>> wrote:
>>>>>
>>>>> I'm told there are kernel patches in progress to get this sort
>>>>> of information in a maintainable way to userspace, which are
>>>>> currently somewhat stalled due to lack of anybody who wants to
>>>>> consume it. If you have a use case then you should probably
>>>>> flag it up with the kernel devs

Hi Peter,

   Looks like getting Suzuki's patches merged might take some time.
I propose to use /proc/cpuinfo for now and later I can move to using
Suzuki's way.


>>>>
>>>>
>>>> Can you please give references to those patches/discussion?
>>>
>>>
>>> I'm told the most recent thread is https://lkml.org/lkml/2015/10/5/517
>>> (and that most of the patches in that series have gone in, except
>>> for the last 4 or 5 which implement the ABI).
>>
>>
>> Can you please throw some light on what is the status of ABI to
>> read cpu information in user space.
>> I wanted to know cpu implementer, part number in QEMU utils
>> to add prefetches to speed up live migration for Thunderx platform.
>>
>
> As for the patch series, except for that last 5 patches (which actually
> implements
> the ABI), the infrastructure patches have been merged in v4.4.
>
> We are awaiting feedback from possible consumers like toolchain (gcc,
> glibc).
> If you think this will be suitable for you, thats good to know. There is
> documentation available in the last patch in the above series. Could you
> please
> try the series (on v4.4, which would be easier, by simply picking up the
> last
> 5 patches) and let us know if that works for you ?

Hi Suzuki,

 The last 5 patches are not compiling on v4.4. Looks like your patch
series is not merged completely. Can you please
rebase your patches and let me know.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-04-13  9:54                 ` Vijay Kilari
@ 2016-04-13  9:59                   ` Suzuki K Poulose
  2016-05-09  3:30                     ` Vijay Kilari
  0 siblings, 1 reply; 23+ messages in thread
From: Suzuki K Poulose @ 2016-04-13  9:59 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Peter Maydell, Vijaya Kumar K, qemu-arm, Paolo Bonzini,
	QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K,
	Suresh, Catalin Marinas, Will Deacon

On 13/04/16 10:54, Vijay Kilari wrote:
> On Mon, Apr 11, 2016 at 3:07 PM, Suzuki K Poulose
> <Suzuki.Poulose@arm.com> wrote:
>> On 11/04/16 07:52, Vijay Kilari wrote:

>
> Hi Suzuki,
>
>   The last 5 patches are not compiling on v4.4. Looks like your patch
> series is not merged completely. Can you please
> rebase your patches and let me know.
>

Could you please give the tree below a try ?

git://linux-arm.org/linux-skp.git cpu-ftr/v3-4.3-rc4

Cheers
Suzuki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-04-13  9:59                   ` Suzuki K Poulose
@ 2016-05-09  3:30                     ` Vijay Kilari
  2016-05-09 10:59                       ` Suzuki K Poulose
  0 siblings, 1 reply; 23+ messages in thread
From: Vijay Kilari @ 2016-05-09  3:30 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Peter Maydell, Vijaya Kumar K, qemu-arm, Paolo Bonzini,
	QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K,
	Suresh, Catalin Marinas, Will Deacon

Hi Suzuki/Peter,

On Wed, Apr 13, 2016 at 5:59 PM, Suzuki K Poulose
<Suzuki.Poulose@arm.com> wrote:
> On 13/04/16 10:54, Vijay Kilari wrote:
>>
>> On Mon, Apr 11, 2016 at 3:07 PM, Suzuki K Poulose
>> <Suzuki.Poulose@arm.com> wrote:
>>>
>>> On 11/04/16 07:52, Vijay Kilari wrote:
>
>
>>
>> Hi Suzuki,
>>
>>   The last 5 patches are not compiling on v4.4. Looks like your patch
>> series is not merged completely. Can you please
>> rebase your patches and let me know.
>>
>
> Could you please give the tree below a try ?
>
> git://linux-arm.org/linux-skp.git cpu-ftr/v3-4.3-rc4

This works.
Now the question is, Are your patches getting merged anytime soon?.
If not, I prefer to go with /proc/cpuinfo.

Another solution is look for /sys/devices/system/cpu/cpu$ID/identification/midr
if not available then fall back on /proc/cpuinfo.

Regards
Vijay

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-05-09  3:30                     ` Vijay Kilari
@ 2016-05-09 10:59                       ` Suzuki K Poulose
  2016-05-09 11:21                         ` Peter Maydell
  0 siblings, 1 reply; 23+ messages in thread
From: Suzuki K Poulose @ 2016-05-09 10:59 UTC (permalink / raw)
  To: Vijay Kilari, Catalin Marinas, Will Deacon
  Cc: Peter Maydell, Vijaya Kumar K, qemu-arm, Paolo Bonzini,
	QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K,
	Suresh

On 09/05/16 04:30, Vijay Kilari wrote:
>>> Hi Suzuki,
>>>
>>>    The last 5 patches are not compiling on v4.4. Looks like your patch
>>> series is not merged completely. Can you please
>>> rebase your patches and let me know.
>>>
>>
>> Could you please give the tree below a try ?
>>
>> git://linux-arm.org/linux-skp.git cpu-ftr/v3-4.3-rc4
>
> This works.
> Now the question is, Are your patches getting merged anytime soon?.

Well, we have been waiting for a use case, like this, before we merge
the series.

Will, Catalin,

Now that we have some real users of the infrastructure, what do you think ?
I can post an updated/rebased series, if you would like.


Suzuki



> If not, I prefer to go with /proc/cpuinfo.
>
> Another solution is look for /sys/devices/system/cpu/cpu$ID/identification/midr
> if not available then fall back on /proc/cpuinfo.
>
> Regards
> Vijay
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-05-09 10:59                       ` Suzuki K Poulose
@ 2016-05-09 11:21                         ` Peter Maydell
  2016-05-09 13:44                           ` Catalin Marinas
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2016-05-09 11:21 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Vijay Kilari, Catalin Marinas, Will Deacon, Vijaya Kumar K,
	qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor,
	suresh knv, Vijaya Kumar K, Suresh

On 9 May 2016 at 11:59, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote:
> Well, we have been waiting for a use case, like this, before we merge
> the series.

This isn't a great strategy for moving people away from things
you'd like them to avoid like parsing /proc/cpuinfo, because typically
userspace app writers are not very interested in coding to facilities
which don't exist yet, and will prefer to make do with what's actually
present in the kernel today... You need to provide the improved API,
and then it needs to get out into kernel versions in distros and
otherwise, and only then are you likely to get app developers who
will start to say "this is useful".

thanks
-- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-05-09 11:21                         ` Peter Maydell
@ 2016-05-09 13:44                           ` Catalin Marinas
  2016-05-10 10:24                             ` Will Deacon
  0 siblings, 1 reply; 23+ messages in thread
From: Catalin Marinas @ 2016-05-09 13:44 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Suzuki K Poulose, Vijay Kilari, Will Deacon, Vijaya Kumar K,
	qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor,
	suresh knv, Vijaya Kumar K, Suresh

On Mon, May 09, 2016 at 12:21:08PM +0100, Peter Maydell wrote:
> On 9 May 2016 at 11:59, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote:
> > Well, we have been waiting for a use case, like this, before we merge
> > the series.
> 
> This isn't a great strategy for moving people away from things
> you'd like them to avoid like parsing /proc/cpuinfo, because typically
> userspace app writers are not very interested in coding to facilities
> which don't exist yet, and will prefer to make do with what's actually
> present in the kernel today... You need to provide the improved API,
> and then it needs to get out into kernel versions in distros and
> otherwise, and only then are you likely to get app developers who
> will start to say "this is useful".

The problem is that the way kernel people think the API may be improved
does not always match the use-cases required by app writers. One example
here is exposing MIDR via MRS emulation, we know there are problems with
big.LITTLE and the only clear answer I got so far is that we ignore such
configurations. We don't even have a way to tell user space that this is
a heterogeneous CPU configuration, unless we add another HWCAP bit
specifically for this (or the opposite: HWCAP_HOMOGENEOUS_CPUS).

That said, I'm perfectly fine with exposing:

	/sys/devices/system/cpu/cpu$ID/identification/
							\- midr
							\- revidr

I had the wrong impression that we already merged this part but Suzuki
just pointed out to me that it's not.

I think our 4.7-rc1 tree is pretty much frozen to new features now,
though the sysfs patch is relatively small (I'll let Will comment):

https://patches.linaro.org/patch/54502/

The MRS emulation, we should restart the discussion around big.LITTLE
implications and make a decision one way or another by the 4.8 merging
window.

-- 
Catalin

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-05-09 13:44                           ` Catalin Marinas
@ 2016-05-10 10:24                             ` Will Deacon
  2016-05-10 13:06                               ` Catalin Marinas
  0 siblings, 1 reply; 23+ messages in thread
From: Will Deacon @ 2016-05-10 10:24 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Peter Maydell, Suzuki K Poulose, Vijay Kilari, Vijaya Kumar K,
	qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor,
	suresh knv, Vijaya Kumar K, Suresh

On Mon, May 09, 2016 at 01:44:11PM +0000, Catalin Marinas wrote:
> On Mon, May 09, 2016 at 12:21:08PM +0100, Peter Maydell wrote:
> > On 9 May 2016 at 11:59, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote:
> > > Well, we have been waiting for a use case, like this, before we merge
> > > the series.
> > 
> > This isn't a great strategy for moving people away from things
> > you'd like them to avoid like parsing /proc/cpuinfo, because typically
> > userspace app writers are not very interested in coding to facilities
> > which don't exist yet, and will prefer to make do with what's actually
> > present in the kernel today... You need to provide the improved API,
> > and then it needs to get out into kernel versions in distros and
> > otherwise, and only then are you likely to get app developers who
> > will start to say "this is useful".
> 
> The problem is that the way kernel people think the API may be improved
> does not always match the use-cases required by app writers. One example
> here is exposing MIDR via MRS emulation, we know there are problems with
> big.LITTLE and the only clear answer I got so far is that we ignore such
> configurations. We don't even have a way to tell user space that this is
> a heterogeneous CPU configuration, unless we add another HWCAP bit
> specifically for this (or the opposite: HWCAP_HOMOGENEOUS_CPUS).

Personally, I think we should expose big.LITTLE as-is to userspace. That
is, if you execute an mrs instruction you'll get whichever core the
emulation happens to run on. This might even be useful to things like
pinned threadpools w/ userspace schedulers sitting on top.

> That said, I'm perfectly fine with exposing:
> 
> 	/sys/devices/system/cpu/cpu$ID/identification/
> 							\- midr
> 							\- revidr
> 
> I had the wrong impression that we already merged this part but Suzuki
> just pointed out to me that it's not.

Yes, there are use-cases for this interface as well. I don't think it's
a choice between one or the other and I firmly believe we need both (the
sysfs and mrs code).

> I think our 4.7-rc1 tree is pretty much frozen to new features now,
> though the sysfs patch is relatively small (I'll let Will comment):

The merge window opens in less than a week, so it's fixes only atm.

Will

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
  2016-05-10 10:24                             ` Will Deacon
@ 2016-05-10 13:06                               ` Catalin Marinas
  0 siblings, 0 replies; 23+ messages in thread
From: Catalin Marinas @ 2016-05-10 13:06 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Maydell, Suzuki K Poulose, Vijay Kilari, Vijaya Kumar K,
	qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor,
	suresh knv, Vijaya Kumar K, Suresh, Ramana Radhakrishnan

Cc'ing Ramana to get some input from the toolchain side.

On Tue, May 10, 2016 at 11:24:04AM +0100, Will Deacon wrote:
> On Mon, May 09, 2016 at 01:44:11PM +0000, Catalin Marinas wrote:
> > On Mon, May 09, 2016 at 12:21:08PM +0100, Peter Maydell wrote:
> > > On 9 May 2016 at 11:59, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote:
> > > > Well, we have been waiting for a use case, like this, before we merge
> > > > the series.
> > > 
> > > This isn't a great strategy for moving people away from things
> > > you'd like them to avoid like parsing /proc/cpuinfo, because typically
> > > userspace app writers are not very interested in coding to facilities
> > > which don't exist yet, and will prefer to make do with what's actually
> > > present in the kernel today... You need to provide the improved API,
> > > and then it needs to get out into kernel versions in distros and
> > > otherwise, and only then are you likely to get app developers who
> > > will start to say "this is useful".
> > 
> > The problem is that the way kernel people think the API may be improved
> > does not always match the use-cases required by app writers. One example
> > here is exposing MIDR via MRS emulation, we know there are problems with
> > big.LITTLE and the only clear answer I got so far is that we ignore such
> > configurations. We don't even have a way to tell user space that this is
> > a heterogeneous CPU configuration, unless we add another HWCAP bit
> > specifically for this (or the opposite: HWCAP_HOMOGENEOUS_CPUS).
> 
> Personally, I think we should expose big.LITTLE as-is to userspace. That
> is, if you execute an mrs instruction you'll get whichever core the
> emulation happens to run on. This might even be useful to things like
> pinned threadpools w/ userspace schedulers sitting on top.

That's the point I try to make. We "think" there may be use-cases but
there are no concrete examples yet. IIRC, the only request for mrs
handling came from the tools guys for the ifunc support. However, they
don't seem to have a solution for big.LITTLE either and they may simply
ignore this feature. OTOH, we have to maintain it in the kernel on the
long run because it became ABI (that said, I would be fine if this was
complemented by another HWCAP bit for heterogeneous systems).

The CPU feature registers wouldn't be affected by the big.LITTLE
configurations as we provide a sanitised version anyway. But, again, do
we have concrete use-cases?

> > That said, I'm perfectly fine with exposing:
> > 
> > 	/sys/devices/system/cpu/cpu$ID/identification/
> > 							\- midr
> > 							\- revidr
> > 
> > I had the wrong impression that we already merged this part but Suzuki
> > just pointed out to me that it's not.
> 
> Yes, there are use-cases for this interface as well. I don't think it's
> a choice between one or the other and I firmly believe we need both (the
> sysfs and mrs code).

At least for this one we have a clear use-case: JVM and errata
workarounds.

> > I think our 4.7-rc1 tree is pretty much frozen to new features now,
> > though the sysfs patch is relatively small (I'll let Will comment):
> 
> The merge window opens in less than a week, so it's fixes only atm.

We have more time to debate ;)

-- 
Catalin

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-05-10 13:06 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-07  9:58 [Qemu-devel] [RFC PATCH v2 0/3] ARM64: Live migration optimization vijayak
2016-04-07  9:58 ` [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking vijayak
2016-04-07 10:30   ` Paolo Bonzini
2016-04-07 10:44     ` Peter Maydell
2016-04-07 10:44   ` Peter Maydell
2016-04-09 22:45   ` Richard Henderson
2016-04-11 10:40     ` Peter Maydell
2016-04-07  9:58 ` [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo vijayak
2016-04-07 10:11   ` Peter Maydell
2016-04-07 10:56     ` Vijay Kilari
2016-04-07 11:45       ` Peter Maydell
2016-04-08  6:21         ` Vijay Kilari
2016-04-08  9:43           ` Peter Maydell
2016-04-11  6:52             ` Vijay Kilari
2016-04-11  9:37               ` Suzuki K Poulose
2016-04-13  9:54                 ` Vijay Kilari
2016-04-13  9:59                   ` Suzuki K Poulose
2016-05-09  3:30                     ` Vijay Kilari
2016-05-09 10:59                       ` Suzuki K Poulose
2016-05-09 11:21                         ` Peter Maydell
2016-05-09 13:44                           ` Catalin Marinas
2016-05-10 10:24                             ` Will Deacon
2016-05-10 13:06                               ` Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).