* [Qemu-devel] [RFC PATCH v2 0/3] ARM64: Live migration optimization @ 2016-04-07 9:58 vijayak 2016-04-07 9:58 ` [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking vijayak 2016-04-07 9:58 ` [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo vijayak 0 siblings, 2 replies; 23+ messages in thread From: vijayak @ 2016-04-07 9:58 UTC (permalink / raw) To: qemu-arm, peter.maydell, pbonzini Cc: Prasun.Kapoor, knv.suresh2009, Vijaya Kumar K, qemu-devel, vijay.kilari From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com> To optimize Live migration time on ARM64 machine following changes are made. - Neon instructions are used for Zero page checking. - Added prefetch for Thunderx platform With these changes, total migration time comes down from 10 seconds to 2.5 seconds. These patches are tested on top of (GICv3 live migration support) https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05284.html However there is no direct dependency on these patches. v1 -> v2 changes: ---------------- - Dropped 'target-arm: Update page size for aarch64' patch. - Each loop in zero buffer check function is reduced to 16 from 32. - Replaced vorrq_u64 with '|' in Neon macros - Renamed local variable to reflect 128 bit. - Introduced new file cpuinfo.c to parse /proc/cpuinfo - Added Thunderx specific patches to add prefetch in zero buffer check function. Vijay (1): target-arm: Use Neon for zero checking Vijaya Kumar K (2): utils: Add cpuinfo helper to fetch /proc/cpuinfo utils: Add prefetch for Thunderx platform include/qemu-common.h | 12 ++++++ util/Makefile.objs | 1 + util/cpuinfo.c | 115 +++++++++++++++++++++++++++++++++++++++++++++++++ util/cutils.c | 87 +++++++++++++++++++++++++++++++++++++ 4 files changed, 215 insertions(+) create mode 100644 util/cpuinfo.c -- 1.7.9.5 ^ permalink raw reply [flat|nested] 23+ messages in thread
* [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking 2016-04-07 9:58 [Qemu-devel] [RFC PATCH v2 0/3] ARM64: Live migration optimization vijayak @ 2016-04-07 9:58 ` vijayak 2016-04-07 10:30 ` Paolo Bonzini ` (2 more replies) 2016-04-07 9:58 ` [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo vijayak 1 sibling, 3 replies; 23+ messages in thread From: vijayak @ 2016-04-07 9:58 UTC (permalink / raw) To: qemu-arm, peter.maydell, pbonzini Cc: vijay.kilari, Prasun.Kapoor, knv.suresh2009, qemu-devel, Vijaya Kumar K, Suresh, Vijay From: Vijay <vijayak@cavium.com> Use Neon instructions to perform zero checking of buffer. This is helps in reducing downtime during live migration. Signed-off-by: Vijaya Kumar K <vijayak@caviumnetworks.com> Signed-off-by: Suresh <ksuresh@caviumnetworks.com> --- util/cutils.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 74 insertions(+) diff --git a/util/cutils.c b/util/cutils.c index 43d1afb..bb61c91 100644 --- a/util/cutils.c +++ b/util/cutils.c @@ -352,6 +352,80 @@ static void *can_use_buffer_find_nonzero_offset_ifunc(void) return func; } #pragma GCC pop_options + +#elif defined __aarch64__ +#include "arm_neon.h" + +#define NEON_VECTYPE uint64x2_t +#define NEON_LOAD_N_ORR(v1, v2) (vld1q_u64(&v1) | vld1q_u64(&v2)) +#define NEON_ORR(v1, v2) ((v1) | (v2)) +#define NEON_NOT_EQ_ZERO(v1) \ + ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0)) + +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON 16 + +/* + * Zero page/buffer checking using SIMD(Neon) + */ + +static bool +can_use_buffer_find_nonzero_offset_neon(const void *buf, size_t len) +{ + return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON + * sizeof(NEON_VECTYPE)) == 0 + && ((uintptr_t) buf) % sizeof(NEON_VECTYPE) == 0); +} + +static size_t buffer_find_nonzero_offset_neon(const void *buf, size_t len) +{ + size_t i; + NEON_VECTYPE qword0, qword1, qword2, qword3, qword4, qword5, qword6; + uint64_t const *data = buf; + + if (!len) { + return 0; + } + + assert(can_use_buffer_find_nonzero_offset_neon(buf, len)); + len /= sizeof(unsigned long); + + for (i = 0; i < len; i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON) { + qword0 = NEON_LOAD_N_ORR(data[i], data[i + 2]); + qword1 = NEON_LOAD_N_ORR(data[i + 4], data[i + 6]); + qword2 = NEON_LOAD_N_ORR(data[i + 8], data[i + 10]); + qword3 = NEON_LOAD_N_ORR(data[i + 12], data[i + 14]); + qword4 = NEON_ORR(qword0, qword1); + qword5 = NEON_ORR(qword2, qword3); + qword6 = NEON_ORR(qword4, qword5); + + if (NEON_NOT_EQ_ZERO(qword6)) { + break; + } + } + + return i * sizeof(unsigned long); +} + +static inline bool neon_support(void) +{ + /* + * Check if neon feature is supported. + * By default neon is supported for aarch64. + */ + return true; +} + +bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len) +{ + return neon_support() ? can_use_buffer_find_nonzero_offset_neon(buf, len) : + can_use_buffer_find_nonzero_offset_inner(buf, len); +} + +size_t buffer_find_nonzero_offset(const void *buf, size_t len) +{ + return neon_support() ? buffer_find_nonzero_offset_neon(buf, len) : + buffer_find_nonzero_offset_inner(buf, len); +} #else bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len) { -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking 2016-04-07 9:58 ` [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking vijayak @ 2016-04-07 10:30 ` Paolo Bonzini 2016-04-07 10:44 ` Peter Maydell 2016-04-07 10:44 ` Peter Maydell 2016-04-09 22:45 ` Richard Henderson 2 siblings, 1 reply; 23+ messages in thread From: Paolo Bonzini @ 2016-04-07 10:30 UTC (permalink / raw) To: vijayak Cc: peter maydell, vijay kilari, Prasun Kapoor, knv suresh2009, qemu-devel, qemu-arm, Suresh, Vijay > +#elif defined __aarch64__ > +#include "arm_neon.h" > + > +#define NEON_VECTYPE uint64x2_t > +#define NEON_LOAD_N_ORR(v1, v2) (vld1q_u64(&v1) | vld1q_u64(&v2)) Why is the load and orr necessary? Is ((v1) | (v2)) enough? > +#define NEON_ORR(v1, v2) ((v1) | (v2)) > +#define NEON_NOT_EQ_ZERO(v1) \ > + ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0)) > + > +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON 16 Unless you have numbers saying that a 16-unroll is better than an 8-unroll (and then you should put those in the commit message), you do not need to duplicate code, just add aarch64 definitions for the existing code. --- I've now read the rest of the patches, and you're adding prefetch code that is ARM-specific. Please provide numbers separately for each patch, not just in the cover letter. The cover letter is lost when the patch is committed, while the commit messages remain. On top of this, "With these changes, total migration time comes down from 10 seconds to 2.5 seconds" is not a reproducible experiment. What was the RAM size? Was the guest just booted and idle, or was there a workload? What was the host? Thanks, Paolo > +/* > + * Zero page/buffer checking using SIMD(Neon) > + */ > + > +static bool > +can_use_buffer_find_nonzero_offset_neon(const void *buf, size_t len) > +{ > + return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON > + * sizeof(NEON_VECTYPE)) == 0 > + && ((uintptr_t) buf) % sizeof(NEON_VECTYPE) == 0); > +} > + > +static size_t buffer_find_nonzero_offset_neon(const void *buf, size_t len) > +{ > + size_t i; > + NEON_VECTYPE qword0, qword1, qword2, qword3, qword4, qword5, qword6; > + uint64_t const *data = buf; > + > + if (!len) { > + return 0; > + } > + > + assert(can_use_buffer_find_nonzero_offset_neon(buf, len)); > + len /= sizeof(unsigned long); > + > + for (i = 0; i < len; i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON) > { > + qword0 = NEON_LOAD_N_ORR(data[i], data[i + 2]); > + qword1 = NEON_LOAD_N_ORR(data[i + 4], data[i + 6]); > + qword2 = NEON_LOAD_N_ORR(data[i + 8], data[i + 10]); > + qword3 = NEON_LOAD_N_ORR(data[i + 12], data[i + 14]); > + qword4 = NEON_ORR(qword0, qword1); > + qword5 = NEON_ORR(qword2, qword3); > + qword6 = NEON_ORR(qword4, qword5); > + > + if (NEON_NOT_EQ_ZERO(qword6)) { > + break; > + } > + } > + > + return i * sizeof(unsigned long); > +} > + > +static inline bool neon_support(void) > +{ > + /* > + * Check if neon feature is supported. > + * By default neon is supported for aarch64. > + */ > + return true; Then everything below this function is not necessary. Paolo > +} > + > +bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len) > +{ > + return neon_support() ? can_use_buffer_find_nonzero_offset_neon(buf, > len) : > + can_use_buffer_find_nonzero_offset_inner(buf, len); > +} > + > +size_t buffer_find_nonzero_offset(const void *buf, size_t len) > +{ > + return neon_support() ? buffer_find_nonzero_offset_neon(buf, len) : > + buffer_find_nonzero_offset_inner(buf, len); > +} > #else > bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len) > { > -- > 1.7.9.5 > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking 2016-04-07 10:30 ` Paolo Bonzini @ 2016-04-07 10:44 ` Peter Maydell 0 siblings, 0 replies; 23+ messages in thread From: Peter Maydell @ 2016-04-07 10:44 UTC (permalink / raw) To: Paolo Bonzini Cc: vijay kilari, Vijaya Kumar K, knv suresh2009, QEMU Developers, Prasun Kapoor, qemu-arm, Suresh, Vijay On 7 April 2016 at 11:30, Paolo Bonzini <pbonzini@redhat.com> wrote: > >> +#elif defined __aarch64__ >> +#include "arm_neon.h" >> + >> +#define NEON_VECTYPE uint64x2_t >> +#define NEON_LOAD_N_ORR(v1, v2) (vld1q_u64(&v1) | vld1q_u64(&v2)) > > Why is the load and orr necessary? Is ((v1) | (v2)) enough? > >> +#define NEON_ORR(v1, v2) ((v1) | (v2)) >> +#define NEON_NOT_EQ_ZERO(v1) \ >> + ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0)) >> + >> +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON 16 > > Unless you have numbers saying that a 16-unroll is better than an 8-unroll > (and then you should put those in the commit message), you do not need to > duplicate code, just add aarch64 definitions for the existing code. This pure-neon code is also not doing the initial short-loop to test for non-zero buffers, which means it's not an apples-to-apples comparison. It seems unlikely that workload balances are going to be different on ARM vs x86 such that it's worth doing the small loop on one but not the other. (This is also why it's helpful to explain your benchmarking method -- the short loop will slow things down for some cases like "large and untouched RAM", but be faster again for cases like "large RAM of which most pages have been dirtied".) thanks -- PMM ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking 2016-04-07 9:58 ` [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking vijayak 2016-04-07 10:30 ` Paolo Bonzini @ 2016-04-07 10:44 ` Peter Maydell 2016-04-09 22:45 ` Richard Henderson 2 siblings, 0 replies; 23+ messages in thread From: Peter Maydell @ 2016-04-07 10:44 UTC (permalink / raw) To: Vijaya Kumar K Cc: Vijay Kilari, Prasun Kapoor, knv.suresh2009, QEMU Developers, qemu-arm, Vijay, Suresh, Paolo Bonzini On 7 April 2016 at 10:58, <vijayak@caviumnetworks.com> wrote: > From: Vijay <vijayak@cavium.com> > > Use Neon instructions to perform zero checking of > buffer. This is helps in reducing downtime during > live migration. > > Signed-off-by: Vijaya Kumar K <vijayak@caviumnetworks.com> > Signed-off-by: Suresh <ksuresh@caviumnetworks.com> > --- > util/cutils.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 74 insertions(+) > > diff --git a/util/cutils.c b/util/cutils.c > index 43d1afb..bb61c91 100644 > --- a/util/cutils.c > +++ b/util/cutils.c > @@ -352,6 +352,80 @@ static void *can_use_buffer_find_nonzero_offset_ifunc(void) > return func; > } > #pragma GCC pop_options > + > +#elif defined __aarch64__ > +#include "arm_neon.h" > + > +#define NEON_VECTYPE uint64x2_t > +#define NEON_LOAD_N_ORR(v1, v2) (vld1q_u64(&v1) | vld1q_u64(&v2)) > +#define NEON_ORR(v1, v2) ((v1) | (v2)) > +#define NEON_NOT_EQ_ZERO(v1) \ > + ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0)) > + > +#define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON 16 This says 16 lots of loads of uint64x2_t... > + for (i = 0; i < len; i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR_NEON) { > + qword0 = NEON_LOAD_N_ORR(data[i], data[i + 2]); > + qword1 = NEON_LOAD_N_ORR(data[i + 4], data[i + 6]); > + qword2 = NEON_LOAD_N_ORR(data[i + 8], data[i + 10]); > + qword3 = NEON_LOAD_N_ORR(data[i + 12], data[i + 14]); > + qword4 = NEON_ORR(qword0, qword1); > + qword5 = NEON_ORR(qword2, qword3); > + qword6 = NEON_ORR(qword4, qword5); ...but the loop is only loading 8 lots of uint64x2_t. thanks -- PMM ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking 2016-04-07 9:58 ` [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking vijayak 2016-04-07 10:30 ` Paolo Bonzini 2016-04-07 10:44 ` Peter Maydell @ 2016-04-09 22:45 ` Richard Henderson 2016-04-11 10:40 ` Peter Maydell 2 siblings, 1 reply; 23+ messages in thread From: Richard Henderson @ 2016-04-09 22:45 UTC (permalink / raw) To: vijayak, qemu-arm, peter.maydell, pbonzini Cc: vijay.kilari, Prasun.Kapoor, knv.suresh2009, qemu-devel, Suresh, Vijay On 04/07/2016 02:58 AM, vijayak@caviumnetworks.com wrote: > +#elif defined __aarch64__ > +#include "arm_neon.h" A better test is __NEON__, which asserts that neon is available at compile time (which will be true basically always for aarch64), and then you don't need a runime test for neon. You also get support for armv7 with neon. > +#define NEON_VECTYPE uint64x2_t > +#define NEON_LOAD_N_ORR(v1, v2) (vld1q_u64(&v1) | vld1q_u64(&v2)) > +#define NEON_ORR(v1, v2) ((v1) | (v2)) > +#define NEON_NOT_EQ_ZERO(v1) \ > + ((vgetq_lane_u64(v1, 0) != 0) || (vgetq_lane_u64(v1, 1) != 0)) FWIW, I think that vmaxvq_u32 would be a better reduction for aarch64. Extracting the individual lanes isn't as efficient as one would like. For armv7, folding via vget_lane_u64(vget_high_u64(v1) | vget_low_u64(v1), 0) is probably best. r~ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking 2016-04-09 22:45 ` Richard Henderson @ 2016-04-11 10:40 ` Peter Maydell 0 siblings, 0 replies; 23+ messages in thread From: Peter Maydell @ 2016-04-11 10:40 UTC (permalink / raw) To: Richard Henderson Cc: Vijaya Kumar K, qemu-arm, Paolo Bonzini, Vijay Kilari, Prasun Kapoor, suresh knv, QEMU Developers, Suresh, Vijay On 9 April 2016 at 23:45, Richard Henderson <rth@twiddle.net> wrote: > On 04/07/2016 02:58 AM, vijayak@caviumnetworks.com wrote: >> >> +#elif defined __aarch64__ >> +#include "arm_neon.h" > > > A better test is __NEON__, which asserts that neon is available at compile > time (which will be true basically always for aarch64), and then you don't > need a runime test for neon. You don't need a runtime test for neon on aarch64 anyway, because it will always be present. > You also get support for armv7 with neon. But if you do care about armv7 then you do need a runtime test, because the defacto standard compile options are for armhf which has FP but doesn't assume Neon. Personally I think we should not worry about armv7 here, because it's not actually a likely virtualization server platform, and we shouldn't include code in QEMU we're not even compile testing. So I think __aarch64__ here is fine. thanks -- PMM ^ permalink raw reply [flat|nested] 23+ messages in thread
* [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-04-07 9:58 [Qemu-devel] [RFC PATCH v2 0/3] ARM64: Live migration optimization vijayak 2016-04-07 9:58 ` [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking vijayak @ 2016-04-07 9:58 ` vijayak 2016-04-07 10:11 ` Peter Maydell 1 sibling, 1 reply; 23+ messages in thread From: vijayak @ 2016-04-07 9:58 UTC (permalink / raw) To: qemu-arm, peter.maydell, pbonzini Cc: vijay.kilari, Prasun.Kapoor, knv.suresh2009, Vijaya Kumar K, qemu-devel, Vijaya Kumar K, Suresh From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com> utils cannot read target cpu information to fetch cpu information to implement cpu specific features or erratas. For this parse /proc/cpuinfo and fetch cpu information. For now this helper only fetches cpu information for arm architectures. Signed-off-by: Vijaya Kumar K <vijayak@caviumnetworks.com> Signed-off-by: Suresh <ksuresh@caviumnetworks.com> --- include/qemu-common.h | 11 ++++++ util/Makefile.objs | 1 + util/cpuinfo.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 106 insertions(+) diff --git a/include/qemu-common.h b/include/qemu-common.h index 163bcbb..364aa0a 100644 --- a/include/qemu-common.h +++ b/include/qemu-common.h @@ -170,4 +170,15 @@ void page_size_init(void); * returned. */ bool dump_in_progress(void); +/* + * cpu info structure read from /proc/cpuinfo + */ + +struct cpu_info { + uint32_t imp; + uint32_t arch; + uint32_t part; +}; + +void qemu_read_cpu_info(struct cpu_info *cinf); #endif diff --git a/util/Makefile.objs b/util/Makefile.objs index a8a777e..59cde64 100644 --- a/util/Makefile.objs +++ b/util/Makefile.objs @@ -32,3 +32,4 @@ util-obj-y += buffer.o util-obj-y += timed-average.o util-obj-y += base64.o util-obj-y += log.o +util-obj-y += cpuinfo.o diff --git a/util/cpuinfo.c b/util/cpuinfo.c new file mode 100644 index 0000000..e049672 --- /dev/null +++ b/util/cpuinfo.c @@ -0,0 +1,94 @@ +/* + * Dealing with /proc/cpuinfo + * + * Copyright (C) 2016 Cavium, Inc. + * + * Authors: + * Vijaya Kumar K <vijayak@caviumnetworks.com> + * + * This work is licensed under the terms of the GNU LGPL, version 2.1 + * or later. See the COPYING.LIB file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "qemu-common.h" +#include <string.h> + +#if defined(__arm__) || defined(__aarch64__) +static uint32_t read_arm_cpu_implementer(char *str) +{ + char *match; + uint32_t imp = 0; + + match = strstr(str, "CPU implementer"); + if (match != NULL) { + sscanf(match, "CPU implementer : 0x%x", &imp); + } + + return imp; +} + +static uint32_t read_arm_cpu_architecture(char *str) +{ + char *match; + uint32_t arch = 0; + + match = strstr(str, "CPU architecture"); + if (match != NULL) { + sscanf(match, "CPU architecture: %d", &arch); + } + + return arch; +} + +static uint32_t read_arm_cpu_part(char *str) +{ + char *match; + uint32_t part = 0; + + match = strstr(str, "CPU part"); + if (match != NULL) { + sscanf(match, "CPU part : 0x%x", &part); + } + + return part; +} +#endif + +void qemu_read_cpu_info(struct cpu_info *cinf) +{ + FILE *fp; + char *buf; +#define BUF_SIZE 1024 + size_t bytes_read; + + cinf->imp = cinf->arch = cinf->part = 0; + fp = fopen("/proc/cpuinfo", "r"); + if (!fp) { + return; + } + + buf = g_malloc0(BUF_SIZE); + if (!buf) { + fclose(fp); + return; + } + + /* Read the contents of /proc/cpuinfo into the buffer. */ + bytes_read = fread(buf, 1, BUF_SIZE, fp); + fclose(fp); + + if (bytes_read == 0) { + g_free(buf); + return; + } + + buf[bytes_read] = '\0'; + +#if defined(__arm__) || defined(__aarch64__) + cinf->imp = read_arm_cpu_implementer(buf); + cinf->arch = read_arm_cpu_architecture(buf); + cinf->part = read_arm_cpu_part(buf); +#endif + g_free(buf); +} -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-04-07 9:58 ` [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo vijayak @ 2016-04-07 10:11 ` Peter Maydell 2016-04-07 10:56 ` Vijay Kilari 0 siblings, 1 reply; 23+ messages in thread From: Peter Maydell @ 2016-04-07 10:11 UTC (permalink / raw) To: Vijaya Kumar K Cc: Vijay Kilari, Prasun Kapoor, knv.suresh2009, Vijaya Kumar K, QEMU Developers, qemu-arm, Suresh, Paolo Bonzini On 7 April 2016 at 10:58, <vijayak@caviumnetworks.com> wrote: > From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com> > > utils cannot read target cpu information to > fetch cpu information to implement cpu specific > features or erratas. For this parse /proc/cpuinfo > and fetch cpu information. > > For now this helper only fetches cpu information > for arm architectures. As I understand it /proc/cpuinfo is intended only for humans to read. Please don't write code to parse it; find a different way to get this information instead if you really need it. (I'm not really happy about such specific-to-a-particular-vendor patches in QEMU anyway; we should have migration code that works acceptably for any implementation.) thanks -- PMM ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-04-07 10:11 ` Peter Maydell @ 2016-04-07 10:56 ` Vijay Kilari 2016-04-07 11:45 ` Peter Maydell 0 siblings, 1 reply; 23+ messages in thread From: Vijay Kilari @ 2016-04-07 10:56 UTC (permalink / raw) To: Peter Maydell Cc: Vijaya Kumar K, suresh knv, Vijaya Kumar K, QEMU Developers, Prasun Kapoor, qemu-arm, Suresh, Paolo Bonzini On Thu, Apr 7, 2016 at 3:41 PM, Peter Maydell <peter.maydell@linaro.org> wrote: > On 7 April 2016 at 10:58, <vijayak@caviumnetworks.com> wrote: >> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com> >> >> utils cannot read target cpu information to >> fetch cpu information to implement cpu specific >> features or erratas. For this parse /proc/cpuinfo >> and fetch cpu information. >> >> For now this helper only fetches cpu information >> for arm architectures. > > As I understand it /proc/cpuinfo is intended only for > humans to read. Please don't write code to parse it; > find a different way to get this information instead > if you really need it. The utils code does not accept any dependency with target specific code. The libqemuutil.a is compiled and linked before target specific code is compiled. Also, utils functions neither have any cpu object to fetch cpu identification information (ex: midr in case of arm) to identify the cpu information nor utils cannot make any ioctl to read cpu information from qemu. Also unlike x86 there is no cpuid.h where we can get cpu identification information for arm64. So, I think userspace process can rely on /proc/cpuinfo for fetching cpu information. > > (I'm not really happy about such specific-to-a-particular-vendor > patches in QEMU anyway; we should have migration code that > works acceptably for any implementation.) > > thanks > -- PMM ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-04-07 10:56 ` Vijay Kilari @ 2016-04-07 11:45 ` Peter Maydell 2016-04-08 6:21 ` Vijay Kilari 0 siblings, 1 reply; 23+ messages in thread From: Peter Maydell @ 2016-04-07 11:45 UTC (permalink / raw) To: Vijay Kilari Cc: Vijaya Kumar K, suresh knv, Vijaya Kumar K, QEMU Developers, Prasun Kapoor, qemu-arm, Suresh, Paolo Bonzini On 7 April 2016 at 11:56, Vijay Kilari <vijay.kilari@gmail.com> wrote: > On Thu, Apr 7, 2016 at 3:41 PM, Peter Maydell <peter.maydell@linaro.org> wrote: >> On 7 April 2016 at 10:58, <vijayak@caviumnetworks.com> wrote: >>> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com> >>> >>> utils cannot read target cpu information to >>> fetch cpu information to implement cpu specific >>> features or erratas. For this parse /proc/cpuinfo >>> and fetch cpu information. >>> >>> For now this helper only fetches cpu information >>> for arm architectures. >> >> As I understand it /proc/cpuinfo is intended only for >> humans to read. Please don't write code to parse it; >> find a different way to get this information instead >> if you really need it. > Also unlike x86 there is no cpuid.h where we can get cpu identification > information for arm64. I'm told there are kernel patches in progress to get this sort of information in a maintainable way to userspace, which are currently somewhat stalled due to lack of anybody who wants to consume it. If you have a use case then you should probably flag it up with the kernel devs. That said, I think we should probably hold off on this discussion until we have clearer benchmarking info that demonstrates that doing these prefetches really does make a significant difference. I would much prefer to have a single aarch64 routine that works for everybody, rather than a thunderx-only special case. thanks -- PMM ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-04-07 11:45 ` Peter Maydell @ 2016-04-08 6:21 ` Vijay Kilari 2016-04-08 9:43 ` Peter Maydell 0 siblings, 1 reply; 23+ messages in thread From: Vijay Kilari @ 2016-04-08 6:21 UTC (permalink / raw) To: Peter Maydell Cc: Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh Hi Peter, On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell <peter.maydell@linaro.org> wrote: > On 7 April 2016 at 11:56, Vijay Kilari <vijay.kilari@gmail.com> wrote: >> On Thu, Apr 7, 2016 at 3:41 PM, Peter Maydell <peter.maydell@linaro.org> wrote: >>> On 7 April 2016 at 10:58, <vijayak@caviumnetworks.com> wrote: >>>> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com> >>>> >>>> utils cannot read target cpu information to >>>> fetch cpu information to implement cpu specific >>>> features or erratas. For this parse /proc/cpuinfo >>>> and fetch cpu information. >>>> >>>> For now this helper only fetches cpu information >>>> for arm architectures. >>> >>> As I understand it /proc/cpuinfo is intended only for >>> humans to read. Please don't write code to parse it; >>> find a different way to get this information instead >>> if you really need it. > >> Also unlike x86 there is no cpuid.h where we can get cpu identification >> information for arm64. > > I'm told there are kernel patches in progress to get this sort > of information in a maintainable way to userspace, which are > currently somewhat stalled due to lack of anybody who wants to > consume it. If you have a use case then you should probably > flag it up with the kernel devs. Can you please give references to those patches/discussion? > > That said, I think we should probably hold off on this > discussion until we have clearer benchmarking info that > demonstrates that doing these prefetches really does make > a significant difference. I would much prefer to have a Thunderx pass2 board does not have hardware prefetch. So explicit sw prefetch instructions is required for this platform. Here is the benchmarking result with and without prefetch. of an idle VM with 4 VCPUS, 8GB RAM. Without prefech, total migration time is 8.2 seconds With prefetch total migration time is 2.7 seconds. Without prefetch: ------------------------ (qemu) info migrate capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off Migration status: completed total time: 8217 milliseconds downtime: 86 milliseconds setup: 4 milliseconds transferred ram: 212624 kbytes throughput: 212.08 mbps remaining ram: 0 kbytes total ram: 8520128 kbytes duplicate: 2085805 pages skipped: 0 pages normal: 48478 pages normal bytes: 193912 kbytes dirty sync count: 3 With prefetch: -------------------- (qemu) info migrate capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off Migration status: completed total time: 2744 milliseconds downtime: 48 milliseconds setup: 5 milliseconds transferred ram: 213526 kbytes throughput: 637.76 mbps remaining ram: 0 kbytes total ram: 8520128 kbytes duplicate: 2085014 pages skipped: 0 pages normal: 48705 pages normal bytes: 194820 kbytes dirty sync count: 3 > single aarch64 routine that works for everybody, rather > than a thunderx-only special case. Now, I found that the generic existings function by name buffer_find_nonzero_offset_inner() can be made to work with neon. So no need of special function by name buffer_find_nonzero_offset_neon() for arm64 creating in this patch series. However, adding prefetch code needs to be added for performance reason. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-04-08 6:21 ` Vijay Kilari @ 2016-04-08 9:43 ` Peter Maydell 2016-04-11 6:52 ` Vijay Kilari 0 siblings, 1 reply; 23+ messages in thread From: Peter Maydell @ 2016-04-08 9:43 UTC (permalink / raw) To: Vijay Kilari Cc: Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh On 8 April 2016 at 07:21, Vijay Kilari <vijay.kilari@gmail.com> wrote: > On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell <peter.maydell@linaro.org> wrote: >> I'm told there are kernel patches in progress to get this sort >> of information in a maintainable way to userspace, which are >> currently somewhat stalled due to lack of anybody who wants to >> consume it. If you have a use case then you should probably >> flag it up with the kernel devs. > > Can you please give references to those patches/discussion? I'm told the most recent thread is https://lkml.org/lkml/2015/10/5/517 (and that most of the patches in that series have gone in, except for the last 4 or 5 which implement the ABI). thanks -- PMM ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-04-08 9:43 ` Peter Maydell @ 2016-04-11 6:52 ` Vijay Kilari 2016-04-11 9:37 ` Suzuki K Poulose 0 siblings, 1 reply; 23+ messages in thread From: Vijay Kilari @ 2016-04-11 6:52 UTC (permalink / raw) To: Peter Maydell, suzuki.poulose Cc: Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh Adding Suzuki Poulose. Hi Suzuki, On Fri, Apr 8, 2016 at 3:13 PM, Peter Maydell <peter.maydell@linaro.org> wrote: > On 8 April 2016 at 07:21, Vijay Kilari <vijay.kilari@gmail.com> wrote: >> On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell <peter.maydell@linaro.org> wrote: >>> I'm told there are kernel patches in progress to get this sort >>> of information in a maintainable way to userspace, which are >>> currently somewhat stalled due to lack of anybody who wants to >>> consume it. If you have a use case then you should probably >>> flag it up with the kernel devs. >> >> Can you please give references to those patches/discussion? > > I'm told the most recent thread is https://lkml.org/lkml/2015/10/5/517 > (and that most of the patches in that series have gone in, except > for the last 4 or 5 which implement the ABI). Can you please throw some light on what is the status of ABI to read cpu information in user space. I wanted to know cpu implementer, part number in QEMU utils to add prefetches to speed up live migration for Thunderx platform. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-04-11 6:52 ` Vijay Kilari @ 2016-04-11 9:37 ` Suzuki K Poulose 2016-04-13 9:54 ` Vijay Kilari 0 siblings, 1 reply; 23+ messages in thread From: Suzuki K Poulose @ 2016-04-11 9:37 UTC (permalink / raw) To: Vijay Kilari, Peter Maydell Cc: Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh, Catalin Marinas, Will Deacon On 11/04/16 07:52, Vijay Kilari wrote: > Adding Suzuki Poulose. > > Hi Suzuki, > > On Fri, Apr 8, 2016 at 3:13 PM, Peter Maydell <peter.maydell@linaro.org> wrote: >> On 8 April 2016 at 07:21, Vijay Kilari <vijay.kilari@gmail.com> wrote: >>> On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell <peter.maydell@linaro.org> wrote: >>>> I'm told there are kernel patches in progress to get this sort >>>> of information in a maintainable way to userspace, which are >>>> currently somewhat stalled due to lack of anybody who wants to >>>> consume it. If you have a use case then you should probably >>>> flag it up with the kernel devs. >>> >>> Can you please give references to those patches/discussion? >> >> I'm told the most recent thread is https://lkml.org/lkml/2015/10/5/517 >> (and that most of the patches in that series have gone in, except >> for the last 4 or 5 which implement the ABI). > > Can you please throw some light on what is the status of ABI to > read cpu information in user space. > I wanted to know cpu implementer, part number in QEMU utils > to add prefetches to speed up live migration for Thunderx platform. > As for the patch series, except for that last 5 patches (which actually implements the ABI), the infrastructure patches have been merged in v4.4. We are awaiting feedback from possible consumers like toolchain (gcc, glibc). If you think this will be suitable for you, thats good to know. There is documentation available in the last patch in the above series. Could you please try the series (on v4.4, which would be easier, by simply picking up the last 5 patches) and let us know if that works for you ? Cheers Suzuki ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-04-11 9:37 ` Suzuki K Poulose @ 2016-04-13 9:54 ` Vijay Kilari 2016-04-13 9:59 ` Suzuki K Poulose 0 siblings, 1 reply; 23+ messages in thread From: Vijay Kilari @ 2016-04-13 9:54 UTC (permalink / raw) To: Suzuki K Poulose Cc: Peter Maydell, Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh, Catalin Marinas, Will Deacon On Mon, Apr 11, 2016 at 3:07 PM, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote: > On 11/04/16 07:52, Vijay Kilari wrote: >> >> Adding Suzuki Poulose. >> >> Hi Suzuki, >> >> On Fri, Apr 8, 2016 at 3:13 PM, Peter Maydell <peter.maydell@linaro.org> >> wrote: >>> >>> On 8 April 2016 at 07:21, Vijay Kilari <vijay.kilari@gmail.com> wrote: >>>> >>>> On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell <peter.maydell@linaro.org> >>>> wrote: >>>>> >>>>> I'm told there are kernel patches in progress to get this sort >>>>> of information in a maintainable way to userspace, which are >>>>> currently somewhat stalled due to lack of anybody who wants to >>>>> consume it. If you have a use case then you should probably >>>>> flag it up with the kernel devs Hi Peter, Looks like getting Suzuki's patches merged might take some time. I propose to use /proc/cpuinfo for now and later I can move to using Suzuki's way. >>>> >>>> >>>> Can you please give references to those patches/discussion? >>> >>> >>> I'm told the most recent thread is https://lkml.org/lkml/2015/10/5/517 >>> (and that most of the patches in that series have gone in, except >>> for the last 4 or 5 which implement the ABI). >> >> >> Can you please throw some light on what is the status of ABI to >> read cpu information in user space. >> I wanted to know cpu implementer, part number in QEMU utils >> to add prefetches to speed up live migration for Thunderx platform. >> > > As for the patch series, except for that last 5 patches (which actually > implements > the ABI), the infrastructure patches have been merged in v4.4. > > We are awaiting feedback from possible consumers like toolchain (gcc, > glibc). > If you think this will be suitable for you, thats good to know. There is > documentation available in the last patch in the above series. Could you > please > try the series (on v4.4, which would be easier, by simply picking up the > last > 5 patches) and let us know if that works for you ? Hi Suzuki, The last 5 patches are not compiling on v4.4. Looks like your patch series is not merged completely. Can you please rebase your patches and let me know. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-04-13 9:54 ` Vijay Kilari @ 2016-04-13 9:59 ` Suzuki K Poulose 2016-05-09 3:30 ` Vijay Kilari 0 siblings, 1 reply; 23+ messages in thread From: Suzuki K Poulose @ 2016-04-13 9:59 UTC (permalink / raw) To: Vijay Kilari Cc: Peter Maydell, Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh, Catalin Marinas, Will Deacon On 13/04/16 10:54, Vijay Kilari wrote: > On Mon, Apr 11, 2016 at 3:07 PM, Suzuki K Poulose > <Suzuki.Poulose@arm.com> wrote: >> On 11/04/16 07:52, Vijay Kilari wrote: > > Hi Suzuki, > > The last 5 patches are not compiling on v4.4. Looks like your patch > series is not merged completely. Can you please > rebase your patches and let me know. > Could you please give the tree below a try ? git://linux-arm.org/linux-skp.git cpu-ftr/v3-4.3-rc4 Cheers Suzuki ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-04-13 9:59 ` Suzuki K Poulose @ 2016-05-09 3:30 ` Vijay Kilari 2016-05-09 10:59 ` Suzuki K Poulose 0 siblings, 1 reply; 23+ messages in thread From: Vijay Kilari @ 2016-05-09 3:30 UTC (permalink / raw) To: Suzuki K Poulose Cc: Peter Maydell, Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh, Catalin Marinas, Will Deacon Hi Suzuki/Peter, On Wed, Apr 13, 2016 at 5:59 PM, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote: > On 13/04/16 10:54, Vijay Kilari wrote: >> >> On Mon, Apr 11, 2016 at 3:07 PM, Suzuki K Poulose >> <Suzuki.Poulose@arm.com> wrote: >>> >>> On 11/04/16 07:52, Vijay Kilari wrote: > > >> >> Hi Suzuki, >> >> The last 5 patches are not compiling on v4.4. Looks like your patch >> series is not merged completely. Can you please >> rebase your patches and let me know. >> > > Could you please give the tree below a try ? > > git://linux-arm.org/linux-skp.git cpu-ftr/v3-4.3-rc4 This works. Now the question is, Are your patches getting merged anytime soon?. If not, I prefer to go with /proc/cpuinfo. Another solution is look for /sys/devices/system/cpu/cpu$ID/identification/midr if not available then fall back on /proc/cpuinfo. Regards Vijay ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-05-09 3:30 ` Vijay Kilari @ 2016-05-09 10:59 ` Suzuki K Poulose 2016-05-09 11:21 ` Peter Maydell 0 siblings, 1 reply; 23+ messages in thread From: Suzuki K Poulose @ 2016-05-09 10:59 UTC (permalink / raw) To: Vijay Kilari, Catalin Marinas, Will Deacon Cc: Peter Maydell, Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh On 09/05/16 04:30, Vijay Kilari wrote: >>> Hi Suzuki, >>> >>> The last 5 patches are not compiling on v4.4. Looks like your patch >>> series is not merged completely. Can you please >>> rebase your patches and let me know. >>> >> >> Could you please give the tree below a try ? >> >> git://linux-arm.org/linux-skp.git cpu-ftr/v3-4.3-rc4 > > This works. > Now the question is, Are your patches getting merged anytime soon?. Well, we have been waiting for a use case, like this, before we merge the series. Will, Catalin, Now that we have some real users of the infrastructure, what do you think ? I can post an updated/rebased series, if you would like. Suzuki > If not, I prefer to go with /proc/cpuinfo. > > Another solution is look for /sys/devices/system/cpu/cpu$ID/identification/midr > if not available then fall back on /proc/cpuinfo. > > Regards > Vijay > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-05-09 10:59 ` Suzuki K Poulose @ 2016-05-09 11:21 ` Peter Maydell 2016-05-09 13:44 ` Catalin Marinas 0 siblings, 1 reply; 23+ messages in thread From: Peter Maydell @ 2016-05-09 11:21 UTC (permalink / raw) To: Suzuki K Poulose Cc: Vijay Kilari, Catalin Marinas, Will Deacon, Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh On 9 May 2016 at 11:59, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote: > Well, we have been waiting for a use case, like this, before we merge > the series. This isn't a great strategy for moving people away from things you'd like them to avoid like parsing /proc/cpuinfo, because typically userspace app writers are not very interested in coding to facilities which don't exist yet, and will prefer to make do with what's actually present in the kernel today... You need to provide the improved API, and then it needs to get out into kernel versions in distros and otherwise, and only then are you likely to get app developers who will start to say "this is useful". thanks -- PMM ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-05-09 11:21 ` Peter Maydell @ 2016-05-09 13:44 ` Catalin Marinas 2016-05-10 10:24 ` Will Deacon 0 siblings, 1 reply; 23+ messages in thread From: Catalin Marinas @ 2016-05-09 13:44 UTC (permalink / raw) To: Peter Maydell Cc: Suzuki K Poulose, Vijay Kilari, Will Deacon, Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh On Mon, May 09, 2016 at 12:21:08PM +0100, Peter Maydell wrote: > On 9 May 2016 at 11:59, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote: > > Well, we have been waiting for a use case, like this, before we merge > > the series. > > This isn't a great strategy for moving people away from things > you'd like them to avoid like parsing /proc/cpuinfo, because typically > userspace app writers are not very interested in coding to facilities > which don't exist yet, and will prefer to make do with what's actually > present in the kernel today... You need to provide the improved API, > and then it needs to get out into kernel versions in distros and > otherwise, and only then are you likely to get app developers who > will start to say "this is useful". The problem is that the way kernel people think the API may be improved does not always match the use-cases required by app writers. One example here is exposing MIDR via MRS emulation, we know there are problems with big.LITTLE and the only clear answer I got so far is that we ignore such configurations. We don't even have a way to tell user space that this is a heterogeneous CPU configuration, unless we add another HWCAP bit specifically for this (or the opposite: HWCAP_HOMOGENEOUS_CPUS). That said, I'm perfectly fine with exposing: /sys/devices/system/cpu/cpu$ID/identification/ \- midr \- revidr I had the wrong impression that we already merged this part but Suzuki just pointed out to me that it's not. I think our 4.7-rc1 tree is pretty much frozen to new features now, though the sysfs patch is relatively small (I'll let Will comment): https://patches.linaro.org/patch/54502/ The MRS emulation, we should restart the discussion around big.LITTLE implications and make a decision one way or another by the 4.8 merging window. -- Catalin ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-05-09 13:44 ` Catalin Marinas @ 2016-05-10 10:24 ` Will Deacon 2016-05-10 13:06 ` Catalin Marinas 0 siblings, 1 reply; 23+ messages in thread From: Will Deacon @ 2016-05-10 10:24 UTC (permalink / raw) To: Catalin Marinas Cc: Peter Maydell, Suzuki K Poulose, Vijay Kilari, Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh On Mon, May 09, 2016 at 01:44:11PM +0000, Catalin Marinas wrote: > On Mon, May 09, 2016 at 12:21:08PM +0100, Peter Maydell wrote: > > On 9 May 2016 at 11:59, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote: > > > Well, we have been waiting for a use case, like this, before we merge > > > the series. > > > > This isn't a great strategy for moving people away from things > > you'd like them to avoid like parsing /proc/cpuinfo, because typically > > userspace app writers are not very interested in coding to facilities > > which don't exist yet, and will prefer to make do with what's actually > > present in the kernel today... You need to provide the improved API, > > and then it needs to get out into kernel versions in distros and > > otherwise, and only then are you likely to get app developers who > > will start to say "this is useful". > > The problem is that the way kernel people think the API may be improved > does not always match the use-cases required by app writers. One example > here is exposing MIDR via MRS emulation, we know there are problems with > big.LITTLE and the only clear answer I got so far is that we ignore such > configurations. We don't even have a way to tell user space that this is > a heterogeneous CPU configuration, unless we add another HWCAP bit > specifically for this (or the opposite: HWCAP_HOMOGENEOUS_CPUS). Personally, I think we should expose big.LITTLE as-is to userspace. That is, if you execute an mrs instruction you'll get whichever core the emulation happens to run on. This might even be useful to things like pinned threadpools w/ userspace schedulers sitting on top. > That said, I'm perfectly fine with exposing: > > /sys/devices/system/cpu/cpu$ID/identification/ > \- midr > \- revidr > > I had the wrong impression that we already merged this part but Suzuki > just pointed out to me that it's not. Yes, there are use-cases for this interface as well. I don't think it's a choice between one or the other and I firmly believe we need both (the sysfs and mrs code). > I think our 4.7-rc1 tree is pretty much frozen to new features now, > though the sysfs patch is relatively small (I'll let Will comment): The merge window opens in less than a week, so it's fixes only atm. Will ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo 2016-05-10 10:24 ` Will Deacon @ 2016-05-10 13:06 ` Catalin Marinas 0 siblings, 0 replies; 23+ messages in thread From: Catalin Marinas @ 2016-05-10 13:06 UTC (permalink / raw) To: Will Deacon Cc: Peter Maydell, Suzuki K Poulose, Vijay Kilari, Vijaya Kumar K, qemu-arm, Paolo Bonzini, QEMU Developers, Prasun Kapoor, suresh knv, Vijaya Kumar K, Suresh, Ramana Radhakrishnan Cc'ing Ramana to get some input from the toolchain side. On Tue, May 10, 2016 at 11:24:04AM +0100, Will Deacon wrote: > On Mon, May 09, 2016 at 01:44:11PM +0000, Catalin Marinas wrote: > > On Mon, May 09, 2016 at 12:21:08PM +0100, Peter Maydell wrote: > > > On 9 May 2016 at 11:59, Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote: > > > > Well, we have been waiting for a use case, like this, before we merge > > > > the series. > > > > > > This isn't a great strategy for moving people away from things > > > you'd like them to avoid like parsing /proc/cpuinfo, because typically > > > userspace app writers are not very interested in coding to facilities > > > which don't exist yet, and will prefer to make do with what's actually > > > present in the kernel today... You need to provide the improved API, > > > and then it needs to get out into kernel versions in distros and > > > otherwise, and only then are you likely to get app developers who > > > will start to say "this is useful". > > > > The problem is that the way kernel people think the API may be improved > > does not always match the use-cases required by app writers. One example > > here is exposing MIDR via MRS emulation, we know there are problems with > > big.LITTLE and the only clear answer I got so far is that we ignore such > > configurations. We don't even have a way to tell user space that this is > > a heterogeneous CPU configuration, unless we add another HWCAP bit > > specifically for this (or the opposite: HWCAP_HOMOGENEOUS_CPUS). > > Personally, I think we should expose big.LITTLE as-is to userspace. That > is, if you execute an mrs instruction you'll get whichever core the > emulation happens to run on. This might even be useful to things like > pinned threadpools w/ userspace schedulers sitting on top. That's the point I try to make. We "think" there may be use-cases but there are no concrete examples yet. IIRC, the only request for mrs handling came from the tools guys for the ifunc support. However, they don't seem to have a solution for big.LITTLE either and they may simply ignore this feature. OTOH, we have to maintain it in the kernel on the long run because it became ABI (that said, I would be fine if this was complemented by another HWCAP bit for heterogeneous systems). The CPU feature registers wouldn't be affected by the big.LITTLE configurations as we provide a sanitised version anyway. But, again, do we have concrete use-cases? > > That said, I'm perfectly fine with exposing: > > > > /sys/devices/system/cpu/cpu$ID/identification/ > > \- midr > > \- revidr > > > > I had the wrong impression that we already merged this part but Suzuki > > just pointed out to me that it's not. > > Yes, there are use-cases for this interface as well. I don't think it's > a choice between one or the other and I firmly believe we need both (the > sysfs and mrs code). At least for this one we have a clear use-case: JVM and errata workarounds. > > I think our 4.7-rc1 tree is pretty much frozen to new features now, > > though the sysfs patch is relatively small (I'll let Will comment): > > The merge window opens in less than a week, so it's fixes only atm. We have more time to debate ;) -- Catalin ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2016-05-10 13:06 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-04-07 9:58 [Qemu-devel] [RFC PATCH v2 0/3] ARM64: Live migration optimization vijayak 2016-04-07 9:58 ` [Qemu-devel] [RFC PATCH v2 1/3] target-arm: Use Neon for zero checking vijayak 2016-04-07 10:30 ` Paolo Bonzini 2016-04-07 10:44 ` Peter Maydell 2016-04-07 10:44 ` Peter Maydell 2016-04-09 22:45 ` Richard Henderson 2016-04-11 10:40 ` Peter Maydell 2016-04-07 9:58 ` [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo vijayak 2016-04-07 10:11 ` Peter Maydell 2016-04-07 10:56 ` Vijay Kilari 2016-04-07 11:45 ` Peter Maydell 2016-04-08 6:21 ` Vijay Kilari 2016-04-08 9:43 ` Peter Maydell 2016-04-11 6:52 ` Vijay Kilari 2016-04-11 9:37 ` Suzuki K Poulose 2016-04-13 9:54 ` Vijay Kilari 2016-04-13 9:59 ` Suzuki K Poulose 2016-05-09 3:30 ` Vijay Kilari 2016-05-09 10:59 ` Suzuki K Poulose 2016-05-09 11:21 ` Peter Maydell 2016-05-09 13:44 ` Catalin Marinas 2016-05-10 10:24 ` Will Deacon 2016-05-10 13:06 ` Catalin Marinas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).