* [PATCH 0/3] tcg misc patches
From: Richard Henderson @ 2020-08-28 18:02 UTC
To: qemu-devel
A couple of changes I'd like to queue for tcg-next,
which are as yet unreviewed. The final one has been
on the list before, as part of the SVE2 patch set.
r~
Richard Henderson (3):
softmmu/cpus: Only set parallel_cpus for SMP
tcg: Eliminate one store for in-place 128-bit dup_mem
tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
softmmu/cpus.c | 11 +++++++++-
tcg/tcg-op-gvec.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 61 insertions(+), 6 deletions(-)
--
2.25.1
* [PATCH 1/3] softmmu/cpus: Only set parallel_cpus for SMP
From: Richard Henderson @ 2020-08-28 18:02 UTC
To: qemu-devel
Do not set parallel_cpus if the machine can only ever have one cpu
(smp.max_cpus == 1). This will allow tcg to use serial code to
implement atomics.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
softmmu/cpus.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index a802e899ab..e3b98065c9 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -1895,6 +1895,16 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
     if (!tcg_region_inited) {
         tcg_region_inited = 1;
         tcg_region_init();
+        /*
+         * If MTTCG, and we will create multiple cpus,
+         * then we will have cpus running in parallel.
+         */
+        if (qemu_tcg_mttcg_enabled()) {
+            MachineState *ms = MACHINE(qdev_get_machine());
+            if (ms->smp.max_cpus > 1) {
+                parallel_cpus = true;
+            }
+        }
     }
 
     if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
@@ -1904,7 +1914,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
 
         if (qemu_tcg_mttcg_enabled()) {
             /* create a thread per vCPU with TCG (MTTCG) */
-            parallel_cpus = true;
             snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
                  cpu->cpu_index);
 
--
2.25.1
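The condition this patch adds to qemu_tcg_init_vcpu() can be checked in isolation with a minimal plain-C model. The names struct tcg_model_config and model_parallel_cpus() are hypothetical stand-ins for qemu_tcg_mttcg_enabled() and ms->smp.max_cpus; this is a sketch under those assumptions, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model; these names are not QEMU APIs.
 * mttcg_enabled stands in for qemu_tcg_mttcg_enabled(),
 * max_cpus stands in for MACHINE(qdev_get_machine())->smp.max_cpus.
 */
struct tcg_model_config {
    bool mttcg_enabled;
    unsigned max_cpus;
};

/*
 * Mirrors the check added to qemu_tcg_init_vcpu(): vCPUs can only run
 * in parallel under MTTCG when more than one of them may ever exist.
 */
static bool model_parallel_cpus(const struct tcg_model_config *cfg)
{
    assert(cfg->max_cpus >= 1);    /* model assumption: at least one cpu */
    return cfg->mttcg_enabled && cfg->max_cpus > 1;
}
```

With MTTCG and max_cpus == 1 the model returns false, so serial atomics remain usable; with max_cpus == 4 it returns true. Without MTTCG (round-robin TCG) it stays false, matching the old code, which only set parallel_cpus in the MTTCG branch.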
* [PATCH 2/3] tcg: Eliminate one store for in-place 128-bit dup_mem
From: Richard Henderson @ 2020-08-28 18:02 UTC
To: qemu-devel
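When the operation is in place (aofs == dofs), do not store back to
the exact memory from which we just loaded: the first 16-byte block
already holds the value, so start the store loop at offset 16.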
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/tcg-op-gvec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 793d4ba64c..fcc25b04e6 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1581,7 +1581,7 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
             TCGv_vec in = tcg_temp_new_vec(TCG_TYPE_V128);
 
             tcg_gen_ld_vec(in, cpu_env, aofs);
-            for (i = 0; i < oprsz; i += 16) {
+            for (i = (aofs == dofs) * 16; i < oprsz; i += 16) {
                 tcg_gen_st_vec(in, cpu_env, dofs + i);
             }
             tcg_temp_free_vec(in);
@@ -1591,7 +1591,7 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
 
             tcg_gen_ld_i64(in0, cpu_env, aofs);
             tcg_gen_ld_i64(in1, cpu_env, aofs + 8);
-            for (i = 0; i < oprsz; i += 16) {
+            for (i = (aofs == dofs) * 16; i < oprsz; i += 16) {
                 tcg_gen_st_i64(in0, cpu_env, dofs + i);
                 tcg_gen_st_i64(in1, cpu_env, dofs + i + 8);
             }
--
2.25.1
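A minimal plain-C model of the 128-bit path shows the effect of starting the loop at (aofs == dofs) * 16. The function model_dup128() and the env byte array are hypothetical: the model copies bytes with memcpy() where the real code emits tcg_gen_ld_vec()/tcg_gen_st_vec() ops, and it additionally assumes oprsz is a multiple of 16.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of the 128-bit expansion in tcg_gen_gvec_dup_mem():
 * load one 16-byte block from env + aofs, then store it at every
 * 16-byte step of [dofs, dofs + oprsz).  Returns the number of
 * 16-byte stores performed.
 */
static unsigned model_dup128(uint8_t *env, uint32_t dofs, uint32_t aofs,
                             uint32_t oprsz)
{
    uint8_t in[16];
    unsigned stores = 0;
    uint32_t i;

    assert(oprsz >= 16 && oprsz % 16 == 0);   /* model assumption */
    memcpy(in, env + aofs, sizeof(in));       /* the single load */

    /* In place (aofs == dofs): block 0 already holds the value. */
    for (i = (aofs == dofs) * 16; i < oprsz; i += 16) {
        memcpy(env + dofs + i, in, sizeof(in));
        stores++;
    }
    return stores;
}
```

For an in-place 64-byte operation the model performs three stores instead of four; with distinct source and destination offsets all four blocks are still written.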
* [PATCH 3/3] tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
From: Richard Henderson @ 2020-08-28 18:02 UTC
To: qemu-devel
We already support duplication of 128-bit blocks. This extends
that support to 256-bit blocks. This will be needed by SVE2.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/tcg-op-gvec.c | 52 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 49 insertions(+), 3 deletions(-)
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index fcc25b04e6..7ebd9e8298 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1570,12 +1570,10 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
             do_dup(vece, dofs, oprsz, maxsz, NULL, in, 0);
             tcg_temp_free_i64(in);
         }
-    } else {
+    } else if (vece == 4) {
         /* 128-bit duplicate. */
-        /* ??? Dup to 256-bit vector. */
         int i;
 
-        tcg_debug_assert(vece == 4);
         tcg_debug_assert(oprsz >= 16);
         if (TCG_TARGET_HAS_v128) {
             TCGv_vec in = tcg_temp_new_vec(TCG_TYPE_V128);
@@ -1601,6 +1599,54 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
         if (oprsz < maxsz) {
             expand_clr(dofs + oprsz, maxsz - oprsz);
         }
+    } else if (vece == 5) {
+        /* 256-bit duplicate. */
+        int i;
+
+        tcg_debug_assert(oprsz >= 32);
+        tcg_debug_assert(oprsz % 32 == 0);
+        if (TCG_TARGET_HAS_v256) {
+            TCGv_vec in = tcg_temp_new_vec(TCG_TYPE_V256);
+
+            tcg_gen_ld_vec(in, cpu_env, aofs);
+            for (i = (aofs == dofs) * 32; i < oprsz; i += 32) {
+                tcg_gen_st_vec(in, cpu_env, dofs + i);
+            }
+            tcg_temp_free_vec(in);
+        } else if (TCG_TARGET_HAS_v128) {
+            TCGv_vec in0 = tcg_temp_new_vec(TCG_TYPE_V128);
+            TCGv_vec in1 = tcg_temp_new_vec(TCG_TYPE_V128);
+
+            tcg_gen_ld_vec(in0, cpu_env, aofs);
+            tcg_gen_ld_vec(in1, cpu_env, aofs + 16);
+            for (i = (aofs == dofs) * 32; i < oprsz; i += 32) {
+                tcg_gen_st_vec(in0, cpu_env, dofs + i);
+                tcg_gen_st_vec(in1, cpu_env, dofs + i + 16);
+            }
+            tcg_temp_free_vec(in0);
+            tcg_temp_free_vec(in1);
+        } else {
+            TCGv_i64 in[4];
+            int j;
+
+            for (j = 0; j < 4; ++j) {
+                in[j] = tcg_temp_new_i64();
+                tcg_gen_ld_i64(in[j], cpu_env, aofs + j * 8);
+            }
+            for (i = (aofs == dofs) * 32; i < oprsz; i += 32) {
+                for (j = 0; j < 4; ++j) {
+                    tcg_gen_st_i64(in[j], cpu_env, dofs + i + j * 8);
+                }
+            }
+            for (j = 0; j < 4; ++j) {
+                tcg_temp_free_i64(in[j]);
+            }
+        }
+        if (oprsz < maxsz) {
+            expand_clr(dofs + oprsz, maxsz - oprsz);
+        }
+    } else {
+        g_assert_not_reached();
     }
 }
 
--
2.25.1
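A similar plain-C model of the new vece == 5 path follows the shape of the pure-i64 fallback: four 64-bit loads, four 64-bit stores per 32-byte block (skipping block 0 when in place), then zeroing of the tail up to maxsz in the spirit of expand_clr(). The function model_dup256() is hypothetical and copies bytes instead of emitting TCG ops.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of the 256-bit duplicate using four 64-bit words.
 * Returns the number of 64-bit stores performed.
 */
static unsigned model_dup256(uint8_t *env, uint32_t dofs, uint32_t aofs,
                             uint32_t oprsz, uint32_t maxsz)
{
    uint64_t in[4];
    unsigned stores = 0;
    uint32_t i;
    int j;

    assert(oprsz >= 32 && oprsz % 32 == 0);   /* as in the patch */
    assert(maxsz >= oprsz);

    /* Load the 32-byte source block as four 64-bit words. */
    for (j = 0; j < 4; ++j) {
        memcpy(&in[j], env + aofs + j * 8, 8);
    }
    /* Replicate it, skipping block 0 when the operation is in place. */
    for (i = (aofs == dofs) * 32; i < oprsz; i += 32) {
        for (j = 0; j < 4; ++j) {
            memcpy(env + dofs + i + j * 8, &in[j], 8);
            stores++;
        }
    }
    /* Zero the remainder of the destination, like expand_clr(). */
    if (oprsz < maxsz) {
        memset(env + dofs + oprsz, 0, maxsz - oprsz);
    }
    return stores;
}
```

In place with oprsz = 64 and maxsz = 128, the model issues four 64-bit stores (only the second 32-byte block is written) and bytes 64 to 127 end up zero. The v256 and 2 x v128 branches in the patch perform the same copy with wider stores; only the store width differs.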
* Re: [PATCH 1/3] softmmu/cpus: Only set parallel_cpus for SMP
From: Philippe Mathieu-Daudé @ 2020-08-31 17:17 UTC
To: Richard Henderson; +Cc: qemu-devel@nongnu.org Developers
On Fri, 28 Aug 2020, 20:04, Richard Henderson <richard.henderson@linaro.org>
wrote:
> Do not set parallel_cpus if there is only one cpu instantiated.
> This will allow tcg to use serial code to implement atomics.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* Re: [PATCH 2/3] tcg: Eliminate one store for in-place 128-bit dup_mem
From: Philippe Mathieu-Daudé @ 2020-08-31 17:18 UTC
To: Richard Henderson; +Cc: qemu-devel@nongnu.org Developers
On Fri, 28 Aug 2020, 20:04, Richard Henderson <richard.henderson@linaro.org>
wrote:
> Do not store back to the exact memory from which we just loaded.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* Re: [PATCH 3/3] tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
From: Philippe Mathieu-Daudé @ 2020-08-31 17:23 UTC
To: Richard Henderson; +Cc: qemu-devel@nongnu.org Developers
On Fri, 28 Aug 2020, 20:04, Richard Henderson <richard.henderson@linaro.org>
wrote:
> We already support duplication of 128-bit blocks. This extends
> that support to 256-bit blocks. This will be needed by SVE2.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>