* [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks
@ 2022-08-11 15:14 Alex Bennée
2022-08-11 15:14 ` [PATCH v1 1/8] linux-user: un-parent OBJECT(cpu) when closing thread Alex Bennée
` (8 more replies)
0 siblings, 9 replies; 21+ messages in thread
From: Alex Bennée @ 2022-08-11 15:14 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Bennée
Hi,
I've been collecting a number of small fixes since the tree was
frozen. I've been mostly focusing on improving the reliability of the
avocado tests and seeing whether there is any low-hanging fruit for
improving the performance.
The linux-user patch is a v2 fixing the obvious de-reference I missed
in v1 and prevents a memory leak in highly threaded code. Laurent may
want to cherry-pick himself if he wants to re-run the LTP tests before
merging although I hand ran the ones he mentioned failing and they all
work (for me at least ;-).
The CPUClass caching patches are a clean-up from my earlier hacky RFC
and shave a bit more time off the execution of particularly IO-heavy
workloads. The same is true of the SSI fixes.
The avocado fixes are band-aids over a wider issue, which is that we
currently can't cleanly wait for prompts that don't end in a newline.
However they should improve the situation of stuck tests a bit.
Finally the trace_dstate fix is some left-over work from the TCG based
tracing that was pulled earlier this year. There is still the question
of what to do about per-vcpu trace points but they are all currently
called directly from C code so don't concern any TCG code.
I'm still going through the > 30s avocado tests on an --enable-debug
build. The two behemoths (BootLinuxAarch64.test_virt_tcg_gicv2/3)
should be ameliorated by better TB invalidation which rth is currently
cooking up patches for. I've still got to profile
BootLinuxS390X.test_s390_ccw_virtio_tcg,
ReplayKernelNormal.test_x86_64_pc, BootLinuxConsole.test_ppc_powernv8
and BootLinuxConsole.test_ppc_powernv9 which are kings of the
check-avocado time hill to see if there is anything obvious there.
The following patches still need review:
- accel/tcg: remove trace_vcpu_dstate TB checking
- tests/avocado: add timeout to the aspeed tests
- ssi: cache SSIPeripheralClass to avoid GET_CLASS()
- cputlb: used cached CPUClass in our hot-paths
- hw/core/cpu-sysemu: used cached class in cpu_asidx_from_attrs
- cpu: cache CPUClass in CPUState for hot code paths
Alex Bennée (8):
linux-user: un-parent OBJECT(cpu) when closing thread
cpu: cache CPUClass in CPUState for hot code paths
hw/core/cpu-sysemu: used cached class in cpu_asidx_from_attrs
cputlb: used cached CPUClass in our hot-paths
ssi: cache SSIPeripheralClass to avoid GET_CLASS()
tests/avocado: add timeout to the aspeed tests
tests/avocado: apply a band aid to aspeed-evb login
accel/tcg: remove trace_vcpu_dstate TB checking
accel/tcg/tb-hash.h | 6 +++---
include/exec/exec-all.h | 3 ---
include/hw/core/cpu.h | 9 +++++++++
include/hw/ssi/ssi.h | 3 +++
accel/tcg/cpu-exec.c | 6 +-----
accel/tcg/cputlb.c | 15 ++++++---------
accel/tcg/translate-all.c | 13 ++-----------
cpu.c | 9 ++++-----
hw/core/cpu-sysemu.c | 5 ++---
hw/ssi/ssi.c | 18 ++++++++----------
linux-user/syscall.c | 13 +++++++------
tests/avocado/machine_aspeed.py | 4 ++++
12 files changed, 49 insertions(+), 55 deletions(-)
--
2.30.2
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v1 1/8] linux-user: un-parent OBJECT(cpu) when closing thread
2022-08-11 15:14 [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Alex Bennée
@ 2022-08-11 15:14 ` Alex Bennée
2022-08-11 15:14 ` [PATCH v1 2/8] cpu: cache CPUClass in CPUState for hot code paths Alex Bennée
` (7 subsequent siblings)
8 siblings, 0 replies; 21+ messages in thread
From: Alex Bennée @ 2022-08-11 15:14 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Bennée, Laurent Vivier
While forcing the CPU to unrealize by hand does trigger the clean-up
code, we never fully free resources because the refcount never reaches
zero. This is because QOM automatically adds objects without an
explicit parent to /unattached/, incrementing the refcount.
Instead of manually triggering unrealization just unparent the object
and let the device machinery deal with that for us.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/866
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220610143855.1211000-1-alex.bennee@linaro.org>
---
v2
- move clearing of child_tidptr to before we finalise the CPU
object. While ts itself can be cleared g2h needs the current CPU
to resolve the address.
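
For anyone following along, the refcount problem described in the
commit message can be sketched with a toy model. This is only an
illustration under my own assumptions: ToyObject and the toy_*()
helpers are hypothetical names and are not the QOM API. It just models
"creation gives the caller one reference, the implicit parent takes
another".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a reference-counted object; NOT the QOM API. */
typedef struct ToyObject {
    int refcount;
    bool has_parent;   /* true while a container holds a reference */
    bool realized;
    bool finalized;    /* set once the last reference is dropped */
} ToyObject;

/* Creation hands exactly one reference to the caller. */
static void toy_init(ToyObject *obj)
{
    obj->refcount = 1;
    obj->has_parent = false;
    obj->realized = false;
    obj->finalized = false;
}

static void toy_unref(ToyObject *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount == 0) {
        obj->finalized = true;   /* a real object would free itself here */
    }
}

/* Realizing a parentless object: a container silently takes a reference. */
static void toy_realize(ToyObject *obj)
{
    if (!obj->has_parent) {
        obj->has_parent = true;
        obj->refcount++;
    }
    obj->realized = true;
}

/* Old approach: unrealize by hand; the container's reference stays. */
static void toy_unrealize(ToyObject *obj)
{
    obj->realized = false;
}

/* New approach: detach from the container, which also unrealizes. */
static void toy_unparent(ToyObject *obj)
{
    if (obj->has_parent) {
        obj->realized = false;
        obj->has_parent = false;
        toy_unref(obj);          /* drop the container's reference */
    }
}
```

In this model toy_unrealize() followed by toy_unref() leaves the count
at 1, so the object is never finalized, whereas toy_unparent() followed
by toy_unref() reaches zero.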
---
linux-user/syscall.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index f409121202..bfdd60136b 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8594,7 +8594,13 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
if (CPU_NEXT(first_cpu)) {
TaskState *ts = cpu->opaque;
- object_property_set_bool(OBJECT(cpu), "realized", false, NULL);
+ if (ts->child_tidptr) {
+ put_user_u32(0, ts->child_tidptr);
+ do_sys_futex(g2h(cpu, ts->child_tidptr),
+ FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
+ }
+
+ object_unparent(OBJECT(cpu));
object_unref(OBJECT(cpu));
/*
* At this point the CPU should be unrealized and removed
@@ -8604,11 +8610,6 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
pthread_mutex_unlock(&clone_lock);
- if (ts->child_tidptr) {
- put_user_u32(0, ts->child_tidptr);
- do_sys_futex(g2h(cpu, ts->child_tidptr),
- FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
- }
thread_cpu = NULL;
g_free(ts);
rcu_unregister_thread();
--
2.30.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v1 2/8] cpu: cache CPUClass in CPUState for hot code paths
2022-08-11 15:14 [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Alex Bennée
2022-08-11 15:14 ` [PATCH v1 1/8] linux-user: un-parent OBJECT(cpu) when closing thread Alex Bennée
@ 2022-08-11 15:14 ` Alex Bennée
2022-08-11 17:17 ` Richard Henderson
2022-08-11 23:37 ` Philippe Mathieu-Daudé via
2022-08-11 15:14 ` [PATCH v1 3/8] hw/core/cpu-sysemu: used cached class in cpu_asidx_from_attrs Alex Bennée
` (6 subsequent siblings)
8 siblings, 2 replies; 21+ messages in thread
From: Alex Bennée @ 2022-08-11 15:14 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Bennée, Eduardo Habkost, Marcel Apfelbaum,
Philippe Mathieu-Daudé, Yanan Wang
The class cast checkers are quite expensive and always on (unlike the
dynamic cast, whose checks are gated by CONFIG_QOM_CAST_DEBUG). To
avoid the overhead of repeatedly checking something which should never
change we cache the CPUClass reference for use in the hot code paths.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
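The idea in the commit message above can be sketched in isolation as
follows. This is a minimal toy under my own assumptions, not QEMU code:
ToyClass, ToyCPU, toy_get_class_checked() and the cast_checks counter
are all hypothetical. The checked lookup walks the type hierarchy on
every call; the cached pointer pays that cost once at realize time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy class with a parent link and one hook; NOT the QOM types. */
typedef struct ToyClass {
    const char *type_name;
    const struct ToyClass *parent;
    int (*asidx)(void);
} ToyClass;

typedef struct ToyCPU {
    const ToyClass *klass;   /* class pointer every object carries */
    const ToyClass *cc;      /* cached, already-checked class pointer */
} ToyCPU;

static unsigned long cast_checks;   /* counts hierarchy walks */

/* Checked lookup: walk the parents comparing type names. */
static const ToyClass *toy_get_class_checked(const ToyCPU *cpu,
                                             const char *type_name)
{
    const ToyClass *c;

    cast_checks++;
    for (c = cpu->klass; c != NULL; c = c->parent) {
        if (strcmp(c->type_name, type_name) == 0) {
            return cpu->klass;
        }
    }
    abort();   /* not an instance of the requested type */
}

/* Do the expensive check once and remember the result. */
static void toy_cpu_realize(ToyCPU *cpu)
{
    cpu->cc = toy_get_class_checked(cpu, "toy-cpu");
    assert(cpu->cc != NULL);
}

/* Example hook implementation used by both hot paths. */
static int toy_asidx(void)
{
    return 3;
}

static int hot_path_uncached(const ToyCPU *cpu)
{
    return toy_get_class_checked(cpu, "toy-cpu")->asidx();
}

static int hot_path_cached(const ToyCPU *cpu)
{
    return cpu->cc->asidx();
}
```

The trade-off is the usual one for a cache: it is only valid because
the class of an object does not change after creation, and callers must
not use the cached pointer before realize has run.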
include/hw/core/cpu.h | 9 +++++++++
cpu.c | 9 ++++-----
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 500503da13..1a7e1a9380 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -51,6 +51,13 @@ typedef int (*WriteCoreDumpFunction)(const void *buf, size_t size,
*/
#define CPU(obj) ((CPUState *)(obj))
+/*
+ * The class checkers bring in CPU_GET_CLASS() which is potentially
+ * expensive given the eventual call to
+ * object_class_dynamic_cast_assert(). Because of this the CPUState
+ * has a cached value for the class in cs->cc which is set up in
+ * cpu_exec_realizefn() for use in hot code paths.
+ */
typedef struct CPUClass CPUClass;
DECLARE_CLASS_CHECKERS(CPUClass, CPU,
TYPE_CPU)
@@ -317,6 +324,8 @@ struct qemu_work_item;
struct CPUState {
/*< private >*/
DeviceState parent_obj;
+ /* cache to avoid expensive CPU_GET_CLASS */
+ CPUClass *cc;
/*< public >*/
int nr_cores;
diff --git a/cpu.c b/cpu.c
index 584ac78baf..14365e36f3 100644
--- a/cpu.c
+++ b/cpu.c
@@ -131,9 +131,8 @@ const VMStateDescription vmstate_cpu_common = {
void cpu_exec_realizefn(CPUState *cpu, Error **errp)
{
-#ifndef CONFIG_USER_ONLY
- CPUClass *cc = CPU_GET_CLASS(cpu);
-#endif
+ /* cache the cpu class for the hotpath */
+ cpu->cc = CPU_GET_CLASS(cpu);
cpu_list_add(cpu);
if (!accel_cpu_realizefn(cpu, errp)) {
@@ -151,8 +150,8 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
vmstate_register(NULL, cpu->cpu_index, &vmstate_cpu_common, cpu);
}
- if (cc->sysemu_ops->legacy_vmsd != NULL) {
- vmstate_register(NULL, cpu->cpu_index, cc->sysemu_ops->legacy_vmsd, cpu);
+ if (cpu->cc->sysemu_ops->legacy_vmsd != NULL) {
+ vmstate_register(NULL, cpu->cpu_index, cpu->cc->sysemu_ops->legacy_vmsd, cpu);
}
#endif /* CONFIG_USER_ONLY */
}
--
2.30.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v1 3/8] hw/core/cpu-sysemu: used cached class in cpu_asidx_from_attrs
2022-08-11 15:14 [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Alex Bennée
2022-08-11 15:14 ` [PATCH v1 1/8] linux-user: un-parent OBJECT(cpu) when closing thread Alex Bennée
2022-08-11 15:14 ` [PATCH v1 2/8] cpu: cache CPUClass in CPUState for hot code paths Alex Bennée
@ 2022-08-11 15:14 ` Alex Bennée
2022-08-11 17:17 ` Richard Henderson
2022-08-11 23:37 ` Philippe Mathieu-Daudé via
2022-08-11 15:14 ` [PATCH v1 4/8] cputlb: used cached CPUClass in our hot-paths Alex Bennée
` (5 subsequent siblings)
8 siblings, 2 replies; 21+ messages in thread
From: Alex Bennée @ 2022-08-11 15:14 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Bennée
This is a heavily used function so let's avoid the cost of
CPU_GET_CLASS. On the romulus-bmc run it has a modest effect:
Before: 36.812 s ± 0.506 s
After: 35.912 s ± 0.168 s
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
hw/core/cpu-sysemu.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/hw/core/cpu-sysemu.c b/hw/core/cpu-sysemu.c
index 00253f8929..5eaf2e79e6 100644
--- a/hw/core/cpu-sysemu.c
+++ b/hw/core/cpu-sysemu.c
@@ -69,11 +69,10 @@ hwaddr cpu_get_phys_page_debug(CPUState *cpu, vaddr addr)
int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs)
{
- CPUClass *cc = CPU_GET_CLASS(cpu);
int ret = 0;
- if (cc->sysemu_ops->asidx_from_attrs) {
- ret = cc->sysemu_ops->asidx_from_attrs(cpu, attrs);
+ if (cpu->cc->sysemu_ops->asidx_from_attrs) {
+ ret = cpu->cc->sysemu_ops->asidx_from_attrs(cpu, attrs);
assert(ret < cpu->num_ases && ret >= 0);
}
return ret;
--
2.30.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v1 4/8] cputlb: used cached CPUClass in our hot-paths
2022-08-11 15:14 [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Alex Bennée
` (2 preceding siblings ...)
2022-08-11 15:14 ` [PATCH v1 3/8] hw/core/cpu-sysemu: used cached class in cpu_asidx_from_attrs Alex Bennée
@ 2022-08-11 15:14 ` Alex Bennée
2022-08-11 17:18 ` Richard Henderson
2022-08-11 23:39 ` Philippe Mathieu-Daudé via
2022-08-11 15:14 ` [PATCH v1 5/8] ssi: cache SSIPeripheralClass to avoid GET_CLASS() Alex Bennée
` (4 subsequent siblings)
8 siblings, 2 replies; 21+ messages in thread
From: Alex Bennée @ 2022-08-11 15:14 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson, Paolo Bonzini
Before: 35.912 s ± 0.168 s
After: 35.565 s ± 0.087 s
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/cputlb.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index a46f3a654d..891f3f04c5 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1303,15 +1303,14 @@ static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
static void tlb_fill(CPUState *cpu, target_ulong addr, int size,
MMUAccessType access_type, int mmu_idx, uintptr_t retaddr)
{
- CPUClass *cc = CPU_GET_CLASS(cpu);
bool ok;
/*
* This is not a probe, so only valid return is success; failure
* should result in exception + longjmp to the cpu loop.
*/
- ok = cc->tcg_ops->tlb_fill(cpu, addr, size,
- access_type, mmu_idx, false, retaddr);
+ ok = cpu->cc->tcg_ops->tlb_fill(cpu, addr, size,
+ access_type, mmu_idx, false, retaddr);
assert(ok);
}
@@ -1319,9 +1318,8 @@ static inline void cpu_unaligned_access(CPUState *cpu, vaddr addr,
MMUAccessType access_type,
int mmu_idx, uintptr_t retaddr)
{
- CPUClass *cc = CPU_GET_CLASS(cpu);
-
- cc->tcg_ops->do_unaligned_access(cpu, addr, access_type, mmu_idx, retaddr);
+ cpu->cc->tcg_ops->do_unaligned_access(cpu, addr, access_type,
+ mmu_idx, retaddr);
}
static inline void cpu_transaction_failed(CPUState *cpu, hwaddr physaddr,
@@ -1606,10 +1604,9 @@ static int probe_access_internal(CPUArchState *env, target_ulong addr,
if (!tlb_hit_page(tlb_addr, page_addr)) {
if (!victim_tlb_hit(env, mmu_idx, index, elt_ofs, page_addr)) {
CPUState *cs = env_cpu(env);
- CPUClass *cc = CPU_GET_CLASS(cs);
- if (!cc->tcg_ops->tlb_fill(cs, addr, fault_size, access_type,
- mmu_idx, nonfault, retaddr)) {
+ if (!cs->cc->tcg_ops->tlb_fill(cs, addr, fault_size, access_type,
+ mmu_idx, nonfault, retaddr)) {
/* Non-faulting page table read failed. */
*phost = NULL;
return TLB_INVALID_MASK;
--
2.30.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v1 5/8] ssi: cache SSIPeripheralClass to avoid GET_CLASS()
2022-08-11 15:14 [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Alex Bennée
` (3 preceding siblings ...)
2022-08-11 15:14 ` [PATCH v1 4/8] cputlb: used cached CPUClass in our hot-paths Alex Bennée
@ 2022-08-11 15:14 ` Alex Bennée
2022-08-11 15:30 ` Cédric Le Goater
2022-08-11 23:42 ` Philippe Mathieu-Daudé via
2022-08-11 15:14 ` [PATCH v1 6/8] tests/avocado: add timeout to the aspeed tests Alex Bennée
` (3 subsequent siblings)
8 siblings, 2 replies; 21+ messages in thread
From: Alex Bennée @ 2022-08-11 15:14 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Bennée, Cédric Le Goater, Alistair Francis
Investigating why some BMC models are so slow compared to a plain ARM
virt machine, I did some profiling of:
./qemu-system-arm -M romulus-bmc -nic user \
-drive file=obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd \
-nographic -serial mon:stdio
And saw that object_class_dynamic_cast_assert was dominating the
profile times. We have a number of cases in this model of the SSI bus.
As the class is static once the object is created we just cache it and
use it instead of the dynamic cast macros.
Profiling against:
./tests/venv/bin/avocado run \
tests/avocado/machine_aspeed.py:test_arm_ast2500_romulus_openbmc_v2_9_0
Before: 35.565 s ± 0.087 s
After: 15.713 s ± 0.287 s
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Cédric Le Goater <clg@kaod.org>
---
v2
- split patches
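
To put the Before/After figures quoted in this series into
perspective, here is a small worked calculation. The helper names
speedup() and percent_saved() are mine and purely illustrative; the
inputs are the mean run times from the commit messages (the ± spreads
are ignored, so treat the results as rough).

```c
#include <assert.h>

/* How many times faster the "after" run is than the "before" run. */
static double speedup(double before_s, double after_s)
{
    assert(before_s > 0.0 && after_s > 0.0);
    return before_s / after_s;
}

/* Wall-clock time saved, as a percentage of the "before" run. */
static double percent_saved(double before_s, double after_s)
{
    assert(before_s > 0.0);
    return 100.0 * (before_s - after_s) / before_s;
}
```

With the numbers above, speedup(35.565, 15.713) is a bit over 2.2 for
the SSI change alone, while the two CPUClass changes each save only a
few percent (36.812 s to 35.912 s, then 35.912 s to 35.565 s).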
---
include/hw/ssi/ssi.h | 3 +++
hw/ssi/ssi.c | 18 ++++++++----------
2 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/include/hw/ssi/ssi.h b/include/hw/ssi/ssi.h
index f411858ab0..6950f86810 100644
--- a/include/hw/ssi/ssi.h
+++ b/include/hw/ssi/ssi.h
@@ -59,6 +59,9 @@ struct SSIPeripheralClass {
struct SSIPeripheral {
DeviceState parent_obj;
+ /* cache the class */
+ SSIPeripheralClass *spc;
+
/* Chip select state */
bool cs;
};
diff --git a/hw/ssi/ssi.c b/hw/ssi/ssi.c
index 003931fb50..d54a109bee 100644
--- a/hw/ssi/ssi.c
+++ b/hw/ssi/ssi.c
@@ -38,9 +38,8 @@ static void ssi_cs_default(void *opaque, int n, int level)
bool cs = !!level;
assert(n == 0);
if (s->cs != cs) {
- SSIPeripheralClass *ssc = SSI_PERIPHERAL_GET_CLASS(s);
- if (ssc->set_cs) {
- ssc->set_cs(s, cs);
+ if (s->spc->set_cs) {
+ s->spc->set_cs(s, cs);
}
}
s->cs = cs;
@@ -48,11 +47,11 @@ static void ssi_cs_default(void *opaque, int n, int level)
static uint32_t ssi_transfer_raw_default(SSIPeripheral *dev, uint32_t val)
{
- SSIPeripheralClass *ssc = SSI_PERIPHERAL_GET_CLASS(dev);
+ SSIPeripheralClass *ssc = dev->spc;
if ((dev->cs && ssc->cs_polarity == SSI_CS_HIGH) ||
- (!dev->cs && ssc->cs_polarity == SSI_CS_LOW) ||
- ssc->cs_polarity == SSI_CS_NONE) {
+ (!dev->cs && ssc->cs_polarity == SSI_CS_LOW) ||
+ ssc->cs_polarity == SSI_CS_NONE) {
return ssc->transfer(dev, val);
}
return 0;
@@ -67,6 +66,7 @@ static void ssi_peripheral_realize(DeviceState *dev, Error **errp)
ssc->cs_polarity != SSI_CS_NONE) {
qdev_init_gpio_in_named(dev, ssi_cs_default, SSI_GPIO_CS, 1);
}
+ s->spc = ssc;
ssc->realize(s, errp);
}
@@ -115,13 +115,11 @@ uint32_t ssi_transfer(SSIBus *bus, uint32_t val)
{
BusState *b = BUS(bus);
BusChild *kid;
- SSIPeripheralClass *ssc;
uint32_t r = 0;
QTAILQ_FOREACH(kid, &b->children, sibling) {
- SSIPeripheral *peripheral = SSI_PERIPHERAL(kid->child);
- ssc = SSI_PERIPHERAL_GET_CLASS(peripheral);
- r |= ssc->transfer_raw(peripheral, val);
+ SSIPeripheral *p = SSI_PERIPHERAL(kid->child);
+ r |= p->spc->transfer_raw(p, val);
}
return r;
--
2.30.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v1 6/8] tests/avocado: add timeout to the aspeed tests
2022-08-11 15:14 [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Alex Bennée
` (4 preceding siblings ...)
2022-08-11 15:14 ` [PATCH v1 5/8] ssi: cache SSIPeripheralClass to avoid GET_CLASS() Alex Bennée
@ 2022-08-11 15:14 ` Alex Bennée
2022-08-11 23:44 ` Philippe Mathieu-Daudé via
2022-08-11 15:14 ` [PATCH v1 7/8] tests/avocado: apply a band aid to aspeed-evb login Alex Bennée
` (2 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Alex Bennée @ 2022-08-11 15:14 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Bennée, Cleber Rosa, Philippe Mathieu-Daudé,
Wainer dos Santos Moschetta, Beraldo Leal
On some systems the test can hang. At least defining a timeout stops
it from hanging forever.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
tests/avocado/machine_aspeed.py | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tests/avocado/machine_aspeed.py b/tests/avocado/machine_aspeed.py
index b4e35a3d07..c54da0fd8f 100644
--- a/tests/avocado/machine_aspeed.py
+++ b/tests/avocado/machine_aspeed.py
@@ -40,6 +40,8 @@ def test_ast1030_zephyros(self):
class AST2x00Machine(QemuSystemTest):
+ timeout = 90
+
def wait_for_console_pattern(self, success_message, vm=None):
wait_for_console_pattern(self, success_message,
failure_message='Kernel panic - not syncing',
--
2.30.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v1 7/8] tests/avocado: apply a band aid to aspeed-evb login
2022-08-11 15:14 [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Alex Bennée
` (5 preceding siblings ...)
2022-08-11 15:14 ` [PATCH v1 6/8] tests/avocado: add timeout to the aspeed tests Alex Bennée
@ 2022-08-11 15:14 ` Alex Bennée
2022-08-11 15:14 ` [PATCH v1 8/8] accel/tcg: remove trace_vcpu_dstate TB checking Alex Bennée
2022-08-11 17:05 ` [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Peter Maydell
8 siblings, 0 replies; 21+ messages in thread
From: Alex Bennée @ 2022-08-11 15:14 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Bennée, Cédric Le Goater, John Snow, Cleber Rosa,
Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
Beraldo Leal
This is really a limitation of the underlying console code which
doesn't allow us to detect the login: and following "#" prompts
because it reads input line-wise. By adding a small delay we ensure
that the login prompt has appeared so we don't accidentally spaff the
shell commands to a confused getty in the guest.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: John Snow <jsnow@redhat.com>
---
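To illustrate the limitation described above, here is a toy model of a
line-wise matcher. It is a sketch under my own assumptions and not the
avocado console code: seen_in_complete_lines() and seen_anywhere() are
hypothetical helpers, and the pattern is assumed not to contain a
newline. A prompt such as "login: " has no trailing newline, so a
reader that only ever looks at complete lines cannot see it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `pattern` occurs within the newline-terminated part
 * of `buf`. Text after the last '\n' is ignored, modelling a reader
 * that only hands over complete lines.
 */
static bool seen_in_complete_lines(const char *buf, const char *pattern)
{
    const char *last_nl = strrchr(buf, '\n');
    const char *hit;

    assert(strchr(pattern, '\n') == NULL);
    if (last_nl == NULL) {
        return false;   /* no complete line has arrived yet */
    }
    hit = strstr(buf, pattern);
    /* The whole match has to sit before the last newline. */
    return hit != NULL && hit + strlen(pattern) <= last_nl;
}

/* Whole-buffer variant that also sees a trailing partial line. */
static bool seen_anywhere(const char *buf, const char *pattern)
{
    return strstr(buf, pattern) != NULL;
}
```

Given a (made-up) console buffer of "Aspeed EVB\nevb login: ", the
line-wise variant finds "Aspeed EVB" but not "login:", which is why the
test waits for the line before the prompt and then sleeps briefly.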
tests/avocado/machine_aspeed.py | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tests/avocado/machine_aspeed.py b/tests/avocado/machine_aspeed.py
index c54da0fd8f..65d38f4efa 100644
--- a/tests/avocado/machine_aspeed.py
+++ b/tests/avocado/machine_aspeed.py
@@ -101,7 +101,9 @@ def do_test_arm_aspeed_buidroot_start(self, image, cpu_id):
self.wait_for_console_pattern('Starting kernel ...')
self.wait_for_console_pattern('Booting Linux on physical CPU ' + cpu_id)
self.wait_for_console_pattern('lease of 10.0.2.15')
+ # the line before login:
self.wait_for_console_pattern('Aspeed EVB')
+ time.sleep(0.1)
exec_command(self, 'root')
time.sleep(0.1)
--
2.30.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v1 8/8] accel/tcg: remove trace_vcpu_dstate TB checking
2022-08-11 15:14 [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Alex Bennée
` (6 preceding siblings ...)
2022-08-11 15:14 ` [PATCH v1 7/8] tests/avocado: apply a band aid to aspeed-evb login Alex Bennée
@ 2022-08-11 15:14 ` Alex Bennée
2022-08-11 23:44 ` Philippe Mathieu-Daudé via
2022-08-11 17:05 ` [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Peter Maydell
8 siblings, 1 reply; 21+ messages in thread
From: Alex Bennée @ 2022-08-11 15:14 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Bennée, Richard Henderson, Paolo Bonzini
We removed the ability to do vcpu tcg tracing between:
d9a6bad542 (docs: remove references to TCG tracing)
and
126d4123c5 (tracing: excise the tcg related from tracetool)
but missed a bunch of other code. Let's continue the clean-up by
removing the extra field from tb_hash saving us 4 bytes per-TB and the
additional cost of hashing/checking something that was always empty
anyway.
There remain some per-vcpu trace points which don't look as though
they are called anywhere, and the command line/QMP machinery still
needs cleaning up.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
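As a rough illustration of why an always-empty field in a lookup key
is pure overhead, here is a toy sketch. It is not the TB hash code: the
key structs, the FNV-1a style mixing in mix_word() and the field names
are my own assumptions. When the extra word is always zero it never
changes the outcome of a comparison, yet it still costs storage and one
more word to mix and compare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy lookup keys; NOT the real TranslationBlock layout. */
typedef struct KeyWithState {
    uint32_t pc;
    uint32_t flags;
    uint32_t cflags;
    uint32_t trace_state;   /* always zero in this model */
} KeyWithState;

typedef struct KeyTrimmed {
    uint32_t pc;
    uint32_t flags;
    uint32_t cflags;
} KeyTrimmed;

/* FNV-1a style mixing of one 32-bit word, byte by byte. */
static uint32_t mix_word(uint32_t h, uint32_t v)
{
    int i;

    for (i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

static uint32_t hash_trimmed(const KeyTrimmed *k)
{
    uint32_t h = 2166136261u;

    h = mix_word(h, k->pc);
    h = mix_word(h, k->flags);
    return mix_word(h, k->cflags);
}

static bool key_with_state_eq(const KeyWithState *a, const KeyWithState *b)
{
    return a->pc == b->pc && a->flags == b->flags &&
           a->cflags == b->cflags && a->trace_state == b->trace_state;
}

static bool key_trimmed_eq(const KeyTrimmed *a, const KeyTrimmed *b)
{
    return a->pc == b->pc && a->flags == b->flags &&
           a->cflags == b->cflags;
}
```

In this toy the trimmed key is one uint32_t smaller and its comparison
gives the same answers as the wider one whenever trace_state is zero on
both sides.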
accel/tcg/tb-hash.h | 6 +++---
include/exec/exec-all.h | 3 ---
accel/tcg/cpu-exec.c | 6 +-----
accel/tcg/translate-all.c | 13 ++-----------
4 files changed, 6 insertions(+), 22 deletions(-)
diff --git a/accel/tcg/tb-hash.h b/accel/tcg/tb-hash.h
index 0a273d9605..d58115ee70 100644
--- a/accel/tcg/tb-hash.h
+++ b/accel/tcg/tb-hash.h
@@ -60,10 +60,10 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
#endif /* CONFIG_SOFTMMU */
static inline
-uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
- uint32_t cf_mask, uint32_t trace_vcpu_dstate)
+uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc,
+ uint32_t flags, uint32_t cf_mask)
{
- return qemu_xxhash7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
+ return qemu_xxhash6(phys_pc, pc, flags, cf_mask);
}
#endif
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 311e5fb422..21469da064 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -479,9 +479,6 @@ struct TranslationBlock {
#define CF_CLUSTER_MASK 0xff000000 /* Top 8 bits are cluster ID */
#define CF_CLUSTER_SHIFT 24
- /* Per-vCPU dynamic tracing state used to generate this TB */
- uint32_t trace_vcpu_dstate;
-
/*
* Above fields used for comparing
*/
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index a565a3f8ec..86f0276b1d 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -188,7 +188,6 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
tb->pc == pc &&
tb->cs_base == cs_base &&
tb->flags == flags &&
- tb->trace_vcpu_dstate == *cpu->trace_dstate &&
tb_cflags(tb) == cflags)) {
return tb;
}
@@ -494,7 +493,6 @@ struct tb_desc {
tb_page_addr_t phys_page1;
uint32_t flags;
uint32_t cflags;
- uint32_t trace_vcpu_dstate;
};
static bool tb_lookup_cmp(const void *p, const void *d)
@@ -506,7 +504,6 @@ static bool tb_lookup_cmp(const void *p, const void *d)
tb->page_addr[0] == desc->phys_page1 &&
tb->cs_base == desc->cs_base &&
tb->flags == desc->flags &&
- tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
tb_cflags(tb) == desc->cflags) {
/* check next page if needed */
if (tb->page_addr[1] == -1) {
@@ -537,14 +534,13 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
desc.cs_base = cs_base;
desc.flags = flags;
desc.cflags = cflags;
- desc.trace_vcpu_dstate = *cpu->trace_dstate;
desc.pc = pc;
phys_pc = get_page_addr_code(desc.env, pc);
if (phys_pc == -1) {
return NULL;
}
desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
- h = tb_hash_func(phys_pc, pc, flags, cflags, *cpu->trace_dstate);
+ h = tb_hash_func(phys_pc, pc, flags, cflags);
return qht_lookup_custom(&tb_ctx.htable, &desc, h, tb_lookup_cmp);
}
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index ef62a199c7..ce05cb4103 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -197,11 +197,6 @@ struct page_collection {
#define V_L2_BITS 10
#define V_L2_SIZE (1 << V_L2_BITS)
-/* Make sure all possible CPU event bits fit in tb->trace_vcpu_dstate */
-QEMU_BUILD_BUG_ON(CPU_TRACE_DSTATE_MAX_EVENTS >
- sizeof_field(TranslationBlock, trace_vcpu_dstate)
- * BITS_PER_BYTE);
-
/*
* L1 Mapping properties
*/
@@ -894,7 +889,6 @@ static bool tb_cmp(const void *ap, const void *bp)
a->cs_base == b->cs_base &&
a->flags == b->flags &&
(tb_cflags(a) & ~CF_INVALID) == (tb_cflags(b) & ~CF_INVALID) &&
- a->trace_vcpu_dstate == b->trace_vcpu_dstate &&
a->page_addr[0] == b->page_addr[0] &&
a->page_addr[1] == b->page_addr[1];
}
@@ -1186,8 +1180,7 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, bool rm_from_page_list)
/* remove the TB from the hash list */
phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
- h = tb_hash_func(phys_pc, tb->pc, tb->flags, orig_cflags,
- tb->trace_vcpu_dstate);
+ h = tb_hash_func(phys_pc, tb->pc, tb->flags, orig_cflags);
if (!qht_remove(&tb_ctx.htable, tb, h)) {
return;
}
@@ -1349,8 +1342,7 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
}
/* add in the hash table */
- h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags,
- tb->trace_vcpu_dstate);
+ h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags);
qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
/* remove TB from the page(s) if we couldn't insert it */
@@ -1426,7 +1418,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
tb->cs_base = cs_base;
tb->flags = flags;
tb->cflags = cflags;
- tb->trace_vcpu_dstate = *cpu->trace_dstate;
tcg_ctx->tb_cflags = cflags;
tb_overflow:
--
2.30.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v1 5/8] ssi: cache SSIPeripheralClass to avoid GET_CLASS()
2022-08-11 15:14 ` [PATCH v1 5/8] ssi: cache SSIPeripheralClass to avoid GET_CLASS() Alex Bennée
@ 2022-08-11 15:30 ` Cédric Le Goater
2022-08-11 23:42 ` Philippe Mathieu-Daudé via
1 sibling, 0 replies; 21+ messages in thread
From: Cédric Le Goater @ 2022-08-11 15:30 UTC (permalink / raw)
To: Alex Bennée, qemu-devel; +Cc: Alistair Francis
On 8/11/22 17:14, Alex Bennée wrote:
> Investigating why some BMC models are so slow compared to a plain ARM
> virt machine, I did some profiling of:
>
> ./qemu-system-arm -M romulus-bmc -nic user \
> -drive file=obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd \
> -nographic -serial mon:stdio
>
> And saw that object_class_dynamic_cast_assert was dominating the
> profile times. We have a number of cases in this model of the SSI bus.
> As the class is static once the object is created we just cache it and
> use it instead of the dynamic cast macros.
>
> Profiling against:
>
> ./tests/venv/bin/avocado run \
> tests/avocado/machine_aspeed.py:test_arm_ast2500_romulus_openbmc_v2_9_0
>
> Before: 35.565 s ± 0.087 s
> After: 15.713 s ± 0.287 s
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Cc: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Thanks,
C.
>
> ---
> v2
> - split patches
> ---
> include/hw/ssi/ssi.h | 3 +++
> hw/ssi/ssi.c | 18 ++++++++----------
> 2 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/include/hw/ssi/ssi.h b/include/hw/ssi/ssi.h
> index f411858ab0..6950f86810 100644
> --- a/include/hw/ssi/ssi.h
> +++ b/include/hw/ssi/ssi.h
> @@ -59,6 +59,9 @@ struct SSIPeripheralClass {
> struct SSIPeripheral {
> DeviceState parent_obj;
>
> + /* cache the class */
> + SSIPeripheralClass *spc;
> +
> /* Chip select state */
> bool cs;
> };
> diff --git a/hw/ssi/ssi.c b/hw/ssi/ssi.c
> index 003931fb50..d54a109bee 100644
> --- a/hw/ssi/ssi.c
> +++ b/hw/ssi/ssi.c
> @@ -38,9 +38,8 @@ static void ssi_cs_default(void *opaque, int n, int level)
> bool cs = !!level;
> assert(n == 0);
> if (s->cs != cs) {
> - SSIPeripheralClass *ssc = SSI_PERIPHERAL_GET_CLASS(s);
> - if (ssc->set_cs) {
> - ssc->set_cs(s, cs);
> + if (s->spc->set_cs) {
> + s->spc->set_cs(s, cs);
> }
> }
> s->cs = cs;
> @@ -48,11 +47,11 @@ static void ssi_cs_default(void *opaque, int n, int level)
>
> static uint32_t ssi_transfer_raw_default(SSIPeripheral *dev, uint32_t val)
> {
> - SSIPeripheralClass *ssc = SSI_PERIPHERAL_GET_CLASS(dev);
> + SSIPeripheralClass *ssc = dev->spc;
>
> if ((dev->cs && ssc->cs_polarity == SSI_CS_HIGH) ||
> - (!dev->cs && ssc->cs_polarity == SSI_CS_LOW) ||
> - ssc->cs_polarity == SSI_CS_NONE) {
> + (!dev->cs && ssc->cs_polarity == SSI_CS_LOW) ||
> + ssc->cs_polarity == SSI_CS_NONE) {
> return ssc->transfer(dev, val);
> }
> return 0;
> @@ -67,6 +66,7 @@ static void ssi_peripheral_realize(DeviceState *dev, Error **errp)
> ssc->cs_polarity != SSI_CS_NONE) {
> qdev_init_gpio_in_named(dev, ssi_cs_default, SSI_GPIO_CS, 1);
> }
> + s->spc = ssc;
>
> ssc->realize(s, errp);
> }
> @@ -115,13 +115,11 @@ uint32_t ssi_transfer(SSIBus *bus, uint32_t val)
> {
> BusState *b = BUS(bus);
> BusChild *kid;
> - SSIPeripheralClass *ssc;
> uint32_t r = 0;
>
> QTAILQ_FOREACH(kid, &b->children, sibling) {
> - SSIPeripheral *peripheral = SSI_PERIPHERAL(kid->child);
> - ssc = SSI_PERIPHERAL_GET_CLASS(peripheral);
> - r |= ssc->transfer_raw(peripheral, val);
> + SSIPeripheral *p = SSI_PERIPHERAL(kid->child);
> + r |= p->spc->transfer_raw(p, val);
> }
>
> return r;
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks
2022-08-11 15:14 [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Alex Bennée
` (7 preceding siblings ...)
2022-08-11 15:14 ` [PATCH v1 8/8] accel/tcg: remove trace_vcpu_dstate TB checking Alex Bennée
@ 2022-08-11 17:05 ` Peter Maydell
2022-08-11 18:00 ` Alex Bennée
8 siblings, 1 reply; 21+ messages in thread
From: Peter Maydell @ 2022-08-11 17:05 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel
On Thu, 11 Aug 2022 at 16:24, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Hi,
>
> I've been collecting a number of small fixes since the tree was
> frozen. I've been mostly focusing on improving the reliability of the
> avocado tests and seeing whether there is any low-hanging fruit for
> improving the performance.
> Alex Bennée (8):
> linux-user: un-parent OBJECT(cpu) when closing thread
> cpu: cache CPUClass in CPUState for hot code paths
> hw/core/cpu-sysemu: used cached class in cpu_asidx_from_attrs
> cputlb: used cached CPUClass in our hot-paths
> ssi: cache SSIPeripheralClass to avoid GET_CLASS()
> tests/avocado: add timeout to the aspeed tests
> tests/avocado: apply a band aid to aspeed-evb login
> accel/tcg: remove trace_vcpu_dstate TB checking
Changes to tests/ are fine, and fixes for memory leaks
also if they've been well tested, but stuff like the
caching of class objects is really not 7.1 material
at this point in the release cycle, I think.
thanks
-- PMM
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v1 2/8] cpu: cache CPUClass in CPUState for hot code paths
2022-08-11 15:14 ` [PATCH v1 2/8] cpu: cache CPUClass in CPUState for hot code paths Alex Bennée
@ 2022-08-11 17:17 ` Richard Henderson
2022-08-11 23:37 ` Philippe Mathieu-Daudé via
1 sibling, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2022-08-11 17:17 UTC (permalink / raw)
To: Alex Bennée, qemu-devel
Cc: Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
Yanan Wang
On 8/11/22 08:14, Alex Bennée wrote:
> The class cast checkers are quite expensive and always on (unlike the
> dynamic cast, whose checks are gated by CONFIG_QOM_CAST_DEBUG). To
> avoid the overhead of repeatedly checking something which should never
> change we cache the CPUClass reference for use in the hot code paths.
>
> Signed-off-by: Alex Bennée<alex.bennee@linaro.org>
> ---
> include/hw/core/cpu.h | 9 +++++++++
> cpu.c | 9 ++++-----
> 2 files changed, 13 insertions(+), 5 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v1 3/8] hw/core/cpu-sysemu: used cached class in cpu_asidx_from_attrs
2022-08-11 15:14 ` [PATCH v1 3/8] hw/core/cpu-sysemu: used cached class in cpu_asidx_from_attrs Alex Bennée
@ 2022-08-11 17:17 ` Richard Henderson
2022-08-11 23:37 ` Philippe Mathieu-Daudé via
1 sibling, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2022-08-11 17:17 UTC (permalink / raw)
To: Alex Bennée, qemu-devel
On 8/11/22 08:14, Alex Bennée wrote:
> This is a heavily used function so let's avoid the cost of
> CPU_GET_CLASS. On the romulus-bmc run it has a modest effect:
>
> Before: 36.812 s ± 0.506 s
> After: 35.912 s ± 0.168 s
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> hw/core/cpu-sysemu.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/hw/core/cpu-sysemu.c b/hw/core/cpu-sysemu.c
> index 00253f8929..5eaf2e79e6 100644
> --- a/hw/core/cpu-sysemu.c
> +++ b/hw/core/cpu-sysemu.c
> @@ -69,11 +69,10 @@ hwaddr cpu_get_phys_page_debug(CPUState *cpu, vaddr addr)
>
> int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs)
> {
> - CPUClass *cc = CPU_GET_CLASS(cpu);
> int ret = 0;
>
> - if (cc->sysemu_ops->asidx_from_attrs) {
> - ret = cc->sysemu_ops->asidx_from_attrs(cpu, attrs);
> + if (cpu->cc->sysemu_ops->asidx_from_attrs) {
> + ret = cpu->cc->sysemu_ops->asidx_from_attrs(cpu, attrs);
> assert(ret < cpu->num_ases && ret >= 0);
> }
> return ret;
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH v1 4/8] cputlb: used cached CPUClass in our hot-paths
2022-08-11 15:14 ` [PATCH v1 4/8] cputlb: used cached CPUClass in our hot-paths Alex Bennée
@ 2022-08-11 17:18 ` Richard Henderson
2022-08-11 23:39 ` Philippe Mathieu-Daudé via
1 sibling, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2022-08-11 17:18 UTC (permalink / raw)
To: Alex Bennée, qemu-devel; +Cc: Paolo Bonzini
On 8/11/22 08:14, Alex Bennée wrote:
> Before: 35.912 s ± 0.168 s
> After: 35.565 s ± 0.087 s
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> accel/tcg/cputlb.c | 15 ++++++---------
> 1 file changed, 6 insertions(+), 9 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks
2022-08-11 17:05 ` [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Peter Maydell
@ 2022-08-11 18:00 ` Alex Bennée
0 siblings, 0 replies; 21+ messages in thread
From: Alex Bennée @ 2022-08-11 18:00 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel
Peter Maydell <peter.maydell@linaro.org> writes:
> On Thu, 11 Aug 2022 at 16:24, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Hi,
>>
>> I've been collecting a number of small fixes since the tree was
>> frozen. I've been mostly focusing on improving the reliability of the
>> avocado tests and seeing if there are any low hanging fruit for
>> improving the performance.
>
>> Alex Bennée (8):
>> linux-user: un-parent OBJECT(cpu) when closing thread
>> cpu: cache CPUClass in CPUState for hot code paths
>> hw/core/cpu-sysemu: used cached class in cpu_asidx_from_attrs
>> cputlb: used cached CPUClass in our hot-paths
>> ssi: cache SSIPeripheralClass to avoid GET_CLASS()
>> tests/avocado: add timeout to the aspeed tests
>> tests/avocado: apply a band aid to aspeed-evb login
>> accel/tcg: remove trace_vcpu_dstate TB checking
>
> Changes to tests/ are fine, and fixes for memory leaks
> also if they've been well tested, but stuff like the
> caching of class objects is really not 7.1 material
> at this point in the release cycle, I think.
No worries - I can drop the caching stuff for 7.1 but at least people
can test it ;-)
--
Alex Bennée
* Re: [PATCH v1 2/8] cpu: cache CPUClass in CPUState for hot code paths
2022-08-11 15:14 ` [PATCH v1 2/8] cpu: cache CPUClass in CPUState for hot code paths Alex Bennée
2022-08-11 17:17 ` Richard Henderson
@ 2022-08-11 23:37 ` Philippe Mathieu-Daudé via
1 sibling, 0 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé via @ 2022-08-11 23:37 UTC (permalink / raw)
To: Alex Bennée, qemu-devel
Cc: Eduardo Habkost, Marcel Apfelbaum, Yanan Wang
On 11/8/22 17:14, Alex Bennée wrote:
> The class cast checkers are quite expensive and always on (unlike the
> dynamic case whose checks are gated by CONFIG_QOM_CAST_DEBUG). To
> avoid the overhead of repeatedly checking something which should never
> change we cache the CPUClass reference for use in the hot code paths.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> include/hw/core/cpu.h | 9 +++++++++
> cpu.c | 9 ++++-----
> 2 files changed, 13 insertions(+), 5 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* Re: [PATCH v1 3/8] hw/core/cpu-sysemu: used cached class in cpu_asidx_from_attrs
2022-08-11 15:14 ` [PATCH v1 3/8] hw/core/cpu-sysemu: used cached class in cpu_asidx_from_attrs Alex Bennée
2022-08-11 17:17 ` Richard Henderson
@ 2022-08-11 23:37 ` Philippe Mathieu-Daudé via
1 sibling, 0 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé via @ 2022-08-11 23:37 UTC (permalink / raw)
To: Alex Bennée, qemu-devel
On 11/8/22 17:14, Alex Bennée wrote:
> This is a heavily used function so let's avoid the cost of
> CPU_GET_CLASS. On the romulus-bmc run it has a modest effect:
>
> Before: 36.812 s ± 0.506 s
> After: 35.912 s ± 0.168 s
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> hw/core/cpu-sysemu.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* Re: [PATCH v1 4/8] cputlb: used cached CPUClass in our hot-paths
2022-08-11 15:14 ` [PATCH v1 4/8] cputlb: used cached CPUClass in our hot-paths Alex Bennée
2022-08-11 17:18 ` Richard Henderson
@ 2022-08-11 23:39 ` Philippe Mathieu-Daudé via
1 sibling, 0 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé via @ 2022-08-11 23:39 UTC (permalink / raw)
To: Alex Bennée, qemu-devel; +Cc: Richard Henderson, Paolo Bonzini
On 11/8/22 17:14, Alex Bennée wrote:
> Before: 35.912 s ± 0.168 s
> After: 35.565 s ± 0.087 s
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> accel/tcg/cputlb.c | 15 ++++++---------
> 1 file changed, 6 insertions(+), 9 deletions(-)
s/used/use/ in subject (also previous patch).
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* Re: [PATCH v1 5/8] ssi: cache SSIPeripheralClass to avoid GET_CLASS()
2022-08-11 15:14 ` [PATCH v1 5/8] ssi: cache SSIPeripheralClass to avoid GET_CLASS() Alex Bennée
2022-08-11 15:30 ` Cédric Le Goater
@ 2022-08-11 23:42 ` Philippe Mathieu-Daudé via
1 sibling, 0 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé via @ 2022-08-11 23:42 UTC (permalink / raw)
To: Alex Bennée, qemu-devel; +Cc: Cédric Le Goater, Alistair Francis
On 11/8/22 17:14, Alex Bennée wrote:
> Investigating why some BMC models are so slow compared to plain ARM
> virt machines I did some profiling of:
>
> ./qemu-system-arm -M romulus-bmc -nic user \
> -drive file=obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd \
> -nographic -serial mon:stdio
>
> And saw that object_class_dynamic_cast_assert was dominating the
> profile times. We have a number of cases in this model of the SSI bus.
> As the class is static once the object is created we just cache it and
> use it instead of the dynamic cast macros.
>
> Profiling against:
>
> ./tests/venv/bin/avocado run \
> tests/avocado/machine_aspeed.py:test_arm_ast2500_romulus_openbmc_v2_9_0
>
> Before: 35.565 s ± 0.087 s
> After: 15.713 s ± 0.287 s
Wow!
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Cc: Cédric Le Goater <clg@kaod.org>
>
> ---
> v2
> - split patches
> ---
> include/hw/ssi/ssi.h | 3 +++
> hw/ssi/ssi.c | 18 ++++++++----------
> 2 files changed, 11 insertions(+), 10 deletions(-)
> @@ -48,11 +47,11 @@ static void ssi_cs_default(void *opaque, int n, int level)
>
> static uint32_t ssi_transfer_raw_default(SSIPeripheral *dev, uint32_t val)
> {
> - SSIPeripheralClass *ssc = SSI_PERIPHERAL_GET_CLASS(dev);
> + SSIPeripheralClass *ssc = dev->spc;
>
> if ((dev->cs && ssc->cs_polarity == SSI_CS_HIGH) ||
> - (!dev->cs && ssc->cs_polarity == SSI_CS_LOW) ||
> - ssc->cs_polarity == SSI_CS_NONE) {
> + (!dev->cs && ssc->cs_polarity == SSI_CS_LOW) ||
> + ssc->cs_polarity == SSI_CS_NONE) {
Spurious de-indent?
Otherwise:
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> return ssc->transfer(dev, val);
> }
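
To put the Before/After figures quoted above in proportion, the relative saving can be worked out directly. The following is a small sketch, not part of the patch: percent_saved() and speedup_factor() are hypothetical helper names, and the ± spreads reported with the timings are ignored.

```c
#include <assert.h>

/* Percentage of wall-clock time saved going from `before` to `after`. */
static double percent_saved(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

/* How many times faster the `after` run is compared to `before`. */
static double speedup_factor(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}
```

Feeding in the means quoted for this patch, percent_saved(35.565, 15.713) comes out at a little under 56 percent and speedup_factor(35.565, 15.713) at roughly 2.26, while the cpu_asidx_from_attrs change quoted earlier in the thread (36.812 s to 35.912 s) saves about 2.4 percent.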
* Re: [PATCH v1 8/8] accel/tcg: remove trace_vcpu_dstate TB checking
2022-08-11 15:14 ` [PATCH v1 8/8] accel/tcg: remove trace_vcpu_dstate TB checking Alex Bennée
@ 2022-08-11 23:44 ` Philippe Mathieu-Daudé via
0 siblings, 0 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé via @ 2022-08-11 23:44 UTC (permalink / raw)
To: Alex Bennée, qemu-devel; +Cc: Richard Henderson, Paolo Bonzini
On 11/8/22 17:14, Alex Bennée wrote:
> We removed the ability to do vcpu tcg tracing between:
>
> d9a6bad542 (docs: remove references to TCG tracing)
> and
> 126d4123c5 (tracing: excise the tcg related from tracetool)
>
> but missed a bunch of other code. Let's continue the clean-up by
> removing the extra field from tb_hash saving us 4 bytes per-TB and the
> additional cost of hashing/checking something that was always empty
> anyway.
>
> There remain some per-vcpu trace points which don't look as though
> they are called anywhere, and the command line/QMP machinery still
> needs cleaning up.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> accel/tcg/tb-hash.h | 6 +++---
> include/exec/exec-all.h | 3 ---
> accel/tcg/cpu-exec.c | 6 +-----
> accel/tcg/translate-all.c | 13 ++-----------
> 4 files changed, 6 insertions(+), 22 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
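
The reasoning in the commit message, that a field which is always empty adds hashing and comparison cost without adding information, can be illustrated with a toy lookup key. This is a sketch under stated assumptions only: DemoKey and its field names are hypothetical, and the FNV-1a style mixing is a stand-in chosen for brevity, not the hash function QEMU uses for its TB table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lookup key; field names are illustrative only. */
typedef struct DemoKey {
    uint64_t pc;
    uint32_t flags;
    uint32_t always_zero;   /* stands in for a field that is never set */
} DemoKey;

/* FNV-1a style mixing of one 32-bit value, byte by byte. */
static uint32_t mix32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;     /* 32-bit FNV prime */
    }
    return h;
}

/* Hash over the fields that actually vary. */
static uint32_t hash_without_field(const DemoKey *k)
{
    uint32_t h = 2166136261u;   /* 32-bit FNV offset basis */

    h = mix32(h, (uint32_t)k->pc);
    h = mix32(h, (uint32_t)(k->pc >> 32));
    return mix32(h, k->flags);
}

/* Same hash plus one more mixing round for the constant field. */
static uint32_t hash_with_field(const DemoKey *k)
{
    return mix32(hash_without_field(k), k->always_zero);
}

/* Comparison that ignores the constant field. */
static bool equal_without_field(const DemoKey *a, const DemoKey *b)
{
    assert(a && b);
    return a->pc == b->pc && a->flags == b->flags;
}

/* Comparison that also checks the constant field. */
static bool equal_with_field(const DemoKey *a, const DemoKey *b)
{
    return equal_without_field(a, b) && a->always_zero == b->always_zero;
}
```

As long as always_zero really is zero in every key, equal_with_field() and equal_without_field() agree on every pair, so dropping the field changes neither which entries match nor which collide on equality; it only removes one mixing round and one comparison per lookup. Whether the struct itself shrinks depends on padding, so the sketch makes no claim about size.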
* Re: [PATCH v1 6/8] tests/avocado: add timeout to the aspeed tests
2022-08-11 15:14 ` [PATCH v1 6/8] tests/avocado: add timeout to the aspeed tests Alex Bennée
@ 2022-08-11 23:44 ` Philippe Mathieu-Daudé via
0 siblings, 0 replies; 21+ messages in thread
From: Philippe Mathieu-Daudé via @ 2022-08-11 23:44 UTC (permalink / raw)
To: Alex Bennée, qemu-devel
Cc: Cleber Rosa, Wainer dos Santos Moschetta, Beraldo Leal
On 11/8/22 17:14, Alex Bennée wrote:
> On some systems the test can hang. At least defining a timeout stops
> it from hanging forever.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> tests/avocado/machine_aspeed.py | 2 ++
> 1 file changed, 2 insertions(+)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Thread overview: 21+ messages
2022-08-11 15:14 [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Alex Bennée
2022-08-11 15:14 ` [PATCH v1 1/8] linux-user: un-parent OBJECT(cpu) when closing thread Alex Bennée
2022-08-11 15:14 ` [PATCH v1 2/8] cpu: cache CPUClass in CPUState for hot code paths Alex Bennée
2022-08-11 17:17 ` Richard Henderson
2022-08-11 23:37 ` Philippe Mathieu-Daudé via
2022-08-11 15:14 ` [PATCH v1 3/8] hw/core/cpu-sysemu: used cached class in cpu_asidx_from_attrs Alex Bennée
2022-08-11 17:17 ` Richard Henderson
2022-08-11 23:37 ` Philippe Mathieu-Daudé via
2022-08-11 15:14 ` [PATCH v1 4/8] cputlb: used cached CPUClass in our hot-paths Alex Bennée
2022-08-11 17:18 ` Richard Henderson
2022-08-11 23:39 ` Philippe Mathieu-Daudé via
2022-08-11 15:14 ` [PATCH v1 5/8] ssi: cache SSIPeripheralClass to avoid GET_CLASS() Alex Bennée
2022-08-11 15:30 ` Cédric Le Goater
2022-08-11 23:42 ` Philippe Mathieu-Daudé via
2022-08-11 15:14 ` [PATCH v1 6/8] tests/avocado: add timeout to the aspeed tests Alex Bennée
2022-08-11 23:44 ` Philippe Mathieu-Daudé via
2022-08-11 15:14 ` [PATCH v1 7/8] tests/avocado: apply a band aid to aspeed-evb login Alex Bennée
2022-08-11 15:14 ` [PATCH v1 8/8] accel/tcg: remove trace_vcpu_dstate TB checking Alex Bennée
2022-08-11 23:44 ` Philippe Mathieu-Daudé via
2022-08-11 17:05 ` [PATCH for 7.1 v1 0/8] memory leaks and speed tweaks Peter Maydell
2022-08-11 18:00 ` Alex Bennée