* [RFC PATCH 0/4] ppc: nested TCG migration (KVM-on-TCG)
From: Fabiano Rosas @ 2022-02-24 18:58 UTC
To: qemu-devel
Cc: aik, danielhb413, npiggin, qemu-ppc, clg, david

This series implements migration for a TCG pseries guest running a
nested KVM guest. This is just like migrating a pseries TCG guest, but
with some extra state to allow the nested guest to continue to run on
the destination.

Unfortunately the regular (non-nested) TCG migration scenario is not
fully working, so I cannot be entirely sure the nested migration is
correct. I have included a couple of patches for the general migration
case that (I think?) improve the situation a bit, but I'm still seeing
hard lockups and other issues with more than 1 vcpu.

This is more of an early RFC to see if anyone spots something right
away. I haven't made much progress in debugging the general TCG
migration case, so if anyone has any input there as well I'd
appreciate it.

Thanks

Fabiano Rosas (4):
  target/ppc: TCG: Migrate tb_offset and decr
  spapr: TCG: Migrate spapr_cpu->prod
  hw/ppc: Take nested guest into account when saving timebase
  spapr: Add KVM-on-TCG migration support

 hw/ppc/ppc.c                    | 17 +++++++-
 hw/ppc/spapr.c                  | 19 ++++++++
 hw/ppc/spapr_cpu_core.c         | 77 +++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_cpu_core.h |  2 +-
 target/ppc/machine.c            | 61 ++++++++++++++++++++++++++
 5 files changed, 174 insertions(+), 2 deletions(-)

-- 
2.34.1
* [RFC PATCH 1/4] target/ppc: TCG: Migrate tb_offset and decr
From: Fabiano Rosas @ 2022-02-24 18:58 UTC
To: qemu-devel
Cc: aik, danielhb413, npiggin, qemu-ppc, clg, david

These two were not migrated, so the remote end was starting with the
decrementer expired.

I am seeing less frequent crashes with this patch (tested with -smp 4
and -smp 32). It certainly doesn't fix all issues, but it looks like
it helps.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 target/ppc/machine.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index 1b63146ed1..7ee1984500 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -9,6 +9,7 @@
 #include "qemu/main-loop.h"
 #include "kvm_ppc.h"
 #include "power8-pmu.h"
+#include "hw/ppc/ppc.h"
 
 static void post_load_update_msr(CPUPPCState *env)
 {
@@ -666,6 +667,18 @@ static const VMStateDescription vmstate_compat = {
     }
 };
 
+static const VMStateDescription vmstate_tb_env = {
+    .name = "cpu/env/tb_env",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_INT64(tb_offset, ppc_tb_t),
+        VMSTATE_UINT64(decr_next, ppc_tb_t),
+        VMSTATE_TIMER_PTR(decr_timer, ppc_tb_t),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 const VMStateDescription vmstate_ppc_cpu = {
     .name = "cpu",
     .version_id = 5,
@@ -696,12 +709,16 @@ const VMStateDescription vmstate_ppc_cpu = {
         /* Backward compatible internal state */
         VMSTATE_UINTTL(env.hflags_compat_nmsr, PowerPCCPU),
 
+        VMSTATE_STRUCT_POINTER_V(env.tb_env, PowerPCCPU, 1,
+                                 vmstate_tb_env, ppc_tb_t),
+
         /* Sanity checking */
         VMSTATE_UINTTL_TEST(mig_msr_mask, PowerPCCPU, cpu_pre_2_8_migration),
         VMSTATE_UINT64_TEST(mig_insns_flags, PowerPCCPU, cpu_pre_2_8_migration),
         VMSTATE_UINT64_TEST(mig_insns_flags2, PowerPCCPU,
                             cpu_pre_2_8_migration),
         VMSTATE_UINT32_TEST(mig_nb_BATs, PowerPCCPU, cpu_pre_2_8_migration),
+
         VMSTATE_END_OF_LIST()
     },
     .subsections = (const VMStateDescription*[]) {
-- 
2.34.1
* Re: [RFC PATCH 1/4] target/ppc: TCG: Migrate tb_offset and decr
From: Richard Henderson @ 2022-02-24 20:06 UTC
To: Fabiano Rosas, qemu-devel
Cc: aik, danielhb413, npiggin, qemu-ppc, clg, david

On 2/24/22 08:58, Fabiano Rosas wrote:
> These two were not migrated, so the remote end was starting with the
> decrementer expired.
>
> I am seeing less frequent crashes with this patch (tested with -smp 4
> and -smp 32). It certainly doesn't fix all issues, but it looks like
> it helps.
>
> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
[...]
> +static const VMStateDescription vmstate_tb_env = {
> +    .name = "cpu/env/tb_env",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_INT64(tb_offset, ppc_tb_t),
> +        VMSTATE_UINT64(decr_next, ppc_tb_t),
> +        VMSTATE_TIMER_PTR(decr_timer, ppc_tb_t),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
[...]
>      .subsections = (const VMStateDescription*[]) {

I think the new struct should go into a subsection.

r~
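For reference, the subsection mechanism Richard is pointing at gates
new state behind a .needed callback, so that existing machine types
neither send nor expect it, and migration to and from older QEMU
versions keeps working. A rough sketch of what that could look like
here -- the tb_env_needed() gate is a hypothetical placeholder, not
code from the series:

static bool tb_env_needed(void *opaque)
{
    PowerPCCPU *cpu = opaque;

    /* Hypothetical gate: only send the new state when it exists. */
    return cpu->env.tb_env != NULL;
}

/* Subsection of vmstate_ppc_cpu; skipped entirely when .needed
 * returns false, which preserves backward compatibility. */
static const VMStateDescription vmstate_tb_env_subsection = {
    .name = "cpu/tb_env",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = tb_env_needed,
    .fields = (VMStateField[]) {
        VMSTATE_STRUCT_POINTER_V(env.tb_env, PowerPCCPU, 1,
                                 vmstate_tb_env, ppc_tb_t),
        VMSTATE_END_OF_LIST()
    }
};

The struct would then be referenced from vmstate_ppc_cpu's
.subsections array rather than added to its top-level .fields.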
* Re: [RFC PATCH 1/4] target/ppc: TCG: Migrate tb_offset and decr
From: David Gibson @ 2022-02-25  3:15 UTC
To: Fabiano Rosas
Cc: aik, danielhb413, qemu-devel, npiggin, qemu-ppc, clg

On Thu, Feb 24, 2022 at 03:58:14PM -0300, Fabiano Rosas wrote:
> These two were not migrated, so the remote end was starting with the
> decrementer expired.
>
> I am seeing less frequent crashes with this patch (tested with -smp 4
> and -smp 32). It certainly doesn't fix all issues, but it looks like
> it helps.
>
> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>

Nack. This completely breaks migration compatibility for all ppc
machines. In order to maintain compatibility, as Richard says, new
info has to go into a subsection, with a needed function that allows
migration of existing machine types both to and from a new qemu
version.

There are also some other problems.

[...]
> +static const VMStateDescription vmstate_tb_env = {
> +    .name = "cpu/env/tb_env",

Because spapr requires that all cpus have synchronized timebases, we
migrate the timebase state from vmstate_spapr, not from each cpu
individually, to make sure it stays synchronized across migration. If
that's not working right, that's a bug, but it needs to be fixed
there, not just plastered over with extra information transmitted at
cpu level.

> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_INT64(tb_offset, ppc_tb_t),

tb_offset isn't a good thing to put directly in the migration stream.
tb_offset has kind of non-obvious semantics to begin with: when we're
dealing with TCG (which is your explicit case), there is no host TB,
so what's this an offset from? That's why vmstate_ppc_timebase uses
an explicit guest timebase value matched with a time of day in real
units. Again, if there's a bug, that needs fixing there.

> +        VMSTATE_UINT64(decr_next, ppc_tb_t),
> +        VMSTATE_TIMER_PTR(decr_timer, ppc_tb_t),

You're attempting to migrate decr_next and decr_timer, but not the
actual DECR value, which makes me suspect that *is* being migrated.
In which case you should be able to reconstruct the next tick and
timer state in a post_load function on receive. If that's buggy, it
needs to be fixed there.

[...]

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
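To illustrate the reconstruction David describes: rather than
migrating decr_next and the QEMUTimer itself, a post_load hook can
rearm the decrementer on the destination from a migrated register
value. A minimal sketch, assuming env->spr[SPR_DECR] holds the value
captured when the source stopped (whether TCG actually keeps that SPR
up to date is exactly what gets questioned below):

static int decr_post_load(void *opaque, int version_id)
{
    PowerPCCPU *cpu = opaque;
    CPUPPCState *env = &cpu->env;

    /* cpu_ppc_store_decr() recomputes decr_next and re-arms
     * decr_timer relative to the destination's clock. */
    cpu_ppc_store_decr(env, env->spr[SPR_DECR]);

    return 0;
}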
* Re: [RFC PATCH 1/4] target/ppc: TCG: Migrate tb_offset and decr
From: Fabiano Rosas @ 2022-02-25 16:08 UTC
To: David Gibson
Cc: danielhb413, qemu-ppc, qemu-devel, npiggin, clg

David Gibson <david@gibson.dropbear.id.au> writes:

> Nack. This completely breaks migration compatibility for all ppc
> machines. In order to maintain compatibility, as Richard says, new
> info has to go into a subsection, with a needed function that allows
> migration of existing machine types both to and from a new qemu
> version.

Ok, I'm still learning the tenets of migration. I'll give more thought
to that in the next versions.

> There are also some other problems.
>
> Because spapr requires that all cpus have synchronized timebases, we
> migrate the timebase state from vmstate_spapr, not from each cpu
> individually, to make sure it stays synchronized across migration. If
> that's not working right, that's a bug, but it needs to be fixed
> there, not just plastered over with extra information transmitted at
> cpu level.

Ok, so is that what guest_timebase is about? We shouldn't need to
migrate DECR individually then.

> tb_offset isn't a good thing to put directly in the migration stream.
> tb_offset has kind of non-obvious semantics to begin with: when we're
> dealing with TCG (which is your explicit case), there is no host TB,
> so what's this an offset from? That's why vmstate_ppc_timebase uses
> an explicit guest timebase value matched with a time of day in real
> units. Again, if there's a bug, that needs fixing there.

This should be in patch 4, but tb_offset is needed for the nested
case. When we enter the nested guest we increment tb_offset with
nested_tb_offset and decrement it when leaving the nested guest. So
tb_offset needs to carry this value over.

But maybe we could alternatively modify the nested code to just zero
tb_offset when leaving the guest instead? We could then remove
nested_tb_offset altogether.

> You're attempting to migrate decr_next and decr_timer, but not the
> actual DECR value, which makes me suspect that *is* being migrated.
> In which case you should be able to reconstruct the next tick and
> timer state in a post_load function on receive. If that's buggy, it
> needs to be fixed there.

There isn't any "actual DECR" when running TCG, is there? My
understanding is that it is embedded in the QEMUTimer.

Mark mentioned this years ago:

  "I can't see anything in __cpu_ppc_store_decr() that
  updates the spr[SPR_DECR] value when the decrementer register is
  changed"

  https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01114.html

Your answer was along the lines of:

  "we should be reconstructing the decrementer on
  the destination based on an offset from the timebase"

  https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01373.html

So if I'm getting this right, in TCG we don't touch SPR_DECR because
we only effectively care about what is in decr_timer->expire_time.

And in KVM we don't migrate DECR explicitly because we use tb_offset
and timebase_save/timebase_load to ensure that tb_offset on the
destination has the correct value.

But timebase_save/timebase_load are not used for TCG currently, so
there would be nothing transferring the current decr value to the
other side.
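For context on "embedded in the QEMUTimer": under TCG the live DECR
value is never stored anywhere; reads derive it on the fly from
decr_next and the virtual clock. Simplified from _cpu_ppc_load_decr()
in hw/ppc/ppc.c (the BookE and negative-value cases are omitted):

static target_ulong decr_value_now(CPUPPCState *env)
{
    ppc_tb_t *tb_env = env->tb_env;
    int64_t diff;

    /* Remaining time until the decrementer reaches zero... */
    diff = tb_env->decr_next - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
    if (diff < 0) {
        /* Already expired; the real code returns a negative value
         * on non-BookE CPUs. */
        return 0;
    }

    /* ...converted from nanoseconds to decrementer ticks. */
    return muldiv64(diff, tb_env->decr_freq, NANOSECONDS_PER_SECOND);
}

So migrating decr_next (or something equivalent) really is what
carries the current DECR across; spr[SPR_DECR] is stale under TCG.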
* Re: [RFC PATCH 1/4] target/ppc: TCG: Migrate tb_offset and decr
From: David Gibson @ 2022-02-28  2:04 UTC
To: Fabiano Rosas
Cc: danielhb413, qemu-ppc, qemu-devel, npiggin, clg

On Fri, Feb 25, 2022 at 01:08:10PM -0300, Fabiano Rosas wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
>
>> Nack. This completely breaks migration compatibility for all ppc
>> machines. In order to maintain compatibility, as Richard says, new
>> info has to go into a subsection, with a needed function that allows
>> migration of existing machine types both to and from a new qemu
>> version.
>
> Ok, I'm still learning the tenets of migration. I'll give more thought
> to that in the next versions.

Fair enough. I'd had a very frustrating week for entirely unrelated
reasons, so I was probably a bit unfairly critical.

> Ok, so is that what guest_timebase is about? We shouldn't need to
> migrate DECR individually then.

Probably not. Unlike the TB, there is obviously per-cpu state that
has to be migrated for DECR, and I'm not immediately sure how that's
handled right now. I think we would be a lot more broken if we weren't
currently migrating the DECRs in at least most ordinary circumstances.

> This should be in patch 4, but tb_offset is needed for the nested
> case. When we enter the nested guest we increment tb_offset with
> nested_tb_offset and decrement it when leaving the nested guest. So
> tb_offset needs to carry this value over.

Yeah.. that's not really going to work. We'll have to instead
reconstruct tb_offset from the real-time based stuff we have, then use
that on the destination where we need it.

> But maybe we could alternatively modify the nested code to just zero
> tb_offset when leaving the guest instead? We could then remove
> nested_tb_offset altogether.

Uh.. maybe? I don't remember how the nested implementation works well
enough to quickly assess if that will work.

> There isn't any "actual DECR" when running TCG, is there? My
> understanding is that it is embedded in the QEMUTimer.
>
[...]
> So if I'm getting this right, in TCG we don't touch SPR_DECR because
> we only effectively care about what is in decr_timer->expire_time.
>
> And in KVM we don't migrate DECR explicitly because we use tb_offset
> and timebase_save/timebase_load to ensure that tb_offset on the
> destination has the correct value.
>
> But timebase_save/timebase_load are not used for TCG currently, so
> there would be nothing transferring the current decr value to the
> other side.

Ah.. good points. What we need to make sure of is that all the values
which spr_read_decr / gen_helper_load_decr needs to make its
calculation are correctly migrated.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
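For reference, this reconstruction is essentially what the KVM path
already does: timebase_load() in hw/ppc/ppc.c derives a fresh
tb_offset on the destination from the migrated guest timebase value
and the destination's own tick counter, then applies it to every vcpu
to keep them synchronized. A simplified sketch of that logic:

static void timebase_load_sketch(PPCTimebase *tb)
{
    CPUState *cpu;
    int64_t tb_off_adj;

    /* New offset = migrated guest TB minus the destination's ticks,
     * so the guest-visible timebase stays continuous. */
    tb_off_adj = tb->guest_timebase - cpu_get_host_ticks();

    /* Same offset on every vcpu, preserving synchronization. */
    CPU_FOREACH(cpu) {
        PowerPCCPU *pcpu = POWERPC_CPU(cpu);

        pcpu->env.tb_env->tb_offset = tb_off_adj;
    }
}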
* [RFC PATCH 2/4] spapr: TCG: Migrate spapr_cpu->prod
From: Fabiano Rosas @ 2022-02-24 18:58 UTC
To: qemu-devel
Cc: aik, danielhb413, npiggin, qemu-ppc, clg, david

I'm seeing some stack traces in the migrated guest going through cede
and some hangs at plpar_hcall_norets, so let's make sure everything
related to cede/prod is being migrated, just in case.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 hw/ppc/spapr_cpu_core.c         | 1 +
 include/hw/ppc/spapr_cpu_core.h | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index ed84713960..efda7730f1 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -179,6 +179,7 @@ static const VMStateDescription vmstate_spapr_cpu_state = {
     .version_id = 1,
     .minimum_version_id = 1,
     .fields = (VMStateField[]) {
+        VMSTATE_BOOL(prod, SpaprCpuState),
         VMSTATE_END_OF_LIST()
     },
     .subsections = (const VMStateDescription * []) {
diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
index b560514560..2772689c84 100644
--- a/include/hw/ppc/spapr_cpu_core.h
+++ b/include/hw/ppc/spapr_cpu_core.h
@@ -45,7 +45,7 @@ typedef struct SpaprCpuState {
     uint64_t vpa_addr;
     uint64_t slb_shadow_addr, slb_shadow_size;
     uint64_t dtl_addr, dtl_size;
-    bool prod; /* not migrated, only used to improve dispatch latencies */
+    bool prod;
     struct ICPState *icp;
     struct XiveTCTX *tctx;
-- 
2.34.1
* Re: [RFC PATCH 2/4] spapr: TCG: Migrate spapr_cpu->prod
From: David Gibson @ 2022-02-25  3:17 UTC
To: Fabiano Rosas
Cc: aik, danielhb413, qemu-devel, npiggin, qemu-ppc, clg

On Thu, Feb 24, 2022 at 03:58:15PM -0300, Fabiano Rosas wrote:
> I'm seeing some stack traces in the migrated guest going through cede
> and some hangs at plpar_hcall_norets, so let's make sure everything
> related to cede/prod is being migrated, just in case.

This is a poor approach in general. Migration becomes even harder to
maintain than it already is if you don't pare down the set of migrated
data to something minimal and non-redundant.

If you want to migrate prod, you have to give a case for why you
*need* it, not "just in case".

Also, you have to put this in a subsection with a needed function in
order not to break compatibility.

[...]
>     uint64_t dtl_addr, dtl_size;
> -   bool prod; /* not migrated, only used to improve dispatch latencies */
> +   bool prod;

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
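A minimal sketch of the subsection David is asking for -- gate 'prod'
behind a .needed callback so it is only sent when actually set, and
migration with older machine types keeps working in both directions:

static bool prod_needed(void *opaque)
{
    SpaprCpuState *spapr_cpu = opaque;

    /* Only transmit the subsection when there is a pending prod. */
    return spapr_cpu->prod;
}

static const VMStateDescription vmstate_spapr_cpu_prod = {
    .name = "spapr_cpu/prod",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = prod_needed,
    .fields = (VMStateField[]) {
        VMSTATE_BOOL(prod, SpaprCpuState),
        VMSTATE_END_OF_LIST()
    }
};

This would hang off vmstate_spapr_cpu_state's .subsections list
instead of adding the field unconditionally to .fields.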
* Re: [RFC PATCH 2/4] spapr: TCG: Migrate spapr_cpu->prod
From: Fabiano Rosas @ 2022-02-25 16:08 UTC
To: David Gibson
Cc: danielhb413, qemu-ppc, qemu-devel, npiggin, clg

David Gibson <david@gibson.dropbear.id.au> writes:

> On Thu, Feb 24, 2022 at 03:58:15PM -0300, Fabiano Rosas wrote:
>> I'm seeing some stack traces in the migrated guest going through cede
>> and some hangs at plpar_hcall_norets, so let's make sure everything
>> related to cede/prod is being migrated, just in case.
>
> This is a poor approach in general. Migration becomes even harder to
> maintain than it already is if you don't pare down the set of migrated
> data to something minimal and non-redundant.
>
> If you want to migrate prod, you have to give a case for why you
> *need* it, not "just in case".

Ah yes, I'm not actually trying to merge stuff without a good
explanation. I haven't even delineated the problem properly. But I
know little about migration, so I need to do some probing; bear with
me, please.

> Also, you have to put this in a subsection with a needed function in
> order not to break compatibility.
* [RFC PATCH 3/4] hw/ppc: Take nested guest into account when saving timebase
From: Fabiano Rosas @ 2022-02-24 18:58 UTC
To: qemu-devel
Cc: aik, danielhb413, npiggin, qemu-ppc, clg, david

When saving the guest "timebase" we look to the first_cpu for its
tb_offset. If that CPU happens to be running a nested guest at this
time, the tb_offset will have the nested guest value.

This was caught by code inspection.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 hw/ppc/ppc.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 9e99625ea9..093cd87014 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -36,6 +36,7 @@
 #include "kvm_ppc.h"
 #include "migration/vmstate.h"
 #include "trace.h"
+#include "hw/ppc/spapr_cpu_core.h"
 
 static void cpu_ppc_tb_stop (CPUPPCState *env);
 static void cpu_ppc_tb_start (CPUPPCState *env);
@@ -961,19 +962,33 @@ static void timebase_save(PPCTimebase *tb)
 {
     uint64_t ticks = cpu_get_host_ticks();
     PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
+    int64_t tb_offset;
 
     if (!first_ppc_cpu->env.tb_env) {
         error_report("No timebase object");
         return;
     }
 
+    tb_offset = first_ppc_cpu->env.tb_env->tb_offset;
+
+    if (first_ppc_cpu->vhyp && vhyp_cpu_in_nested(first_ppc_cpu)) {
+        SpaprCpuState *spapr_cpu = spapr_cpu_state(first_ppc_cpu);
+
+        /*
+         * If the first_cpu happens to be running a nested guest at
+         * this time, tb_env->tb_offset will contain the nested guest
+         * offset.
+         */
+        tb_offset -= spapr_cpu->nested_tb_offset;
+    }
+
     /* not used anymore, we keep it for compatibility */
     tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
     /*
      * tb_offset is only expected to be changed by QEMU so
      * there is no need to update it from KVM here
      */
-    tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
+    tb->guest_timebase = ticks + tb_offset;
 
     tb->runstate_paused =
         runstate_check(RUN_STATE_PAUSED) || runstate_check(RUN_STATE_SAVE_VM);
-- 
2.34.1
* Re: [RFC PATCH 3/4] hw/ppc: Take nested guest into account when saving timebase
From: David Gibson @ 2022-02-25  3:21 UTC
To: Fabiano Rosas
Cc: aik, danielhb413, qemu-devel, npiggin, qemu-ppc, clg

On Thu, Feb 24, 2022 at 03:58:16PM -0300, Fabiano Rosas wrote:
> When saving the guest "timebase" we look to the first_cpu for its
> tb_offset. If that CPU happens to be running a nested guest at this
> time, the tb_offset will have the nested guest value.
>
> This was caught by code inspection.

This doesn't seem right. Isn't the real problem that nested_tb_offset
isn't being migrated? If you migrate that, shouldn't everything be
fixed up when the L1 cpu leaves the nested guest on the destination
host?

[...]

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [RFC PATCH 3/4] hw/ppc: Take nested guest into account when saving timebase
From: Fabiano Rosas @ 2022-02-25 16:08 UTC
To: David Gibson
Cc: danielhb413, qemu-ppc, qemu-devel, npiggin, clg

David Gibson <david@gibson.dropbear.id.au> writes:

> On Thu, Feb 24, 2022 at 03:58:16PM -0300, Fabiano Rosas wrote:
>> When saving the guest "timebase" we look to the first_cpu for its
>> tb_offset. If that CPU happens to be running a nested guest at this
>> time, the tb_offset will have the nested guest value.
>>
>> This was caught by code inspection.
>
> This doesn't seem right. Isn't the real problem that nested_tb_offset
> isn't being migrated? If you migrate that, shouldn't everything be
> fixed up when the L1 cpu leaves the nested guest on the destination
> host?

This uses first_cpu, so after we introduced the nested guest code this
value has become dependent on what the first_cpu is doing. If it
happens to be running the nested guest when we migrate, then
guest_timebase here will have a different value from the one it would
have if we had used another cpu's tb_offset.

Now, we might have a bug, or at least an inefficiency, here because
timebase_load is never called for the TCG migration case. The
cpu_ppc_clock_vm_state_change callback is only registered for KVM. So
in TCG we call timebase_save during pre_save, migrate the
guest_timebase, but never do anything with it on the remote side.
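For illustration, the KVM-only registration Fabiano mentions lives in
spapr_machine_init(), roughly in this shape (approximate, from the
tree at the time):

    if (kvm_enabled()) {
        /* to stop and start vmclock */
        qemu_add_vm_change_state_handler(cpu_ppc_clock_vm_state_change,
                                         &spapr->tb);
    }

One conceivable fix -- assuming timebase_load() is actually safe to
run under TCG, which would need verifying -- is simply to register the
handler unconditionally so the destination consumes the migrated
guest_timebase:

    /* Register for TCG as well, so timebase_load() runs on resume. */
    qemu_add_vm_change_state_handler(cpu_ppc_clock_vm_state_change,
                                     &spapr->tb);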
* Re: [RFC PATCH 3/4] hw/ppc: Take nested guest into account when saving timebase
From: David Gibson @ 2022-02-28  2:06 UTC
To: Fabiano Rosas
Cc: danielhb413, qemu-ppc, qemu-devel, npiggin, clg

On Fri, Feb 25, 2022 at 01:08:46PM -0300, Fabiano Rosas wrote:
> This uses first_cpu, so after we introduced the nested guest code this
> value has become dependent on what the first_cpu is doing. If it
> happens to be running the nested guest when we migrate, then
> guest_timebase here will have a different value from the one it would
> have if we had used another cpu's tb_offset.

Oh, good point. Yes, you probably do need this.

> Now, we might have a bug, or at least an inefficiency, here because
> timebase_load is never called for the TCG migration case. The
> cpu_ppc_clock_vm_state_change callback is only registered for KVM. So
> in TCG we call timebase_save during pre_save, migrate the
> guest_timebase, but never do anything with it on the remote side.

Oh.. yeah.. that sounds probably buggy. Unless the logic you need is
done at the time of read TB or read DECR.

[...]

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
* [RFC PATCH 4/4] spapr: Add KVM-on-TCG migration support
From: Fabiano Rosas @ 2022-02-24 18:58 UTC
To: qemu-devel
Cc: aik, danielhb413, npiggin, qemu-ppc, clg, david

This adds migration support for TCG pseries machines running a KVM-HV
guest.

The state that needs to be migrated is:

- the nested PTCR value;
- the in_nested flag;
- the nested_tb_offset;
- the saved host CPUPPCState structure.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>

---
(this migrates just fine with L2 running stress and 1 VCPU in L1. With
32 VCPUs in L1 there are crashes which I still don't understand. They
might be related to L1 migration being flaky right now)
---
 hw/ppc/spapr.c          | 19 +++++++++++
 hw/ppc/spapr_cpu_core.c | 76 +++++++++++++++++++++++++++++++++++++++++
 target/ppc/machine.c    | 44 ++++++++++++++++++++++++
 3 files changed, 139 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f0b75b22bb..6e87c515db 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1934,6 +1934,13 @@ static bool spapr_patb_entry_needed(void *opaque)
     return !!spapr->patb_entry;
 }
 
+static bool spapr_nested_ptcr_needed(void *opaque)
+{
+    SpaprMachineState *spapr = opaque;
+
+    return !!spapr->nested_ptcr;
+}
+
 static const VMStateDescription vmstate_spapr_patb_entry = {
     .name = "spapr_patb_entry",
     .version_id = 1,
@@ -1945,6 +1952,17 @@ static const VMStateDescription vmstate_spapr_patb_entry = {
     },
 };
 
+static const VMStateDescription vmstate_spapr_nested_ptcr = {
+    .name = "spapr_nested_ptcr",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_nested_ptcr_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(nested_ptcr, SpaprMachineState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static bool spapr_irq_map_needed(void *opaque)
 {
     SpaprMachineState *spapr = opaque;
@@ -2069,6 +2087,7 @@ static const VMStateDescription vmstate_spapr = {
         &vmstate_spapr_cap_fwnmi,
         &vmstate_spapr_fwnmi,
         &vmstate_spapr_cap_rpt_invalidate,
+        &vmstate_spapr_nested_ptcr,
         NULL
     }
 };
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index efda7730f1..3ec13c0660 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -25,6 +25,7 @@
 #include "sysemu/reset.h"
 #include "sysemu/hw_accel.h"
 #include "qemu/error-report.h"
+#include "migration/cpu.h"
 
 static void spapr_reset_vcpu(PowerPCCPU *cpu)
 {
@@ -174,6 +175,80 @@ static const VMStateDescription vmstate_spapr_cpu_vpa = {
     }
 };
 
+static bool nested_needed(void *opaque)
+{
+    SpaprCpuState *spapr_cpu = opaque;
+
+    return spapr_cpu->in_nested;
+}
+
+static int nested_state_pre_save(void *opaque)
+{
+    CPUPPCState *env = opaque;
+
+    env->spr[SPR_LR] = env->lr;
+    env->spr[SPR_CTR] = env->ctr;
+    env->spr[SPR_XER] = cpu_read_xer(env);
+    env->spr[SPR_CFAR] = env->cfar;
+
+    return 0;
+}
+
+static int nested_state_post_load(void *opaque, int version_id)
+{
+    CPUPPCState *env = opaque;
+
+    env->lr = env->spr[SPR_LR];
+    env->ctr = env->spr[SPR_CTR];
+    cpu_write_xer(env, env->spr[SPR_XER]);
+    env->cfar = env->spr[SPR_CFAR];
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_nested_host_state = {
+    .name = "spapr_nested_host_state",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .pre_save = nested_state_pre_save,
+    .post_load = nested_state_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINTTL_ARRAY(gpr, CPUPPCState, 32),
+        VMSTATE_UINTTL_ARRAY(spr, CPUPPCState, 1024),
+        VMSTATE_UINT32_ARRAY(crf, CPUPPCState, 8),
+        VMSTATE_UINTTL(nip, CPUPPCState),
+        VMSTATE_UINTTL(msr, CPUPPCState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static int nested_cpu_pre_load(void *opaque)
+{
+    SpaprCpuState *spapr_cpu = opaque;
+
+    spapr_cpu->nested_host_state = g_try_malloc(sizeof(CPUPPCState));
+    if (!spapr_cpu->nested_host_state) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_spapr_cpu_nested = {
+    .name = "spapr_cpu/nested",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = nested_needed,
+    .pre_load = nested_cpu_pre_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_BOOL(in_nested, SpaprCpuState),
+        VMSTATE_INT64(nested_tb_offset, SpaprCpuState),
+        VMSTATE_STRUCT_POINTER_V(nested_host_state, SpaprCpuState, 1,
+                                 vmstate_nested_host_state, CPUPPCState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_spapr_cpu_state = {
     .name = "spapr_cpu",
     .version_id = 1,
@@ -184,6 +259,7 @@ static const VMStateDescription vmstate_spapr_cpu_state = {
     },
     .subsections = (const VMStateDescription * []) {
         &vmstate_spapr_cpu_vpa,
+        &vmstate_spapr_cpu_nested,
         NULL
     }
 };
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index 7ee1984500..ae09b1bcfe 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -10,6 +10,7 @@
 #include "kvm_ppc.h"
 #include "power8-pmu.h"
 #include "hw/ppc/ppc.h"
+#include "hw/ppc/spapr_cpu_core.h"
 
 static void post_load_update_msr(CPUPPCState *env)
 {
@@ -679,6 +680,48 @@ static const VMStateDescription vmstate_tb_env = {
     }
 };
 
+static const VMStateDescription vmstate_hdecr = {
+    .name = "cpu/hdecr",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(hdecr_next, ppc_tb_t),
+        VMSTATE_TIMER_PTR(hdecr_timer, ppc_tb_t),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static bool nested_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+    SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
+
+    return spapr_cpu->in_nested;
+}
+
+static int nested_pre_load(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
+
+    cpu_ppc_hdecr_init(env);
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_nested = {
+    .name = "cpu/nested-guest",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = nested_needed,
+    .pre_load = nested_pre_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_STRUCT_POINTER_V(env.tb_env, PowerPCCPU, 1,
+                                 vmstate_hdecr, ppc_tb_t),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 const VMStateDescription vmstate_ppc_cpu = {
     .name = "cpu",
     .version_id = 5,
@@ -734,6 +777,7 @@ const VMStateDescription vmstate_ppc_cpu = {
         &vmstate_tlbemb,
         &vmstate_tlbmas,
         &vmstate_compat,
+        &vmstate_nested,
         NULL
     }
 };
-- 
2.34.1
* Re: [RFC PATCH 4/4] spapr: Add KVM-on-TCG migration support
From: Nicholas Piggin @ 2022-02-25  0:51 UTC
To: Fabiano Rosas, qemu-devel
Cc: aik, danielhb413, qemu-ppc, clg, david

Excerpts from Fabiano Rosas's message of February 25, 2022 4:58 am:
> This adds migration support for TCG pseries machines running a KVM-HV
> guest.
>
> The state that needs to be migrated is:
>
> - the nested PTCR value;
> - the in_nested flag;
> - the nested_tb_offset;
> - the saved host CPUPPCState structure.
>
> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>

The series generally looks good to me; I guess patches 1 and 2 are
fixes that could go ahead.

The main thing about this is that I was thinking of cutting down the
CPUPPCState structure for saving the host state when in the L2, and
making a specific structure for that which only contains what is
required. This patch could easily switch to that, so it's no big deal
AFAIKS.

[...]
> +static bool nested_needed(void *opaque)
> +{
> +    PowerPCCPU *cpu = opaque;
> +    SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
> +
> +    return spapr_cpu->in_nested;
> +}

I don't know the migration code -- are you assured of having a spapr
CPU here? Maybe this could call a helper function located near the
spapr/nested code, like 'return ppc_cpu_need_hdec_migrate(cpu)'?

Thanks,
Nick
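A sketch of the helper Nick suggests, so target/ppc/machine.c never
assumes the CPU is a spapr one. The name is his; the body is one
plausible shape, not code from the series:

bool ppc_cpu_need_hdec_migrate(PowerPCCPU *cpu)
{
    SpaprCpuState *spapr_cpu;

    /* Only spapr (vhyp) CPUs carry nested-guest state at all. */
    if (!cpu->vhyp) {
        return false;
    }

    spapr_cpu = spapr_cpu_state(cpu);
    return spapr_cpu && spapr_cpu->in_nested;
}

The vmstate .needed callback in target/ppc/machine.c would then reduce
to 'return ppc_cpu_need_hdec_migrate(opaque);'.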
* Re: [RFC PATCH 4/4] spapr: Add KVM-on-TCG migration support 2022-02-24 18:58 ` [RFC PATCH 4/4] spapr: Add KVM-on-TCG migration support Fabiano Rosas 2022-02-25 0:51 ` Nicholas Piggin @ 2022-02-25 3:42 ` David Gibson 2022-02-25 10:57 ` Nicholas Piggin 1 sibling, 1 reply; 20+ messages in thread From: David Gibson @ 2022-02-25 3:42 UTC (permalink / raw) To: Fabiano Rosas; +Cc: aik, danielhb413, qemu-devel, npiggin, qemu-ppc, clg [-- Attachment #1: Type: text/plain, Size: 8062 bytes --] On Thu, Feb 24, 2022 at 03:58:17PM -0300, Fabiano Rosas wrote: > This adds migration support for TCG pseries machines running a KVM-HV > guest. > > The state that needs to be migrated is: > > - the nested PTCR value; > - the in_nested flag; > - the nested_tb_offset. > - the saved host CPUPPCState structure; > > Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> > > --- > (this migrates just fine with L2 running stress and 1 VCPU in L1. With > 32 VCPUs in L1 there's crashes which I still don't understand. They might > be related to L1 migration being flaky right now) > --- > hw/ppc/spapr.c | 19 +++++++++++ > hw/ppc/spapr_cpu_core.c | 76 +++++++++++++++++++++++++++++++++++++++++ > target/ppc/machine.c | 44 ++++++++++++++++++++++++ > 3 files changed, 139 insertions(+) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index f0b75b22bb..6e87c515db 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -1934,6 +1934,13 @@ static bool spapr_patb_entry_needed(void *opaque) > return !!spapr->patb_entry; > } > > +static bool spapr_nested_ptcr_needed(void *opaque) > +{ > + SpaprMachineState *spapr = opaque; > + > + return !!spapr->nested_ptcr; > +} > + > static const VMStateDescription vmstate_spapr_patb_entry = { > .name = "spapr_patb_entry", > .version_id = 1, > @@ -1945,6 +1952,17 @@ static const VMStateDescription vmstate_spapr_patb_entry = { > }, > }; > > +static const VMStateDescription vmstate_spapr_nested_ptcr = { > + .name = "spapr_nested_ptcr", > + .version_id = 1, > + .minimum_version_id = 1, > + .needed = spapr_nested_ptcr_needed, > + .fields = (VMStateField[]) { > + VMSTATE_UINT64(nested_ptcr, SpaprMachineState), > + VMSTATE_END_OF_LIST() > + }, > +}; > + > static bool spapr_irq_map_needed(void *opaque) > { > SpaprMachineState *spapr = opaque; > @@ -2069,6 +2087,7 @@ static const VMStateDescription vmstate_spapr = { > &vmstate_spapr_cap_fwnmi, > &vmstate_spapr_fwnmi, > &vmstate_spapr_cap_rpt_invalidate, > + &vmstate_spapr_nested_ptcr, Ok, the nested_ptcr stuff looks good. 
> NULL > } > }; > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c > index efda7730f1..3ec13c0660 100644 > --- a/hw/ppc/spapr_cpu_core.c > +++ b/hw/ppc/spapr_cpu_core.c > @@ -25,6 +25,7 @@ > #include "sysemu/reset.h" > #include "sysemu/hw_accel.h" > #include "qemu/error-report.h" > +#include "migration/cpu.h" > > static void spapr_reset_vcpu(PowerPCCPU *cpu) > { > @@ -174,6 +175,80 @@ static const VMStateDescription vmstate_spapr_cpu_vpa = { > } > }; > > +static bool nested_needed(void *opaque) > +{ > + SpaprCpuState *spapr_cpu = opaque; > + > + return spapr_cpu->in_nested; > +} > + > +static int nested_state_pre_save(void *opaque) > +{ > + CPUPPCState *env = opaque; > + > + env->spr[SPR_LR] = env->lr; > + env->spr[SPR_CTR] = env->ctr; > + env->spr[SPR_XER] = cpu_read_xer(env); > + env->spr[SPR_CFAR] = env->cfar; > + return 0; > +} > + > +static int nested_state_post_load(void *opaque, int version_id) > +{ > + CPUPPCState *env = opaque; > + > + env->lr = env->spr[SPR_LR]; > + env->ctr = env->spr[SPR_CTR]; > + cpu_write_xer(env, env->spr[SPR_XER]); > + env->cfar = env->spr[SPR_CFAR]; > + > + return 0; > +} > + > +static const VMStateDescription vmstate_nested_host_state = { > + .name = "spapr_nested_host_state", > + .version_id = 1, > + .minimum_version_id = 1, > + .pre_save = nested_state_pre_save, > + .post_load = nested_state_post_load, > + .fields = (VMStateField[]) { > + VMSTATE_UINTTL_ARRAY(gpr, CPUPPCState, 32), > + VMSTATE_UINTTL_ARRAY(spr, CPUPPCState, 1024), > + VMSTATE_UINT32_ARRAY(crf, CPUPPCState, 8), > + VMSTATE_UINTTL(nip, CPUPPCState), > + VMSTATE_UINTTL(msr, CPUPPCState), > + VMSTATE_END_OF_LIST() > + } > +}; > + > +static int nested_cpu_pre_load(void *opaque) > +{ > + SpaprCpuState *spapr_cpu = opaque; > + > + spapr_cpu->nested_host_state = g_try_malloc(sizeof(CPUPPCState)); > + if (!spapr_cpu->nested_host_state) { > + return -1; > + } > + > + return 0; > +} > + > +static const VMStateDescription vmstate_spapr_cpu_nested = { > + .name = "spapr_cpu/nested", > + .version_id = 1, > + .minimum_version_id = 1, > + .needed = nested_needed, > + .pre_load = nested_cpu_pre_load, > + .fields = (VMStateField[]) { > + VMSTATE_BOOL(in_nested, SpaprCpuState), > + VMSTATE_INT64(nested_tb_offset, SpaprCpuState), > + VMSTATE_STRUCT_POINTER_V(nested_host_state, SpaprCpuState, 1, > + vmstate_nested_host_state, CPUPPCState), > + VMSTATE_END_OF_LIST() > + }, > +}; > + > static const VMStateDescription vmstate_spapr_cpu_state = { > .name = "spapr_cpu", > .version_id = 1, > @@ -184,6 +259,7 @@ static const VMStateDescription vmstate_spapr_cpu_state = { > }, > .subsections = (const VMStateDescription * []) { > &vmstate_spapr_cpu_vpa, > + &vmstate_spapr_cpu_nested, > NULL > } The vmstate_spapr_cpu_nested stuff looks good too, this is real information that we weren't migrating and can't be recovered from elsewhere. 
> }; > diff --git a/target/ppc/machine.c b/target/ppc/machine.c > index 7ee1984500..ae09b1bcfe 100644 > --- a/target/ppc/machine.c > +++ b/target/ppc/machine.c > @@ -10,6 +10,7 @@ > #include "kvm_ppc.h" > #include "power8-pmu.h" > #include "hw/ppc/ppc.h" > +#include "hw/ppc/spapr_cpu_core.h" > > static void post_load_update_msr(CPUPPCState *env) > { > @@ -679,6 +680,48 @@ static const VMStateDescription vmstate_tb_env = { > } > }; > > +static const VMStateDescription vmstate_hdecr = { > + .name = "cpu/hdecr", > + .version_id = 1, > + .minimum_version_id = 1, > + .fields = (VMStateField[]) { > + VMSTATE_UINT64(hdecr_next, ppc_tb_t), > + VMSTATE_TIMER_PTR(hdecr_timer, ppc_tb_t), > + VMSTATE_END_OF_LIST() > + } > +}; > + > +static bool nested_needed(void *opaque) > +{ > + PowerPCCPU *cpu = opaque; > + SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu); > + > + return spapr_cpu->in_nested; > +} > + > +static int nested_pre_load(void *opaque) > +{ > + PowerPCCPU *cpu = opaque; > + CPUPPCState *env = &cpu->env; > + > + cpu_ppc_hdecr_init(env); > + > + return 0; > +} > + > +static const VMStateDescription vmstate_nested = { > + .name = "cpu/nested-guest", > + .version_id = 1, > + .minimum_version_id = 1, > + .needed = nested_needed, > + .pre_load = nested_pre_load, > + .fields = (VMStateField[]) { > + VMSTATE_STRUCT_POINTER_V(env.tb_env, PowerPCCPU, 1, > + vmstate_hdecr, ppc_tb_t), > + VMSTATE_END_OF_LIST() > + } > +}; > + > const VMStateDescription vmstate_ppc_cpu = { > .name = "cpu", > .version_id = 5, > @@ -734,6 +777,7 @@ const VMStateDescription vmstate_ppc_cpu = { > &vmstate_tlbemb, > &vmstate_tlbmas, > &vmstate_compat, > + &vmstate_nested, The hdecr stuff doesn't seem quite right. Notionally the L1 cpu, since it is in PAPR mode, doesn't *have* an HDECR. It's only the L0 nested-KVM extensions that allow it to kind of fake access to an HDECR. We're kind of abusing the HDECR fields in the cpu structure for this. At the very least I think the fake-HDECR migration stuff needs to go in the spapr_cpu_state not the general cpu state, since it would make no sense if the L1 were a powernv system. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
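To make the relocation suggested above concrete, here is a rough sketch, not code from the series, of how the fake-HDECR state could hang off the spapr per-vCPU state instead. It assumes two hypothetical additions to SpaprCpuState: a non-migrated back pointer 'cpu' set when the vCPU is realized, and a staging field 'nested_hdecr_next'. Since hdecr_next holds a QEMU_CLOCK_VIRTUAL deadline in nanoseconds, the timer can be re-armed from it directly on the destination:

static int nested_hdecr_pre_save(void *opaque)
{
    SpaprCpuState *spapr_cpu = opaque;
    CPUPPCState *env = &spapr_cpu->cpu->env;   /* 'cpu' is hypothetical */

    spapr_cpu->nested_hdecr_next = env->tb_env->hdecr_next;
    return 0;
}

static int nested_hdecr_post_load(void *opaque, int version_id)
{
    SpaprCpuState *spapr_cpu = opaque;
    CPUPPCState *env = &spapr_cpu->cpu->env;

    cpu_ppc_hdecr_init(env);                   /* creates hdecr_timer */
    env->tb_env->hdecr_next = spapr_cpu->nested_hdecr_next;
    /* hdecr_next is an ns deadline, so it can re-arm the timer as-is */
    timer_mod(env->tb_env->hdecr_timer, env->tb_env->hdecr_next);
    return 0;
}

static const VMStateDescription vmstate_spapr_cpu_hdecr = {
    .name = "spapr_cpu/nested/hdecr",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = nested_needed,                   /* only while in_nested */
    .pre_save = nested_hdecr_pre_save,
    .post_load = nested_hdecr_post_load,
    .fields = (VMStateField[]) {
        VMSTATE_UINT64(nested_hdecr_next, SpaprCpuState),
        VMSTATE_END_OF_LIST()
    }
};

Registered as a subsection of vmstate_spapr_cpu_state, this never appears in a powernv stream at all, which is the layering concern being raised.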
* Re: [RFC PATCH 4/4] spapr: Add KVM-on-TCG migration support 2022-02-25 3:42 ` David Gibson @ 2022-02-25 10:57 ` Nicholas Piggin 0 siblings, 0 replies; 20+ messages in thread From: Nicholas Piggin @ 2022-02-25 10:57 UTC (permalink / raw) To: David Gibson, Fabiano Rosas; +Cc: aik, danielhb413, qemu-ppc, clg, qemu-devel Excerpts from David Gibson's message of February 25, 2022 1:42 pm: > On Thu, Feb 24, 2022 at 03:58:17PM -0300, Fabiano Rosas wrote: >> This adds migration support for TCG pseries machines running a KVM-HV >> guest. >> @@ -734,6 +777,7 @@ const VMStateDescription vmstate_ppc_cpu = { >> &vmstate_tlbemb, >> &vmstate_tlbmas, >> &vmstate_compat, >> + &vmstate_nested, > > The hdecr stuff doesn't seem quite right. Notionally the L1 cpu, > since it is in PAPR mode, doesn't *have* an HDECR. It's only the L0 > nested-KVM extensions that allow it to kind of fake access to an > HDECR. We're kind of abusing the HDECR fields in the cpu structure > for this. At the very least I think the fake-HDECR migration stuff > needs to go in the spapr_cpu_state not the general cpu state, since it > would make no sense if the L1 were a powernv system. We could possibly just make a new timer for this, btw. It's not really buying a lot and may be slightly confusing to share hdecr. It could still cause a HDEC exception though. (If we really wanted to divorce that confusion about the HV architecture from the vhyp nested L0 implementation we could come up with an entirely new set of "exit nested" kind of exceptions that don't look like HV interrupts, but reusing the HV interrupts was simple and kind of neat in a way). Thanks, Nick ^ permalink raw reply [flat|nested] 20+ messages in thread
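A sketch of the separate-timer idea, where everything named here (nested_exit_timer, the callback) is hypothetical rather than from the series: a QEMUTimer hung off SpaprCpuState that reuses the existing HDEC exit path when it fires.

static void spapr_nested_exit_timer_cb(void *opaque)
{
    PowerPCCPU *cpu = opaque;

    /* keep reusing the HDEC interrupt as the "exit nested" trigger */
    ppc_set_irq(cpu, PPC_INTERRUPT_HDECR, 1);
}

/* at vCPU realize time: */
spapr_cpu->nested_exit_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
                                            spapr_nested_exit_timer_cb,
                                            cpu);

Migration would then carry just this timer (VMSTATE_TIMER_PTR) in the spapr per-vCPU state, leaving env->tb_env untouched and making the question of sharing hdecr moot.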
* Re: [RFC PATCH 0/4] ppc: nested TCG migration (KVM-on-TCG) 2022-02-24 18:58 [RFC PATCH 0/4] ppc: nested TCG migration (KVM-on-TCG) Fabiano Rosas ` (3 preceding siblings ...) 2022-02-24 18:58 ` [RFC PATCH 4/4] spapr: Add KVM-on-TCG migration support Fabiano Rosas @ 2022-02-24 21:00 ` Mark Cave-Ayland 2022-02-25 3:54 ` David Gibson 2022-02-25 16:11 ` Fabiano Rosas 4 siblings, 2 replies; 20+ messages in thread From: Mark Cave-Ayland @ 2022-02-24 21:00 UTC (permalink / raw) To: Fabiano Rosas, qemu-devel; +Cc: danielhb413, qemu-ppc, clg, npiggin, david On 24/02/2022 18:58, Fabiano Rosas wrote: > This series implements the migration for a TCG pseries guest running a > nested KVM guest. This is just like migrating a pseries TCG guest, but > with some extra state to allow a nested guest to continue to run on > the destination. > > Unfortunately the regular TCG migration scenario (not nested) is not > fully working so I cannot be entirely sure the nested migration is > correct. I have included a couple of patches for the general migration > case that (I think?) improve the situation a bit, but I'm still seeing > hard lockups and other issues with more than 1 vcpu. > > This is more of an early RFC to see if anyone spots something right > away. I haven't made much progress in debugging the general TCG > migration case so if anyone has any input there as well I'd appreciate > it. > > Thanks > > Fabiano Rosas (4): > target/ppc: TCG: Migrate tb_offset and decr > spapr: TCG: Migrate spapr_cpu->prod > hw/ppc: Take nested guest into account when saving timebase > spapr: Add KVM-on-TCG migration support > > hw/ppc/ppc.c | 17 +++++++- > hw/ppc/spapr.c | 19 ++++++++ > hw/ppc/spapr_cpu_core.c | 77 +++++++++++++++++++++++++++++++++ > include/hw/ppc/spapr_cpu_core.h | 2 +- > target/ppc/machine.c | 61 ++++++++++++++++++++++++++ > 5 files changed, 174 insertions(+), 2 deletions(-) FWIW I noticed there were some issues with migrating the decrementer on Mac machines a while ago which caused a hang on the destination with TCG (for MacOS on an x86 host in my case). Have a look at the following threads for reference: https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg00546.html https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04622.html IIRC there is code that assumes any migration in PPC is being done live, and so adjusts the timebase on the destination to reflect wall clock time by recalculating tb_offset. I haven't looked at the code for a while but I think the outcome was that there needs to be 2 phases in migration: the first is to migrate the timebase as-is for guests that are paused during migration, whilst the second is to notify hypervisor-aware guest OSs such as Linux to make the timebase adjustment if required if the guest is running. ATB, Mark. ^ permalink raw reply [flat|nested] 20+ messages in thread
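For reference, the recalculation Mark mentions is done around the PPCTimebase handling in hw/ppc/ppc.c. The gist, as a simplified sketch rather than QEMU's exact code: advance the guest timebase by the wall-clock time that elapsed during the downtime, so the guest clock stays pinned to real time across the migration.

/* Sketch: derive a destination tb_offset from what the source
 * serialized (the guest timebase and the host wall-clock time at
 * save time). tb_offset is defined as guest_tb minus host_tb. */
static int64_t sketch_adjusted_tb_offset(uint64_t guest_tb_at_save,
                                         int64_t save_time_ns,
                                         uint32_t tb_freq)
{
    int64_t elapsed_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST) - save_time_ns;
    uint64_t elapsed_ticks = muldiv64(MAX(elapsed_ns, 0), tb_freq,
                                      NANOSECONDS_PER_SECOND);

    return guest_tb_at_save + elapsed_ticks - cpu_get_host_ticks();
}

The "paused guest" phase Mark describes would simply drop the elapsed_ticks term and carry the timebase across unchanged.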
* Re: [RFC PATCH 0/4] ppc: nested TCG migration (KVM-on-TCG) 2022-02-24 21:00 ` [RFC PATCH 0/4] ppc: nested TCG migration (KVM-on-TCG) Mark Cave-Ayland @ 2022-02-25 3:54 ` David Gibson 2022-02-25 16:11 ` Fabiano Rosas 1 sibling, 0 replies; 20+ messages in thread From: David Gibson @ 2022-02-25 3:54 UTC (permalink / raw) To: Mark Cave-Ayland Cc: Fabiano Rosas, danielhb413, qemu-devel, npiggin, qemu-ppc, clg [-- Attachment #1: Type: text/plain, Size: 3803 bytes --] On Thu, Feb 24, 2022 at 09:00:24PM +0000, Mark Cave-Ayland wrote: > On 24/02/2022 18:58, Fabiano Rosas wrote: > > > This series implements the migration for a TCG pseries guest running a > > nested KVM guest. This is just like migrating a pseries TCG guest, but > > with some extra state to allow a nested guest to continue to run on > > the destination. > > > > Unfortunately the regular TCG migration scenario (not nested) is not > > fully working so I cannot be entirely sure the nested migration is > > correct. I have included a couple of patches for the general migration > > case that (I think?) improve the situation a bit, but I'm still seeing > > hard lockups and other issues with more than 1 vcpu. > > > > This is more of an early RFC to see if anyone spots something right > > away. I haven't made much progress in debugging the general TCG > > migration case so if anyone has any input there as well I'd appreciate > > it. > > > > Thanks > > > > Fabiano Rosas (4): > > target/ppc: TCG: Migrate tb_offset and decr > > spapr: TCG: Migrate spapr_cpu->prod > > hw/ppc: Take nested guest into account when saving timebase > > spapr: Add KVM-on-TCG migration support > > > > hw/ppc/ppc.c | 17 +++++++- > > hw/ppc/spapr.c | 19 ++++++++ > > hw/ppc/spapr_cpu_core.c | 77 +++++++++++++++++++++++++++++++++ > > include/hw/ppc/spapr_cpu_core.h | 2 +- > > target/ppc/machine.c | 61 ++++++++++++++++++++++++++ > > 5 files changed, 174 insertions(+), 2 deletions(-) > > FWIW I noticed there were some issues with migrating the decrementer on Mac > machines a while ago which caused a hang on the destination with TCG (for > MacOS on an x86 host in my case). Have a look at the following threads for > reference: > > https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg00546.html > https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04622.html > > IIRC there is code that assumes any migration in PPC is being done live, and > so adjusts the timebase on the destination to reflect wall clock time by > recalculating tb_offset. I haven't looked at the code for a while but I > think the outcome was that there needs to be 2 phases in migration: the > first is to migrate the timebase as-is for guests that are paused during > migration, whilst the second is to notify hypervisor-aware guest OSs such as > Linux to make the timebase adjustment if required if the guest is running. Whether the timebase is adjusted for the migration downtime depends on whether the guest clock is pinned to wall clock time or not. Usually it should be (because you don't want your clocks to go wrong on migration of a production system). However in neither case should the guest be involved. There may be guest-side code related to this in Linux, but that's probably for migration under pHyp, which is a guest-aware migration system. That's essentially unrelated to migration under qemu/kvm, which is a guest-unaware system. 
Guest-aware migration has some nice-sounding advantages; in particular it can allow migrations across a heterogeneous cluster with differences between hosts that the hypervisor can't hide, or can't efficiently hide. However, it is, IMO, a deeply broken approach, because it can allow an un-cooperative guest to indefinitely block migration, and for it to be reliably correct it requires *much* more pinning down of exactly what host system changes the guest can and can't be expected to cope with than PAPR has ever bothered to do. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/4] ppc: nested TCG migration (KVM-on-TCG) 2022-02-24 21:00 ` [RFC PATCH 0/4] ppc: nested TCG migration (KVM-on-TCG) Mark Cave-Ayland 2022-02-25 3:54 ` David Gibson @ 2022-02-25 16:11 ` Fabiano Rosas 1 sibling, 0 replies; 20+ messages in thread From: Fabiano Rosas @ 2022-02-25 16:11 UTC (permalink / raw) To: Mark Cave-Ayland, qemu-devel; +Cc: danielhb413, qemu-ppc, clg, npiggin, david Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes: > On 24/02/2022 18:58, Fabiano Rosas wrote: > >> This series implements the migration for a TCG pseries guest running a >> nested KVM guest. This is just like migrating a pseries TCG guest, but >> with some extra state to allow a nested guest to continue to run on >> the destination. >> >> Unfortunately the regular TCG migration scenario (not nested) is not >> fully working so I cannot be entirely sure the nested migration is >> correct. I have included a couple of patches for the general migration >> case that (I think?) improve the situation a bit, but I'm still seeing >> hard lockups and other issues with more than 1 vcpu. >> >> This is more of an early RFC to see if anyone spots something right >> away. I haven't made much progress in debugging the general TCG >> migration case so if anyone has any input there as well I'd appreciate >> it. >> >> Thanks >> >> Fabiano Rosas (4): >> target/ppc: TCG: Migrate tb_offset and decr >> spapr: TCG: Migrate spapr_cpu->prod >> hw/ppc: Take nested guest into account when saving timebase >> spapr: Add KVM-on-TCG migration support >> >> hw/ppc/ppc.c | 17 +++++++- >> hw/ppc/spapr.c | 19 ++++++++ >> hw/ppc/spapr_cpu_core.c | 77 +++++++++++++++++++++++++++++++++ >> include/hw/ppc/spapr_cpu_core.h | 2 +- >> target/ppc/machine.c | 61 ++++++++++++++++++++++++++ >> 5 files changed, 174 insertions(+), 2 deletions(-) > > FWIW I noticed there were some issues with migrating the decrementer on Mac machines > a while ago which caused a hang on the destination with TCG (for MacOS on an x86 host > in my case). Have a look at the following threads for reference: > > https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg00546.html > https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04622.html Thanks, Mark! There's a lot of helpful information in these threads. > IIRC there is code that assumes any migration in PPC is being done > live, and so adjusts the timebase on the destination to reflect wall > clock time by recalculating tb_offset. I haven't looked at the code for > a while but I think the outcome was that there needs to be 2 phases in > migration: the first is to migrate the timebase as-is for guests that > are paused during migration, whilst the second is to notify > hypervisor-aware guest OSs such as Linux to make the timebase > adjustment if required if the guest is running. > > > ATB, > > Mark. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2022-02-28 2:18 UTC | newest] Thread overview: 20+ messages -- 2022-02-24 18:58 [RFC PATCH 0/4] ppc: nested TCG migration (KVM-on-TCG) Fabiano Rosas 2022-02-24 18:58 ` [RFC PATCH 1/4] target/ppc: TCG: Migrate tb_offset and decr Fabiano Rosas 2022-02-24 20:06 ` Richard Henderson 2022-02-25 3:15 ` David Gibson 2022-02-25 16:08 ` Fabiano Rosas 2022-02-28 2:04 ` David Gibson 2022-02-24 18:58 ` [RFC PATCH 2/4] spapr: TCG: Migrate spapr_cpu->prod Fabiano Rosas 2022-02-25 3:17 ` David Gibson 2022-02-25 16:08 ` Fabiano Rosas 2022-02-24 18:58 ` [RFC PATCH 3/4] hw/ppc: Take nested guest into account when saving timebase Fabiano Rosas 2022-02-25 3:21 ` David Gibson 2022-02-25 16:08 ` Fabiano Rosas 2022-02-28 2:06 ` David Gibson 2022-02-24 18:58 ` [RFC PATCH 4/4] spapr: Add KVM-on-TCG migration support Fabiano Rosas 2022-02-25 0:51 ` Nicholas Piggin 2022-02-25 3:42 ` David Gibson 2022-02-25 10:57 ` Nicholas Piggin 2022-02-24 21:00 ` [RFC PATCH 0/4] ppc: nested TCG migration (KVM-on-TCG) Mark Cave-Ayland 2022-02-25 3:54 ` David Gibson 2022-02-25 16:11 ` Fabiano Rosas