qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Christian Borntraeger <borntraeger@de.ibm.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>,
	qemu-s390x <qemu-s390x@nongnu.org>,
	qemu-devel <qemu-devel@nongnu.org>,
	Cornelia Huck <cohuck@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Halil Pasic <pasic@linux.vnet.ibm.com>,
	Janosch Frank <frankja@linux.vnet.ibm.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/1] s390/kvm: implement clearing part of IPL clear
Date: Thu, 1 Mar 2018 13:08:41 +0100	[thread overview]
Message-ID: <69654fb2-f5ba-c23b-f6f5-1b559692cf37@de.ibm.com> (raw)
In-Reply-To: <20180301114543.GC2994@work-vm>



On 03/01/2018 12:45 PM, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>>
>>
>> On 03/01/2018 10:24 AM, Dr. David Alan Gilbert wrote:
>>> * Thomas Huth (thuth@redhat.com) wrote:
>>>> On 28.02.2018 20:53, Christian Borntraeger wrote:
>>>>> When a guests reboots with diagnose 308 subcode 3 it requests the memory
>>>>> to be cleared. We did not do it so far. This does not only violate the
>>>>> architecture, it also misses the chance to free up that memory on
>>>>> reboot, which would help on host memory over commitment.  By using
>>>>> ram_block_discard_range we can cover both cases.
>>>>
>>>> Sounds like a good idea. I wonder whether that release_all_ram()
>>>> function should maybe rather reside in exec.c, so that other machines
>>>> that want to clear all RAM at reset time can use it, too?
>>>>
>>>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>>> ---
>>>>>  target/s390x/kvm.c | 19 +++++++++++++++++++
>>>>>  1 file changed, 19 insertions(+)
>>>>>
>>>>> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
>>>>> index 8f3a422288..2e145ad5c3 100644
>>>>> --- a/target/s390x/kvm.c
>>>>> +++ b/target/s390x/kvm.c
>>>>> @@ -34,6 +34,8 @@
>>>>>  #include "qapi/error.h"
>>>>>  #include "qemu/error-report.h"
>>>>>  #include "qemu/timer.h"
>>>>> +#include "qemu/rcu_queue.h"
>>>>> +#include "sysemu/cpus.h"
>>>>>  #include "sysemu/sysemu.h"
>>>>>  #include "sysemu/hw_accel.h"
>>>>>  #include "hw/boards.h"
>>>>> @@ -41,6 +43,7 @@
>>>>>  #include "sysemu/device_tree.h"
>>>>>  #include "exec/gdbstub.h"
>>>>>  #include "exec/address-spaces.h"
>>>>> +#include "exec/ram_addr.h"
>>>>>  #include "trace.h"
>>>>>  #include "qapi-event.h"
>>>>>  #include "hw/s390x/s390-pci-inst.h"
>>>>> @@ -1841,6 +1844,14 @@ static int kvm_arch_handle_debug_exit(S390CPU *cpu)
>>>>>      return ret;
>>>>>  }
>>>>>  
>>>>> +static void release_all_rams(void)
>>>>
>>>> s/rams/ram/ maybe?
>>>>
>>>>> +{
>>>>> +    struct RAMBlock *rb;
>>>>> +
>>>>> +    QLIST_FOREACH_RCU(rb, &ram_list.blocks, next)
>>>>> +        ram_block_discard_range(rb, 0, rb->used_length);
>>>>
>>>> From a coding style point of view, I think there should be curly braces
>>>> around ram_block_discard_range() ?
>>>
>>> I think this might break if it happens during a postcopy migrate.
>>> The destination CPU is running, so it can do a reboot at just the wrong
>>> time; and then the pages (that are protected by userfaultfd) would get
>>> deallocated and trigger userfaultfd requests if accessed.
>>
>> Yes, userfaultd/postcopy is really fragile and relies on things that are not
>> necessarily true (e.g. virito-balloon can also invalidate pages).
> 
> That's why we use qemu_balloon_inhibit around postcopy to stop
> ballooning; I'm not aware of anything else that does the same.

we also have at least the pte_unused thing in mm/rmap.c that clearly
predates userfaultfd. We might need to look into this as well....

> 
>> The right thing here would be to actually terminate the postcopy migrate but
>> return it as "successful" (since we are going to clear that RAM anyway). Do 
>> you see a good way to achieve that?
> 
> There's no current mechanism to do it; I think it would have to involve
> some interaction with the source as well though to tell it that you
> didn't need that area of RAM anyway.
> 
> However, there are more problems:
>   a) Even forgetting the userfault problem, this is racy since during
> postcopy you're still receiving blocks from the source at the same time;
> so some of the area that you've discarded might get overwritten by data
> from the source.

So how do you handle the case when the target system writes to memory
that is still in flight? Can we build on that mechanism?
> 
>   b) Your release_all_rams seems to do all RAM Blocks - won't that nuke
> any ROMs as well? Or maybe even flash?

ROMs loaded with load_elf (like our s390-ccw.img) are reloaded on every reset.
See rom_reset in /hw/core/loader.c

Is this different with the x86 bios?


> 
>   c) In a normal precopy migration, I think you may also get old data;
> Paolo said that an MADV_DONTNEED won't cause the dirty flags to be set,
> so if the migrate has already sent the data for a page, and then this
> happens, before the CPUs are stopped during the migration, when you
> restart on the destination you'll have the old data.

Yes, looks like we might get non-cleared data. Could we maybe combine fixing
and optimizing: we can stop tranmitting the memory and do a clean
startup on the target side. In other words could we actually use the
reset clear trigger to speed up migration?



> 
> Dave
> 
>>
>>>
>>> Dave
>>>
>>>>> +}
>>>>> +
>>>>>  int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>>>>  {
>>>>>      S390CPU *cpu = S390_CPU(cs);
>>>>> @@ -1853,6 +1864,14 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>>>>              ret = handle_intercept(cpu);
>>>>>              break;
>>>>>          case KVM_EXIT_S390_RESET:
>>>>> +            if (run->s390_reset_flags & KVM_S390_RESET_CLEAR) {
>>>>> +                /*
>>>>> +                 * We will stop other CPUs anyway, avoid spurious crashes and
>>>>> +                 * get all CPUs out. The reset will take care of the resume.
>>>>> +                 */
>>>>> +                pause_all_vcpus();
>>>>> +                release_all_rams();
>>>>> +            }
>>>>>              s390_reipl_request();
>>>>>              break;
>>>>>          case KVM_EXIT_S390_TSCH:
>>>>>
>>>>
>>>> Apart from the cosmetic nits, patch looks good to me.
>>>>
>>>>  Thomas
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

  reply	other threads:[~2018-03-01 12:08 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-28 19:53 [Qemu-devel] [PATCH 1/1] s390/kvm: implement clearing part of IPL clear Christian Borntraeger
2018-03-01  3:58 ` Thomas Huth
2018-03-01  7:37   ` Christian Borntraeger
2018-03-01  8:44   ` Paolo Bonzini
2018-03-01  9:24   ` Dr. David Alan Gilbert
2018-03-01 11:00     ` Christian Borntraeger
2018-03-01 11:45       ` Dr. David Alan Gilbert
2018-03-01 12:08         ` Christian Borntraeger [this message]
2018-03-01 12:28           ` Dr. David Alan Gilbert
2018-03-01 12:35             ` Christian Borntraeger
2018-03-01 12:39               ` Christian Borntraeger
2018-03-01 12:58                 ` Dr. David Alan Gilbert
2018-03-01 12:49               ` Dr. David Alan Gilbert
2018-03-01  9:21 ` David Hildenbrand
2018-03-05 12:54 ` Cornelia Huck
2018-03-05 13:04   ` Christian Borntraeger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=69654fb2-f5ba-c23b-f6f5-1b559692cf37@de.ibm.com \
    --to=borntraeger@de.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=frankja@linux.vnet.ibm.com \
    --cc=pasic@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-s390x@nongnu.org \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).