From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>,
	qemu-s390x <qemu-s390x@nongnu.org>,
	qemu-devel <qemu-devel@nongnu.org>,
	Cornelia Huck <cohuck@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Halil Pasic <pasic@linux.vnet.ibm.com>,
	Janosch Frank <frankja@linux.vnet.ibm.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/1] s390/kvm: implement clearing part of IPL clear
Date: Thu, 1 Mar 2018 12:58:12 +0000
Message-ID: <20180301125811.GF2994@work-vm>
In-Reply-To: <fa1fed0b-4c7d-068e-01d3-cd34aa9d6864@de.ibm.com>

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> 
> 
> On 03/01/2018 01:35 PM, Christian Borntraeger wrote:
> > 
> > 
> > On 03/01/2018 01:28 PM, Dr. David Alan Gilbert wrote:
> >> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> >>>
> >>>
> >>> On 03/01/2018 12:45 PM, Dr. David Alan Gilbert wrote:
> >>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> >>>>>
> >>>>>
> >>>>> On 03/01/2018 10:24 AM, Dr. David Alan Gilbert wrote:
> >>>>>> * Thomas Huth (thuth@redhat.com) wrote:
> >>>>>>> On 28.02.2018 20:53, Christian Borntraeger wrote:
> >>>>>>>> When a guest reboots with diagnose 308 subcode 3 it requests the memory
> >>>>>>>> to be cleared. We did not do it so far. This not only violates the
> >>>>>>>> architecture, it also misses the chance to free up that memory on
> >>>>>>>> reboot, which would help with host memory overcommitment. By using
> >>>>>>>> ram_block_discard_range we can cover both cases.
> >>>>>>>
> >>>>>>> Sounds like a good idea. I wonder whether that release_all_ram()
> >>>>>>> function should maybe rather reside in exec.c, so that other machines
> >>>>>>> that want to clear all RAM at reset time can use it, too?
> >>>>>>>
> >>>>>>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> >>>>>>>> ---
> >>>>>>>>  target/s390x/kvm.c | 19 +++++++++++++++++++
> >>>>>>>>  1 file changed, 19 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> >>>>>>>> index 8f3a422288..2e145ad5c3 100644
> >>>>>>>> --- a/target/s390x/kvm.c
> >>>>>>>> +++ b/target/s390x/kvm.c
> >>>>>>>> @@ -34,6 +34,8 @@
> >>>>>>>>  #include "qapi/error.h"
> >>>>>>>>  #include "qemu/error-report.h"
> >>>>>>>>  #include "qemu/timer.h"
> >>>>>>>> +#include "qemu/rcu_queue.h"
> >>>>>>>> +#include "sysemu/cpus.h"
> >>>>>>>>  #include "sysemu/sysemu.h"
> >>>>>>>>  #include "sysemu/hw_accel.h"
> >>>>>>>>  #include "hw/boards.h"
> >>>>>>>> @@ -41,6 +43,7 @@
> >>>>>>>>  #include "sysemu/device_tree.h"
> >>>>>>>>  #include "exec/gdbstub.h"
> >>>>>>>>  #include "exec/address-spaces.h"
> >>>>>>>> +#include "exec/ram_addr.h"
> >>>>>>>>  #include "trace.h"
> >>>>>>>>  #include "qapi-event.h"
> >>>>>>>>  #include "hw/s390x/s390-pci-inst.h"
> >>>>>>>> @@ -1841,6 +1844,14 @@ static int kvm_arch_handle_debug_exit(S390CPU *cpu)
> >>>>>>>>      return ret;
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>> +static void release_all_rams(void)
> >>>>>>>
> >>>>>>> s/rams/ram/ maybe?
> >>>>>>>
> >>>>>>>> +{
> >>>>>>>> +    struct RAMBlock *rb;
> >>>>>>>> +
> >>>>>>>> +    QLIST_FOREACH_RCU(rb, &ram_list.blocks, next)
> >>>>>>>> +        ram_block_discard_range(rb, 0, rb->used_length);
> >>>>>>>
> >>>>>>> From a coding style point of view, I think there should be curly braces
> >>>>>>> around ram_block_discard_range() ?
> >>>>>>
> >>>>>> I think this might break if it happens during a postcopy migrate.
> >>>>>> The destination CPU is running, so it can do a reboot at just the wrong
> >>>>>> time; and then the pages (that are protected by userfaultfd) would get
> >>>>>> deallocated and trigger userfaultfd requests if accessed.
> >>>>>
> >>>>> Yes, userfaultfd/postcopy is really fragile and relies on things that are not
> >>>>> necessarily true (e.g. virtio-balloon can also invalidate pages).
> >>>>
> >>>> That's why we use qemu_balloon_inhibit around postcopy to stop
> >>>> ballooning; I'm not aware of anything else that does the same.
> >>>
> >>> we also have at least the pte_unused thing in mm/rmap.c that clearly
> >>> predates userfaultfd. We might need to look into this as well....
> >>
> >> I've not come across that; what does that do?
> > 
> > It can drop a page on page out if the page is no longer of value. It is used by
> > the CMMA (guest page hinting) code of s390x.
> > 
> > see kernel mm/rmap.c
> > 
> > 
> > static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >                      unsigned long address, void *arg)
> > {
> > [...]
> >                 } else if (pte_unused(pteval)) {
> >                         /*
> >                          * The guest indicated that the page content is of no
> >                          * interest anymore. Simply discard the pte, vmscan
> >                          * will take care of the rest.
> >                          */
> >                         dec_mm_counter(mm, mm_counter(page));
> >                         /* We have to invalidate as we cleared the pte */
> >                         mmu_notifier_invalidate_range(mm, address,
> >                                                       address + PAGE_SIZE);
> >                 } else if (IS_ENABLED(CONFIG_MIGRATION) &&
> >                                 (flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE))) {
> > [...]
> > 
> > 
> 
> Maybe something like this in the kernel
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 47db27f8049e..9bdf4d448987 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1483,7 +1483,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>                                 set_pte_at(mm, address, pvmw.pte, pteval);
>                         }
>  
> -               } else if (pte_unused(pteval)) {
> +               } else if (pte_unused(pteval) && !vma->vm_userfaultfd_ctx.ctx) {
>                         /*
>                          * The guest indicated that the page content is of no
>                          * interest anymore. Simply discard the pte, vmscan
> 
> 
> could help?

I guess so, but please check with aarcange; I don't know the mm code.
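
To make the QEMU-side concern concrete, here is a rough userspace sketch of
discarding guest RAM on reboot while refusing to do so during postcopy. It is
not the patch and not QEMU code: struct demo_ram_block, demo_postcopy_active
and the demo_* functions are hypothetical stand-ins, and it assumes Linux with
anonymous private mappings, where madvise(MADV_DONTNEED) drops the pages and
later reads see zero-filled memory. It says nothing about the kernel-side
pte_unused path discussed above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for a RAM block; only the fields this sketch needs. */
struct demo_ram_block {
    unsigned char *host;          /* start of an anonymous private mapping */
    size_t used_length;           /* number of bytes in use */
    struct demo_ram_block *next;  /* next block, or NULL */
};

/* Hypothetical flag: true while an incoming postcopy migration is running. */
static bool demo_postcopy_active;

/* Back a block with anonymous private memory; returns 0 on success. */
static int demo_block_init(struct demo_ram_block *rb, size_t len)
{
    void *p;

    assert(rb != NULL);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    rb->host = p;
    rb->used_length = len;
    rb->next = NULL;
    return 0;
}

/*
 * Discard the contents of every block in the list.
 * Returns 0 on success, -1 if the discard was refused or madvise() failed.
 */
static int demo_release_all_ram(struct demo_ram_block *head)
{
    struct demo_ram_block *rb;

    /*
     * During postcopy the pages are still being filled in on demand, so
     * dropping them here would trigger fresh fault requests. Refuse.
     */
    if (demo_postcopy_active) {
        return -1;
    }

    for (rb = head; rb != NULL; rb = rb->next) {
        /* Frees the backing pages; the next access sees zeroed memory. */
        if (madvise(rb->host, rb->used_length, MADV_DONTNEED) != 0) {
            return -1;
        }
    }
    return 0;
}
```

A caller would set up one or more blocks with demo_block_init(), link them via
next, and call demo_release_all_ram() from its reset path. Whether refusing is
the right policy during postcopy, as opposed to falling back to zeroing the
memory, is an open question this sketch does not try to answer.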

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
