qemu-devel.nongnu.org archive mirror
From: Eduardo Habkost <ehabkost@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Zack Cornelius <zack.cornelius@kove.net>
Subject: Re: [Qemu-devel] [PATCH 3/5] memory: Add RAM_NONPERSISTENT flag
Date: Thu, 22 Jun 2017 14:27:02 -0300
Message-ID: <20170622172702.GB20956@localhost.localdomain>
In-Reply-To: <20170622121457.GC2624@work-vm>

On Thu, Jun 22, 2017 at 01:14:58PM +0100, Dr. David Alan Gilbert wrote:
> * Eduardo Habkost (ehabkost@redhat.com) wrote:
> > The new flag will make qemu_ram_free() discard the contents of the
> > block.  It will be used to let QEMU be configured to avoid flushing file
> > contents to disk when exiting.  As MADV_REMOVE is not always supported,
> > the new code will try MADV_DONTNEED in case MADV_REMOVE fails.
> 
> I'd like to understand what semantics you're trying to achieve and thus
> why you prefer REMOVE to DONTNEED.   If you're trying to avoid changes
> being written back then doesn't a DONTNEED get rid of any changes that
> have yet to be written?  Or are there changes that have already been
> queued that REMOVE will kill off?
> 

Generally speaking, it look(ed) like REMOVE is a superset of DONTNEED:
DONTNEED will free and zero pages only on anonymous private mappings;
REMOVE will free resources and zero pages in additional cases.
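A minimal Linux-only sketch (not from the thread) of the DONTNEED half of
that claim: on a private anonymous mapping, the first access after
madvise(MADV_DONTNEED) sees a fresh zero-filled page.  The function name
dontneed_zeroes_private_anon() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if MADV_DONTNEED discarded the data of a private anonymous
 * page (it reads back as zero), 0 if the data survived, -1 on error. */
static int dontneed_zeroes_private_anon(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }

    memset(p, 0xAA, len);               /* dirty the whole page */
    assert(p[0] == 0xAA);

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* The next access faults in a fresh zero-filled page. */
    int zeroed = (p[0] == 0 && p[len - 1] == 0);
    munmap(p, len);
    return zeroed;
}
```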

One case where I think REMOVE would be useful is tmpfs when swapping
is involved: with REMOVE, the host can drop swap contents or avoid
writing memory contents to swap even if we are using a shared tmpfs
mapping.
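A sketch of the tmpfs point, assuming Linux with memfd_create() (glibc
2.27 or later), which returns a shmem/tmpfs-backed file: on a MAP_SHARED
mapping, MADV_DONTNEED only drops the page-table entries, so the data is
read back from the page cache, while MADV_REMOVE frees the backing store
and the page reads back as zero.  shared_byte_after() is a hypothetical
name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps one page of a memfd MAP_SHARED, dirties it, applies 'advice',
 * and returns the first byte read back afterwards (-1 on error). */
static int shared_byte_after(int advice)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int fd = memfd_create("tmpfs-demo", 0);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memset(p, 0xAA, len);               /* dirty the shared page */
    int ret = madvise(p, len, advice);
    /* DONTNEED: refault from the page cache; REMOVE: hole, reads zero */
    int byte = (ret == 0) ? p[0] : -1;

    munmap(p, len);
    close(fd);
    return byte;
}
```

On a stock Linux kernel, shared_byte_after(MADV_DONTNEED) is expected to
return 0xAA and shared_byte_after(MADV_REMOVE) to return 0.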

Other filesystems may have similar cases, where unnecessary I/O is
still performed even after madvise(MADV_DONTNEED) is called.
MADV_REMOVE lets us simply tell the kernel to drop the data.

I'm CCing Zack Cornelius, who initially suggested MADV_REMOVE, in case
he can describe more specific use cases.


> If you're just trying to save time in writeback, it's interesting to
> note my requirement is that by the time I exit this function the
> process of throwing away the memory contents must be complete;
> I think your requirements are a lot lazier as to when it happens.

This is a very good point.  I was assuming that REMOVE is a superset of
DONTNEED, but based on the manpage it doesn't seem to be guaranteed.
I probably shouldn't try to reuse ram_block_discard_range(), and should
write a separate helper for madvise(MADV_REMOVE) instead, as the
requirements are different.
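A rough sketch of what such a separate helper could look like, under
stated assumptions: drop_contents_on_free() is a hypothetical name, not
an existing QEMU function, and the caller is assumed to know whether the
region is a private anonymous mapping.  It tries MADV_REMOVE first and
falls back to MADV_DONTNEED only where DONTNEED is known to discard the
data; both calls have completed by the time it returns.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: discard the contents of a region about to be
 * freed.  Returns 0 on success, -errno on failure. */
static int drop_contents_on_free(void *host, size_t length, bool anon_private)
{
    assert(host != NULL);

    /* MADV_REMOVE frees the backing store of shared writable mappings
     * (shmem/tmpfs and filesystems that support hole punching). */
    if (madvise(host, length, MADV_REMOVE) == 0) {
        return 0;
    }
    int err = errno;

    /* On a private anonymous mapping MADV_REMOVE is expected to fail
     * (EINVAL on Linux), but MADV_DONTNEED reliably drops the pages:
     * later reads see zero-filled memory. */
    if (anon_private) {
        if (madvise(host, length, MADV_DONTNEED) == 0) {
            return 0;
        }
        err = errno;
    }
    return -err;
}
```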


> > The new flag will also indicate that ram_block_discard_range() can use
> > MADV_REMOVE when discarding memory pages.  I have considered calling
> > MADV_REMOVE unconditionally (as destroying the RAM contents seems to be
> > OK every time ram_block_discard_range() is called), but for safety I
> > decided to restrict the new code to blocks having RAM_NONPERSISTENT set.
> 
> The manpage on MADV_REMOVE is confusing; it says it doesn't work on Huge
> TLB pages, but says it does work on anything that can do
> FALLOC_FL_PUNCH_HOLE - which as far as I can tell hugetlbfs does.
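To illustrate the FALLOC_FL_PUNCH_HOLE route, here is a sketch on a
shmem-backed memfd (assuming Linux and glibc 2.27 or later); it does not
answer the hugetlbfs question by itself.  Punching a hole with
FALLOC_FL_KEEP_SIZE frees the pages, the shared mapping reads back zero
over the hole, and the file size is unchanged.  Passing MFD_HUGETLB to
memfd_create() (Linux 4.14 or later, with huge pages reserved and sizes
rounded to the huge page size) might be one way to probe hugetlbfs the
same way; that is an assumption, not something this sketch verifies.
punch_hole_zeroes_shared() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if punching a hole over the first page of a MAP_SHARED
 * memfd mapping made that page read back as zero while leaving the
 * second page and the file size intact, 0 otherwise, -1 on error. */
static int punch_hole_zeroes_shared(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = 2 * page;
    unsigned char *p = MAP_FAILED;
    struct stat st;
    int result = -1;

    int fd = memfd_create("punch-demo", 0);   /* shmem/tmpfs-backed */
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)size) != 0) {
        goto out;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        goto out;
    }
    p[0] = 0xAA;
    p[page] = 0xBB;

    /* PUNCH_HOLE must be combined with KEEP_SIZE. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, (off_t)page) != 0) {
        goto out;
    }
    if (fstat(fd, &st) != 0) {
        goto out;
    }
    result = (p[0] == 0 && p[page] == 0xBB && st.st_size == (off_t)size);

out:
    if (p != MAP_FAILED) {
        munmap(p, size);
    }
    close(fd);
    return result;
}
```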

Yes, it's confusing.  I need to do some testing to find out if HugeTLBFS
supports MADV_REMOVE today.  But my use case is just an optimization, so
it won't be a big deal if it doesn't cover every case in the first
version.

> 
> I've got some code in my shared-postcopy world that has this function do
> the following which is kind of similar:
> 
>         /* The logic here is messy;
>          *    madvise DONTNEED fails for hugepages
>          *    fallocate works on hugepages and shmem
>          */
>         need_madvise = (rb->page_size == qemu_host_page_size) &&
>                        (rb->fd == -1 || !(rb->flags & RAM_SHARED));
>         need_fallocate = rb->fd != -1;

This looks safer to me.  I was bothered by the missing check for
(rb->fd != -1) in the current code.
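The quoted predicate can be read as a small decision table.  A
self-contained sketch, where struct demo_block, struct discard_plan,
plan_discard() and DEMO_RAM_SHARED are hypothetical stand-ins for the
RAMBlock fields and flag the snippet uses (the flag value is
illustrative only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_RAM_SHARED (1u << 1)   /* illustrative stand-in for RAM_SHARED */

struct demo_block {
    size_t page_size;   /* host page size backing the block */
    int fd;             /* -1 for anonymous memory */
    unsigned flags;
};

struct discard_plan {
    bool need_madvise;
    bool need_fallocate;
};

static struct discard_plan plan_discard(const struct demo_block *rb,
                                        size_t host_page_size)
{
    struct discard_plan plan;

    /* madvise(DONTNEED): normal-sized pages only, and only for anonymous
     * or private (non-shared) mappings. */
    plan.need_madvise = (rb->page_size == host_page_size) &&
                        (rb->fd == -1 || !(rb->flags & DEMO_RAM_SHARED));
    /* fallocate(PUNCH_HOLE) needs a backing file descriptor. */
    plan.need_fallocate = rb->fd != -1;
    return plan;
}
```

With 4 KiB host pages, an anonymous block gets madvise() only, a shared
file-backed block gets fallocate() only, a private file-backed block gets
both, and a huge-page block never gets madvise().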

>         if (ret == -1 && need_fallocate) {
> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>             ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>                             start, length);
> #endif
>         }
>         if (need_madvise && (!need_fallocate || (ret == 0))) {

I'm confused by the (ret == 0) check here.  Do you still want to call
madvise() if fallocate() succeeded?

> #if defined(CONFIG_MADVISE)
>             ret =  madvise(host_startaddr, length, MADV_DONTNEED);
>             fprintf(stderr, "%s: Did madvise for %p got %d\n", __func__, host_startaddr, ret);
> #endif
>         }
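One case where calling madvise() after a successful fallocate() appears
to matter is a private (non-RAM_SHARED) file-backed block, where both
need_madvise and need_fallocate are true.  A Linux-only sketch, assuming
memfd_create(): writing through a MAP_PRIVATE mapping creates a
copy-on-write page, punching a hole in the file does not discard that
private copy, and only the following MADV_DONTNEED makes the page read
back as zero.  private_file_discard() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns a bitmask, or -1 on error:
 *   bit 0: the private copy-on-write page still held the data after
 *          fallocate(PUNCH_HOLE) on the file
 *   bit 1: the page read back as zero after the follow-up MADV_DONTNEED */
static int private_file_discard(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = MAP_FAILED;
    int result = -1;

    int fd = memfd_create("private-demo", 0);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)len) != 0) {
        goto out;
    }
    /* Like a file-backed block without RAM_SHARED. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        goto out;
    }
    p[0] = 0xAA;                        /* write fault: private COW copy */

    /* Step 1: fallocate() frees the file's pages... */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, (off_t)len) != 0) {
        goto out;
    }
    result = (p[0] == 0xAA) ? 1 : 0;    /* ...but not the private copy */

    /* Step 2: MADV_DONTNEED drops the private copy; the next read maps
     * the (now empty) file page again. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        result = -1;
        goto out;
    }
    if (p[0] == 0) {
        result |= 2;
    }

out:
    if (p != MAP_FAILED) {
        munmap(p, len);
    }
    close(fd);
    return result;
}
```

On a stock Linux kernel this is expected to return 3 (both bits set).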


Anyway, now I'm considering simply not touching
ram_block_discard_range() and adding a new helper, because the
requirements are different.  Maybe in the future we can make the two
functions share code, if we decide FALLOC_FL_PUNCH_HOLE will be useful
for RAM_NONPERSISTENT too.

(BTW, I will probably rename "persistent=no"/RAM_NONPERSISTENT to
something more explicit about data being dropped, like
"free-on-exit=yes" or "disposable=yes").

> 
> Dave
> 
> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> > ---
> >  exec.c | 17 ++++++++++++++++-
> >  1 file changed, 16 insertions(+), 1 deletion(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index 585d6ed6d7..a6e9ed4ece 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -102,6 +102,11 @@ static MemoryRegion io_mem_unassigned;
> >   */
> >  #define RAM_RESIZEABLE (1 << 2)
> >  
> > +/* RAMBlock contents are not persistent, and we can discard memory contents
> > + * when freeing the memory block.
> > + */
> > +#define RAM_NONPERSISTENT (1 << 3)
> > +
> >  #endif
> >  
> >  #ifdef TARGET_PAGE_BITS_VARY
> > @@ -2061,6 +2066,10 @@ void qemu_ram_free(RAMBlock *block)
> >          ram_block_notify_remove(block->host, block->max_length);
> >      }
> >  
> > +    if (block->flags & RAM_NONPERSISTENT) {
> > +        ram_block_discard_range(block, 0, block->max_length);
> > +    }
> > +
> >      qemu_mutex_lock_ramlist();
> >      QLIST_REMOVE_RCU(block, next);
> >      ram_list.mru_block = NULL;
> > @@ -3537,7 +3546,13 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> >              /* Note: We need the madvise MADV_DONTNEED behaviour of definitely
> >               * freeing the page.
> >               */
> > -            ret = madvise(host_startaddr, length, MADV_DONTNEED);
> > +            if (rb->flags & RAM_NONPERSISTENT) {
> > +                ret = madvise(host_startaddr, length, MADV_REMOVE);
> > +            }
> > +            /* Fallback to MADV_DONTNEED if MADV_REMOVE fails */
> > +            if (ret || !(rb->flags & RAM_NONPERSISTENT)) {
> > +                ret = madvise(host_startaddr, length, MADV_DONTNEED);
> > +            }
> >  #endif
> >          } else {
> >              /* Huge page case  - unfortunately it can't do DONTNEED, but
> > -- 
> > 2.11.0.259.g40922b1
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



-- 
Eduardo

Thread overview: 20+ messages
2017-06-14 20:29 [Qemu-devel] [PATCH 0/5] hostmem-file: Add "persistent" option Eduardo Habkost
2017-06-14 20:29 ` [Qemu-devel] [PATCH 1/5] vl: Clean up user-creatable objects when exiting Eduardo Habkost
2017-06-14 20:29 ` [Qemu-devel] [PATCH 2/5] memory: Allow RAM up to block->max_length to be discarded Eduardo Habkost
2017-06-22 11:47   ` Dr. David Alan Gilbert
2017-06-14 20:29 ` [Qemu-devel] [PATCH 3/5] memory: Add RAM_NONPERSISTENT flag Eduardo Habkost
2017-06-22 12:14   ` Dr. David Alan Gilbert
2017-06-22 17:27     ` Eduardo Habkost [this message]
2017-06-22 18:56       ` Dr. David Alan Gilbert
2017-06-14 20:29 ` [Qemu-devel] [PATCH 4/5] memory: Add 'persistent' parameter to memory_region_init_ram_from_file() Eduardo Habkost
2017-06-22 12:26   ` Dr. David Alan Gilbert
2017-06-22 12:41     ` Eduardo Habkost
2017-06-14 20:30 ` [Qemu-devel] [PATCH 5/5] hostmem-file: Add "persistent" option Eduardo Habkost
2017-06-14 21:50 ` [Qemu-devel] [PATCH 0/5] " no-reply
2017-07-06 18:47 ` Eduardo Habkost
2017-08-11 16:33 ` Eduardo Habkost
2017-08-11 16:44   ` Daniel P. Berrange
2017-08-11 18:15     ` Eduardo Habkost
2017-08-14  9:39       ` Daniel P. Berrange
2017-08-14 11:40         ` Eduardo Habkost
2017-08-14 18:33           ` Zack Cornelius
