From: Eduardo Habkost <ehabkost@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
Igor Mammedov <imammedo@redhat.com>,
Zack Cornelius <zack.cornelius@kove.net>
Subject: Re: [Qemu-devel] [PATCH 3/5] memory: Add RAM_NONPERSISTENT flag
Date: Thu, 22 Jun 2017 14:27:02 -0300
Message-ID: <20170622172702.GB20956@localhost.localdomain>
In-Reply-To: <20170622121457.GC2624@work-vm>
On Thu, Jun 22, 2017 at 01:14:58PM +0100, Dr. David Alan Gilbert wrote:
> * Eduardo Habkost (ehabkost@redhat.com) wrote:
> > The new flag will make qemu_ram_free() discard the contents of the
> > block. It will be used to let QEMU be configured to avoid flushing file
> > contents to disk when exiting. As MADV_REMOVE is not always supported,
> > the new code will try MADV_DONTNEED in case MADV_REMOVE fails.
>
> I'd like to understand what semantics you're trying to achieve and thus
> why you prefer REMOVE to DONTNEED. If you're trying to avoid changes
> being written back then doesn't a DONTNEED get rid of any changes that
> have yet to be written? Or are there changes that have already been
> queued that REMOVE will kill off?
>
Generally speaking, it looks (or at least looked) like REMOVE is a
superset of DONTNEED: DONTNEED will free and zero pages only on
anonymous private mappings; REMOVE will free resources and zero pages
in additional cases.

One case where I think REMOVE would be useful is tmpfs when swapping is
involved: with REMOVE, the host can drop swap contents or avoid writing
memory contents to swap even if we are using a shared tmpfs mapping.

Other filesystems might have similar cases where unnecessary I/O
operations are performed even after madvise(MADV_DONTNEED) is called.
MADV_REMOVE lets us simply tell the kernel to drop the data.

I'm CCing Zack Cornelius, who initially suggested MADV_REMOVE, in case
he can describe more specific use cases.
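
For reference, the fallback the commit message describes boils down to
something like the sketch below. This is only an illustration, not the
QEMU code: discard_best_effort() is a made-up name, and it assumes a
Linux host where madvise() and MADV_DONTNEED are available.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not QEMU code): ask the kernel to drop the range
 * with MADV_REMOVE, and fall back to MADV_DONTNEED if that fails, for
 * example with EINVAL on a mapping that has no backing file.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int discard_best_effort(void *addr, size_t length)
{
    int ret = -1;

#ifdef MADV_REMOVE
    ret = madvise(addr, length, MADV_REMOVE);
#endif
    if (ret != 0) {
        /* MADV_REMOVE is missing or was rejected: try MADV_DONTNEED */
        ret = madvise(addr, length, MADV_DONTNEED);
    }
    return ret;
}
```

On a private anonymous mapping the MADV_REMOVE call is expected to fail,
so the helper ends up on the MADV_DONTNEED path and the pages read back
as zero afterwards. What happens on shared file mappings depends on the
filesystem, which is exactly the part that needs testing.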
> If you're just trying to save-time in writeback, it's interesting to
> note my requirement is that by the time I exit this function the
> process of throwing away the memory contents must be complete;
> I think your requirements are a lot lazier as to when it happens.
This is a very good point. I was assuming that REMOVE is a superset of
DONTNEED, but based on the manpage that doesn't seem to be guaranteed.
I probably shouldn't try to reuse ram_block_discard_range(), and should
instead write a separate helper for madvise(MADV_REMOVE), as the
requirements are different.
> > The new flag will also indicate that ram_block_discard_range() can use
> > MADV_REMOVE when discarding memory pages. I have considered calling
> > MADV_REMOVE unconditionally (as destroying the RAM contents seems to be
> > OK every time ram_block_discard_range() is called), but for safety I
> > decided to restrict the new code to blocks having RAM_NONPERSISTENT set.
>
> The manpage on MADV_REMOVE is confusing; it says it doesn't work on Huge
> TLB pages, but says it does work on anything that can do
> FALLOC_FL_PUNCH_HOLE - which as far as I can tell hugetlbfs does.
Yes, it's confusing. I need to do some testing to find out if HugeTLBFS
supports MADV_REMOVE today. But my use case is just an optimization, so
it won't be a big deal if it doesn't cover every case in the first
version.
>
> I've got some code in my shared-postcopy world that has this function do
> the following which is kind of similar:
>
> /* The logic here is messy;
> * madvise DONTNEED fails for hugepages
> * fallocate works on hugepages and shmem
> */
> need_madvise = (rb->page_size == qemu_host_page_size) &&
> (rb->fd == -1 || !(rb->flags & RAM_SHARED));
> need_fallocate = rb->fd != -1;
This looks safer to me. I was bothered by the missing check for
(rb->fd != -1) in the current code.
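
To make the cases explicit, here is how I read the two conditions above,
written as a standalone function. struct block_info, struct discard_plan
and plan_discard() are names made up for illustration; only the two
boolean expressions come from the quoted code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the RAMBlock fields used by the quoted code */
struct block_info {
    size_t page_size;   /* rb->page_size */
    int fd;             /* rb->fd, -1 when there is no backing file */
    bool shared;        /* rb->flags & RAM_SHARED */
};

struct discard_plan {
    bool need_madvise;
    bool need_fallocate;
};

/* Same two expressions as in the quoted snippet, with the host page size
 * passed in instead of read from qemu_host_page_size.
 */
struct discard_plan plan_discard(const struct block_info *rb,
                                 size_t host_page_size)
{
    struct discard_plan plan;

    plan.need_madvise = (rb->page_size == host_page_size) &&
                        (rb->fd == -1 || !rb->shared);
    plan.need_fallocate = rb->fd != -1;
    return plan;
}
```

With these conditions, anonymous memory with host-sized pages gets
madvise() only, a shared file mapping or a hugepage file gets fallocate()
only, and a private file mapping with host-sized pages gets both.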
> if (ret == -1 && need_fallocate) {
> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> start, length);
> #endif
> }
> if (need_madvise && (!need_fallocate || (ret == 0))) {
I'm confused by the (ret == 0) check here. Do you still want to call
madvise() if fallocate() succeeded?
> #if defined(CONFIG_MADVISE)
> ret = madvise(host_startaddr, length, MADV_DONTNEED);
> fprintf(stderr, "%s: Did madvise for %p got %d\n", __func__, host_startaddr, ret);
> #endif
> }
Anyway, now I'm considering simply not touching
ram_block_discard_range() and adding a new helper, because the
requirements are different. Maybe in the future we can make the two
functions share code, if we decide FALLOC_FL_PUNCH_HOLE will be useful
for RAM_NONPERSISTENT too.
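
A rough sketch of what such a helper could look like, just to show the
shape I have in mind. Nothing here exists in the tree: the name
ram_hint_remove() and its signature are hypothetical, and it assumes a
Linux host. The point is that failure is not fatal, since this is only an
optimization, and that blocks without a backing fd are skipped.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not QEMU code): best-effort hint that the contents
 * of a file-backed range will never be needed again.
 * Returns 0 if MADV_REMOVE was accepted, or a negative errno value
 * otherwise. Callers are free to ignore the failure.
 */
int ram_hint_remove(void *host, size_t length, int fd)
{
    if (fd == -1) {
        /* No backing file, so there is nothing to avoid writing back */
        return -ENOTSUP;
    }
#ifdef MADV_REMOVE
    if (madvise(host, length, MADV_REMOVE) == 0) {
        return 0;
    }
    return -errno;
#else
    return -ENOTSUP;
#endif
}
```

Unlike ram_block_discard_range(), this makes no promise that the memory
contents are gone by the time it returns.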
(BTW, I will probably rename "persistent=no"/RAM_NONPERSISTENT to
something more explicit about data being dropped, like
"free-on-exit=yes" or "disposable=yes").
>
> Dave
>
> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> > ---
> > exec.c | 17 ++++++++++++++++-
> > 1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/exec.c b/exec.c
> > index 585d6ed6d7..a6e9ed4ece 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -102,6 +102,11 @@ static MemoryRegion io_mem_unassigned;
> > */
> > #define RAM_RESIZEABLE (1 << 2)
> >
> > +/* RAMBlock contents are not persistent, and we can discard memory contents
> > + * when freeing the memory block.
> > + */
> > +#define RAM_NONPERSISTENT (1 << 3)
> > +
> > #endif
> >
> > #ifdef TARGET_PAGE_BITS_VARY
> > @@ -2061,6 +2066,10 @@ void qemu_ram_free(RAMBlock *block)
> > ram_block_notify_remove(block->host, block->max_length);
> > }
> >
> > + if (block->flags & RAM_NONPERSISTENT) {
> > + ram_block_discard_range(block, 0, block->max_length);
> > + }
> > +
> > qemu_mutex_lock_ramlist();
> > QLIST_REMOVE_RCU(block, next);
> > ram_list.mru_block = NULL;
> > @@ -3537,7 +3546,13 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> > /* Note: We need the madvise MADV_DONTNEED behaviour of definitely
> > * freeing the page.
> > */
> > - ret = madvise(host_startaddr, length, MADV_DONTNEED);
> > + if (rb->flags & RAM_NONPERSISTENT) {
> > + ret = madvise(host_startaddr, length, MADV_REMOVE);
> > + }
> > + /* Fallback to MADV_DONTNEED if MADV_REMOVE fails */
> > + if (ret || !(rb->flags & RAM_NONPERSISTENT)) {
> > + ret = madvise(host_startaddr, length, MADV_DONTNEED);
> > + }
> > #endif
> > } else {
> > /* Huge page case - unfortunately it can't do DONTNEED, but
> > --
> > 2.11.0.259.g40922b1
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Eduardo