From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: "David Hildenbrand" <david@redhat.com>,
"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
"Peter Xu" <peterx@redhat.com>,
qemu-devel@nongnu.org, "Andrew Deason" <adeason@sinenomine.net>,
"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH] softmmu/physmem: Use qemu_madvise
Date: Wed, 16 Mar 2022 09:37:01 +0000 [thread overview]
Message-ID: <YjGvvRvPRV3ACbFY@work-vm> (raw)
In-Reply-To: <CAFEAcA96=yDKOknYmCKriWDJe4g-q07+b8yL3tFUf9=G-o84zA@mail.gmail.com>
* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Wed, 16 Mar 2022 at 07:53, David Hildenbrand <david@redhat.com> wrote:
> >
> > On 16.03.22 05:04, Andrew Deason wrote:
> > > We have a thin wrapper around madvise, called qemu_madvise, which
> > > provides consistent behavior for the !CONFIG_MADVISE case, and works
> > > around some platform-specific quirks (some platforms only provide
> > > posix_madvise, and some don't offer all 'advise' types). This specific
> > > caller of madvise has never used it, tracing back to its original
> > > introduction in commit e0b266f01dd2 ("migration_completion: Take
> > > current state").
> > >
> > > Call qemu_madvise here, to follow the same logic as all of our other
> > > madvise callers. This slightly changes the behavior for
> > > !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different
> > > error message), but this is now more consistent with other callers
> > > that use qemu_madvise.
> > >
> > > Signed-off-by: Andrew Deason <adeason@sinenomine.net>
> > > ---
> > > Looking at the history of commits that touch this madvise() call, it
> > > doesn't _look_ like there's any reason to be directly calling madvise vs
> > > qemu_advise (I don't see anything mentioned), but I'm not sure.
> > >
> > > softmmu/physmem.c | 12 ++----------
> > > 1 file changed, 2 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> > > index 43ae70fbe2..900c692b5e 100644
> > > --- a/softmmu/physmem.c
> > > +++ b/softmmu/physmem.c
> > > @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> > > rb->idstr, start, length, ret);
> > > goto err;
> > > #endif
> > > }
> > > if (need_madvise) {
> > > /* For normal RAM this causes it to be unmapped,
> > > * for shared memory it causes the local mapping to disappear
> > > * and to fall back on the file contents (which we just
> > > * fallocate'd away).
> > > */
> > > -#if defined(CONFIG_MADVISE)
> > > if (qemu_ram_is_shared(rb) && rb->fd < 0) {
> > > - ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > > + ret = qemu_madvise(host_startaddr, length, QEMU_MADV_REMOVE);
> > > } else {
> > > - ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> > > + ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> >
> > posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics
> > then madvise() -- it's not a discard that we need here.
> >
> > So ram_block_discard_range() would now succeed in environments (BSD?)
> > where it's supposed to fail.
> >
> > So AFAIKs this isn't sane.
>
> But CONFIG_MADVISE just means "host has madvise()"; it doesn't imply
> "this is a Linux madvise() with MADV_DONTNEED". Solaris madvise()
> doesn't seem to have MADV_DONTNEED at all; a quick look at the
> FreeBSD manpage suggests its madvise MADV_DONTNEED is identical
> to its posix_madvise MADV_DONTNEED.
>
> If we need "specifically Linux MADV_DONTNEED semantics" maybe we
> should define a QEMU_MADV_LINUX_DONTNEED which either (a) does the
> right thing or (b) fails, and use qemu_madvise() regardless.
>
> Certainly the current code is pretty fragile to being changed by
> people who don't understand the undocumented subtlety behind
> the use of a direct madvise() call here.
Yeh and I'm not sure I can remembe rall the subtleties; there's a big
hairy set of ifdef's in include/qemu/madvise.h that makes
sure we always have the definition of QEMU_MADV_REMOVE/DONTNEED
even on platforms that might not define it themselves.
But I think this code is used for things with different degrees
of care about the semantics; e.g. 'balloon' just cares that
it frees memory up and doesn't care about the detailed semantics
that much; so it's probably fine with that.
Postcopy is much more touchy, but then it's only going to be
calling this on Linux anyway (because of the userfault dependency).
Dave
> -- PMM
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
next prev parent reply other threads:[~2022-03-16 9:41 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-16 4:04 [PATCH] softmmu/physmem: Use qemu_madvise Andrew Deason
2022-03-16 4:14 ` Peter Xu
2022-03-16 7:51 ` David Hildenbrand
2022-03-16 9:26 ` Peter Maydell
2022-03-16 9:37 ` Dr. David Alan Gilbert [this message]
2022-03-16 9:41 ` David Hildenbrand
2022-03-16 14:29 ` Andrew Deason
2022-03-22 16:39 ` Andrew Deason
2022-03-22 16:43 ` David Hildenbrand
2022-03-22 16:53 ` Dr. David Alan Gilbert
2022-03-22 17:34 ` Andrew Deason
2022-03-22 17:58 ` Dr. David Alan Gilbert
2022-03-22 19:35 ` Andrew Deason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YjGvvRvPRV3ACbFY@work-vm \
--to=dgilbert@redhat.com \
--cc=adeason@sinenomine.net \
--cc=david@redhat.com \
--cc=f4bug@amsat.org \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.