From: David Hildenbrand <david@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Michal Hocko <mhocko@suse.com>,
Naoya Horiguchi <naoya.horiguchi@linux.dev>,
Axel Rasmussen <axelrasmussen@google.com>,
Peter Xu <peterx@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Mina Almasry <almasrymina@google.com>,
Shuah Khan <shuah@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH 0/3] Add hugetlb MADV_DONTNEED support
Date: Thu, 27 Jan 2022 12:57:46 +0100
Message-ID: <b476e461-aba9-e92b-d392-270029ab6b18@redhat.com>
In-Reply-To: <20220113180308.15610-1-mike.kravetz@oracle.com>
On 13.01.22 19:03, Mike Kravetz wrote:
> Userfaultfd selftests for hugetlb do not perform UFFD_EVENT_REMAP
> testing. However, mremap support was recently added in commit
> 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed
> vma"). While attempting to enable mremap support in the test, it was
> discovered that the mremap test indirectly depends on MADV_DONTNEED.
>
> hugetlb does not support MADV_DONTNEED. However, the only thing
> preventing support is a check in can_madv_lru_vma(). Simply removing
> the check will enable support.
>
> This is sent as an RFC because there is no existing use case calling
> for hugetlb MADV_DONTNEED support except possibly the userfaultfd test.
> However, adding support makes sense as it is fairly trivial and brings
> hugetlb functionality more in line with 'normal' memory.
>
Just a note:

QEMU doesn't use huge anonymous memory directly (MAP_ANON | MAP_HUGE...)
but instead always goes either via hugetlbfs or via memfd.

For MAP_PRIVATE hugetlb mappings, fallocate(FALLOC_FL_PUNCH_HOLE) seems
to get the job done (IOW: it also discards private anon pages). See the
comments in the QEMU code below. I remember that this is somewhat
inconsistent: for ordinary MAP_PRIVATE mapped files, we always need
fallocate(FALLOC_FL_PUNCH_HOLE) + madvise(QEMU_MADV_DONTNEED)
to make sure that

a) All file pages are removed
b) All private anon pages are removed

IIRC hugetlbfs really is different in that regard, but maybe other
filesystems behave similarly.

That's why QEMU has been able to live without MADV_DONTNEED support
for hugetlbfs so far and most probably won't ever need it.
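
To illustrate the two-step discard for an ordinary (non-hugetlb) MAP_PRIVATE
file mapping, here is a minimal sketch. It is not QEMU code: the helper names
discard_private_file_range() and demo_private_discard() are hypothetical, and
the demo assumes a Linux system with glibc >= 2.27 (for memfd_create()) and a
memfd backed by shmem with base-size pages. It punches a hole to drop the file
pages, then uses MADV_DONTNEED to drop the private anon copies, and checks that
both pages read back as zero.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from QEMU): discard [offset, offset + len) of a
 * MAP_PRIVATE file mapping that starts at addr. Returns 0 or -errno.
 */
int discard_private_file_range(int fd, void *addr, off_t offset, size_t len)
{
    /* a) Remove the file pages; KEEP_SIZE leaves the file size unchanged. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, len))
        return -errno;
    /* b) Remove private anon (COW) pages from this mapping. */
    if (madvise(addr, len, MADV_DONTNEED))
        return -errno;
    return 0;
}

/*
 * Demo on a two-page memfd. Page 0 only has file content; page 1 has file
 * content plus a private COW copy. Returns 0 if both read as zero after the
 * discard, a negative errno on a syscall failure, or 1 on a content mismatch.
 */
int demo_private_discard(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 2 * page;
    int ret = 1;

    int fd = memfd_create("discard-demo", 0);
    if (fd < 0)
        return -errno;
    if (ftruncate(fd, (off_t)len)) {
        ret = -errno;
        goto out_close;
    }
    /* Put data into the file itself on both pages. */
    if (pwrite(fd, "F", 1, 0) != 1 || pwrite(fd, "F", 1, (off_t)page) != 1) {
        ret = -EIO;
        goto out_close;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        ret = -errno;
        goto out_close;
    }

    /* Writing through the private mapping creates a private anon copy. */
    p[page] = 'P';
    if (p[0] != 'F' || p[page] != 'P')
        goto out_unmap;

    ret = discard_private_file_range(fd, p, 0, len);
    if (ret)
        goto out_unmap;

    /* Both the file page and the private copy should be gone now. */
    ret = (p[0] == 0 && p[page] == 0) ? 0 : 1;

out_unmap:
    munmap(p, len);
out_close:
    close(fd);
    return ret;
}
```

Calling demo_private_discard() from a small main() and checking for a return
value of 0 is enough to try this out. Whether either step alone suffices
depends on the backing filesystem, which is the inconsistency described above.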
...
        /* The logic here is messy;
         *    madvise DONTNEED fails for hugepages
         *    fallocate works on hugepages and shmem
         *    shared anonymous memory requires madvise REMOVE
         */
        need_madvise = (rb->page_size == qemu_host_page_size);
        need_fallocate = rb->fd != -1;
        if (need_fallocate) {
            /* For a file, this causes the area of the file to be zero'd
             * if read, and for hugetlbfs also causes it to be unmapped
             * so a userfault will trigger.
             */
#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
            ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                            start, length);
            if (ret) {
                ret = -errno;
                error_report("ram_block_discard_range: Failed to fallocate "
                             "%s:%" PRIx64 " +%zx (%d)",
                             rb->idstr, start, length, ret);
                goto err;
            }
#else
            ret = -ENOSYS;
            error_report("ram_block_discard_range: fallocate not available/file"
                         "%s:%" PRIx64 " +%zx (%d)",
                         rb->idstr, start, length, ret);
            goto err;
#endif
        }
        if (need_madvise) {
            /* For normal RAM this causes it to be unmapped,
             * for shared memory it causes the local mapping to disappear
             * and to fall back on the file contents (which we just
             * fallocate'd away).
             */
#if defined(CONFIG_MADVISE)
            if (qemu_ram_is_shared(rb) && rb->fd < 0) {
                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
            } else {
                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
            }
            if (ret) {
                ret = -errno;
                error_report("ram_block_discard_range: Failed to discard range "
                             "%s:%" PRIx64 " +%zx (%d)",
                             rb->idstr, start, length, ret);
                goto err;
            }
#else
...
--
Thanks,
David / dhildenb