From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Alexey Perevalov <a.perevalov@samsung.com>
Cc: qemu-devel@nongnu.org, Andrea Arcangeli <aarcange@redhat.com>,
Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [Qemu-devel] Dual userfaultfd behavior
Date: Mon, 13 Mar 2017 10:53:39 +0000
Message-ID: <20170313105338.GA2626@work-vm>
In-Reply-To: <20170310140806.GA11225@aperevalov-ubuntu>
* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> Hi, David, Andrea and Mike
Hi Alexey,
> The problem I want to discuss is postcopy live migration of a VM backed
> by 1G hugepages.
>
> I would like to know your opinion on the following approach to avoiding
> this problem:
> Once we have an mmap'ed area backed by 1G hugetlbfs, remap the physical
> pages through /dev/mem. There will then be two types of vmas mapped to
> the same PFN. Registering userfaultfd for the newly obtained virtual
> addresses could reduce the page granularity and so reduce the downtime
> per 1G page. So registering userfaultfd at 2MB granularity, when the real
> hugepage is 1G, could help, I think.
>
> The current postcopy implementation in QEMU does not allow live migration
> from a VM backed by 1G hugepages to a VM backed by 2MB hugepages (sanity
> checks prevent it).
>
> I also checked that it's possible to remap through /dev/mem and get
> PFN-based vmas, register userfaultfd (with an allowance in
> vma_can_userfault), and finally do UFFDIO_COPY by allowing PFN-based vmas
> in __mcopy_atomic.
>
> But this approach has a lot of drawbacks:
> First of all, it relies on the /dev/mem interface. It needs full access
> (a kernel without CONFIG_STRICT_DEVMEM) and PAT must be disabled.
> The second drawback: maybe I just didn't find a way to remap
> hugepages again, but mmap of the /dev/mem character driver maps 4KB pages.
> I don't know how THP could help here, but madvise with MADV_HUGEPAGE
> didn't. So 4KB is not exactly what is needed: due to the encapsulation
> overhead, the total downtime is worse than in other cases.
> It would be great to have an interface for obtaining a new virtual address
> based on an existing PFN, but for hugepages.
Yes, and I think that on some architectures mapping the same page at two
addresses can also cause cache problems unless we're careful.
I think to do this we'd basically need the kernel to set up something
similar to what you're describing, but without the mess of having to
go via /dev/mem. Ideally it would all happen magically when I mark
a hugetlb page as userfault and start UFFDIO_COPYing in 4KB pages;
but I can imagine that some more syscalls might be needed to tell it to do so.
I've no idea how hard that is to do though.
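
To put rough numbers on why the fault granularity matters, here is a
back-of-envelope sketch. It assumes a 10 Gbit/s migration link with no
protocol overhead; that figure and the helper names stall_seconds and
chunk_base are illustrative assumptions, not anything from QEMU or the
kernel. Under that assumption a faulting vCPU waits at least about 0.86 s
for a whole 1G page, against roughly 1.7 ms for a 2M chunk.

```python
# Back-of-envelope: minimum wait for one faulted page to cross the link.
# LINK_BYTES_PER_SEC is an assumed figure (10 Gbit/s), not from this thread.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

LINK_BYTES_PER_SEC = 10 * 10**9 / 8  # assumed link speed, no protocol overhead


def stall_seconds(page_size, link_bytes_per_sec=LINK_BYTES_PER_SEC):
    """Minimum time to transfer one whole page of page_size bytes."""
    return page_size / link_bytes_per_sec


def chunk_base(fault_addr, chunk_size):
    """Round a faulting address down to the start of its chunk.

    chunk_size must be a power of two (4K, 2M, 1G all are).
    """
    return fault_addr & ~(chunk_size - 1)


if __name__ == "__main__":
    for name, size in (("4K", 4 * KIB), ("2M", 2 * MIB), ("1G", GIB)):
        print(f"{name}: {stall_seconds(size) * 1000:.4f} ms per fault, "
              f"{GIB // size} chunks per 1G page")
```

chunk_base shows the only address arithmetic a finer-grained fault handler
would need: which 2M (or 4K) piece of the 1G page to request first. The
real cost per fault would be higher than these figures, since they ignore
latency and the per-request overhead mentioned above.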
Dave
> Honestly, I can't find another use case for this feature.
>
>
> --
>
> BR
> Alexey
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Thread overview: 6+ messages
[not found] <CGME20170310140816eucas1p2bd85d3759739d64bb184ce0fbd74f90d@eucas1p2.samsung.com>
2017-03-10 14:08 ` [Qemu-devel] Dual userfaultfd behavior Alexey Perevalov
2017-03-13 10:53 ` Dr. David Alan Gilbert [this message]
2017-03-13 21:46 ` Andrea Arcangeli
2017-03-15 13:47 ` Alexey Perevalov
2017-03-17 23:27 ` Mike Kravetz
2017-04-10 16:31 ` Alexey Perevalov