From: Andy Lutomirski <luto@amacapital.net>
To: Andrea Arcangeli <aarcange@redhat.com>,
qemu-devel@nongnu.org, kvm@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
Android Kernel Team <kernel-team@android.com>,
Robert Love <rlove@google.com>, Mel Gorman <mel@csn.ul.ie>,
Hugh Dickins <hughd@google.com>, Dave Hansen <dave@sr71.net>,
Rik van Riel <riel@redhat.com>,
Dmitry Adamushko <dmitry.adamushko@gmail.com>,
Neil Brown <neilb@suse.de>, Mike Hommey <mh@glandium.org>,
Taras Glek <tglek@mozilla.com>, Jan Kara <jack@suse.cz>,
KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
Michel Lespinasse <walken@google.com>,
Minchan Kim <minchan@kernel.org>,
Keith Packard <keithp@keithp.com>,
"Huangpeng (Peter)" <peter.huangpeng@huawei.com>,
Isaku Yamahata <yamahata@valinux.co.jp>,
Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH 08/10] userfaultfd: add new syscall to provide memory externalization
Date: Wed, 02 Jul 2014 18:56:03 -0700
Message-ID: <53B4B833.9010508@mit.edu>
In-Reply-To: <1404319816-30229-9-git-send-email-aarcange@redhat.com>

On 07/02/2014 09:50 AM, Andrea Arcangeli wrote:
> Once a userfaultfd is created, MADV_USERFAULT regions talk through
> the userfaultfd protocol with the thread responsible for doing the
> memory externalization of the process.
>
> The protocol starts with userland writing the requested/preferred
> USERFAULT_PROTOCOL version into the userfault fd (a 64-bit write). If
> the kernel knows that version, it acks it by allowing userland to read
> 64 bits from the userfault fd containing the same USERFAULT_PROTOCOL
> version that userland asked for. Otherwise userland will read the __u64
> value -1ULL (aka USERFAULTFD_UNKNOWN_PROTOCOL) and will have to try
> again by writing an older protocol version, if one is suitable for its
> usage, and read it back again until it stops reading -1ULL. After that
> the userfaultfd protocol starts.
>
> The protocol consists of 64-bit reads from the userfault fd that
> provide userland with the fault addresses. After a userfault address
> has been read and the fault is resolved by userland, the application
> must write back 128 bits in the form of a [ start, end ] range (64 bits
> each) that tells the kernel such a range has been mapped. Multiple read
> userfaults can be resolved in a single range write. poll() can be used
> to know when there are new userfaults to read (POLLIN) and when there
> are threads waiting for a wakeup through a range write (POLLOUT).
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> +#ifdef CONFIG_PROC_FS
> +static int userfaultfd_show_fdinfo(struct seq_file *m, struct file *f)
> +{
> +	struct userfaultfd_ctx *ctx = f->private_data;
> +	int ret;
> +	wait_queue_t *wq;
> +	struct userfaultfd_wait_queue *uwq;
> +	unsigned long pending = 0, total = 0;
> +
> +	spin_lock(&ctx->fault_wqh.lock);
> +	list_for_each_entry(wq, &ctx->fault_wqh.task_list, task_list) {
> +		uwq = container_of(wq, struct userfaultfd_wait_queue, wq);
> +		if (uwq->pending)
> +			pending++;
> +		total++;
> +	}
> +	spin_unlock(&ctx->fault_wqh.lock);
> +
> +	ret = seq_printf(m, "pending:\t%lu\ntotal:\t%lu\n", pending, total);

This should show the protocol version, too.

> +
> +SYSCALL_DEFINE1(userfaultfd, int, flags)
> +{
> +	int fd, error;
> +	struct file *file;

This looks like it can't be used more than once in a process. That will
be unfortunate for libraries. Would it be feasible to either have
userfaultfd claim a range of addresses or for a vma to be explicitly
associated with a userfaultfd? (In the latter case, giant PROT_NONE
MAP_NORESERVE mappings could be used.)
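
To make the second option concrete, this is roughly what such a reservation looks like from userland today. It is only a sketch: reserve_region() is a hypothetical name, and the step that would associate the resulting vma with a particular userfaultfd is the proposal under discussion, not something shown here.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS and MAP_NORESERVE */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: reserve `len` bytes of address space without
 * committing memory. PROT_NONE makes every access fault, and
 * MAP_NORESERVE asks the kernel not to reserve swap space for the range.
 * Returns MAP_FAILED on error, like mmap() itself.
 */
void *reserve_region(size_t len)
{
	assert(len > 0);
	return mmap(NULL, len, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
}
```

Each library could reserve its own region this way and, under the proposal, tie it to its own userfaultfd, so two independent users in one process would not interfere with each other.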