linux-mm.kvack.org archive mirror
From: Andrea Arcangeli <aarcange@redhat.com>
To: Mike Rapoport <rppt@linux.ibm.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Peter Xu <peterx@redhat.com>,
	Blake Caldwell <blake.caldwell@colorado.edu>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Michal Hocko <mhocko@kernel.org>, Mel Gorman <mgorman@suse.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	David Rientjes <rientjes@google.com>,
	Andrei Vagin <avagin@gmail.com>,
	Pavel Emelyanov <xemul@virtuozzo.com>
Subject: Re: [LSF/MM TOPIC]: userfaultfd (was: [LSF/MM TOPIC] NUMA remote THP vs NUMA local non-THP under MADV_HUGEPAGE)
Date: Wed, 30 Jan 2019 09:43:04 -0500
Message-ID: <20190130144304.GA19021@redhat.com>
In-Reply-To: <20190130081336.GC17937@rapoport-lnx>

Hello Mike,

On Wed, Jan 30, 2019 at 10:13:36AM +0200, Mike Rapoport wrote:
> We (CRIU) have some concerns about obsoleting soft-dirty in favor of
> uffd-wp. If there are other soft-dirty users these concerns would be
> relevant to them as well.
> 
> With soft-dirty we collect the information about the changed memory every
> pre-dump iteration in the following manner:
> * freeze the tasks
> * find entries in /proc/pid/pagemap with SOFT_DIRTY set
> * unfreeze the tasks
> * dump the modified pages to disk/remote host
> 
> While we do need to traverse the /proc/pid/pagemap to identify dirty pages,
> in between the pre-dump iterations and during the actual memory dump the
> tasks are running freely.
> 
> If we are to switch to uffd-wp, every write by the snapshotted/migrated
> task will incur latency of uffd-wp processing by the monitor.

That's a valid concern indeed.
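
For reference, the pagemap step of the pre-dump loop quoted above comes
down to testing bit 55 (soft-dirty) of each 64-bit /proc/pid/pagemap
entry. A minimal sketch, assuming the caller passes in an already opened
pagemap fd and the page size; pagemap_soft_dirty() and
count_soft_dirty() are names made up for illustration, not CRIU code:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 55 of a pagemap entry is the soft-dirty flag. */
#define PM_SOFT_DIRTY (1ULL << 55)

static inline int pagemap_soft_dirty(uint64_t entry)
{
	return (entry & PM_SOFT_DIRTY) != 0;
}

/*
 * Count soft-dirty pages among npages pages starting at virtual
 * address start. pagemap_fd is an open /proc/<pid>/pagemap.
 * Returns the count, or -1 if an entry could not be read.
 */
long count_soft_dirty(int pagemap_fd, uintptr_t start, size_t npages,
		      long page_size)
{
	long dirty = 0;

	for (size_t i = 0; i < npages; i++) {
		uint64_t entry;
		/* One 8-byte entry per virtual page, indexed by page number. */
		off_t off = (off_t)((start / (uintptr_t)page_size + i) *
				    sizeof(entry));

		if (pread(pagemap_fd, &entry, sizeof(entry), off) !=
		    (ssize_t)sizeof(entry))
			return -1;
		if (pagemap_soft_dirty(entry))
			dirty++;
	}
	return dirty;
}
```

Between iterations the soft-dirty bits are reset by writing "4" to
/proc/pid/clear_refs. Note that the loop walks every page of the range
whether or not it was written.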

I didn't go into the details of what feature is needed in addition
to what is already present in Peter's current patchset, but you're
correct that in order to perform well as a softdirty equivalent,
we'll also need to add an async event model.

The async event model would be set during UFFD registration. It'd work
like async signals: you just queue up uffd events in the kernel by
allocating them as slab objects (not on the kernel stack of the
faulting process). Only if the monitor doesn't read() them fast enough
will the write-protect fault eventually block and release the
mmap_sem, but the page fault would always be resolved by the kernel
even in that case. For the monitor there'll be just a stream of
uffd_msg structures to read, in multiples of the uffd_msg structure
size, with a single syscall per wakeup of the monitor. Conceptually
it'd work the same as how PML works for EPT.
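
The monitor side of such a stream could look roughly like the sketch
below. The async registration mode itself is not part of the current
patchset, so this only illustrates the read side: one read() per wakeup
returning a whole number of struct uffd_msg, using the existing layout
from <linux/userfaultfd.h>. drain_wp_events() and UFFD_BATCH are
hypothetical names:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

#define UFFD_BATCH 64	/* hypothetical: max events fetched per wakeup */

/*
 * Fetch one batch of events from fd with a single read() and store the
 * addresses of write-protect faults in out[]. The caller should pass
 * max_out >= UFFD_BATCH so that no event of the batch is dropped.
 * Returns the number of addresses stored, or -1 with errno set.
 */
ssize_t drain_wp_events(int fd, uint64_t *out, size_t max_out)
{
	struct uffd_msg msgs[UFFD_BATCH];
	ssize_t n = read(fd, msgs, sizeof(msgs));
	size_t count, stored = 0;

	if (n < 0)
		return -1;
	/* The stream is expected to carry only whole messages. */
	if ((size_t)n % sizeof(msgs[0]) != 0) {
		errno = EIO;
		return -1;
	}
	count = (size_t)n / sizeof(msgs[0]);

	for (size_t i = 0; i < count && stored < max_out; i++) {
		if (msgs[i].event != UFFD_EVENT_PAGEFAULT)
			continue;
		if (!(msgs[i].arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP))
			continue;
		out[stored++] = msgs[i].arg.pagefault.address;
	}
	return (ssize_t)stored;
}
```

The collected addresses would play the role of the SOFT_DIRTY bits: the
set of pages to dump in the next pre-dump iteration.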

The main downside will be an allocation per fault (soft dirty doesn't
need to do such allocation), but there will be no round-trip to
userland latency added to the wrprotect fault that needs to be logged.

We need the synchronous/blocking uffd-wp for other things that aren't
related to soft dirty and can't be achieved with an async model like
softdirty. Adding an async model later would be a self-contained
feature inside uffd.

So the idea would be to ignore any comparison with softdirty until
uffd-wp is finalized, and then evaluate the possibility of adding an
async model, which would be a simple thing to add in comparison to the
uffd-wp feature itself.

The theoretical expectation would be that softdirty would perform
better for small processes (but for those the overall logging overhead
is small anyway), but when it gets to regions of hundreds of gigabytes
or terabytes, async uffd-wp should perform much better.
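
One way to put a number on that expectation, assuming the dominant
softdirty cost is the pagemap walk (one 64-bit entry per page, read on
every iteration regardless of how many pages were written): a rough
sketch with a made-up helper name, pagemap_scan_bytes():

```c
#include <stdint.h>

/*
 * Bytes of /proc/<pid>/pagemap that cover region_bytes of virtual
 * memory: one 8-byte entry per page, rounding the region up to a
 * whole number of pages.
 */
uint64_t pagemap_scan_bytes(uint64_t region_bytes, uint64_t page_size)
{
	uint64_t pages = (region_bytes + page_size - 1) / page_size;

	return pages * sizeof(uint64_t);
}
```

With 4 KiB pages, a 1 TiB region has 2^28 pagemap entries, i.e. 2 GiB
of pagemap to read per pre-dump iteration, while an event stream would
grow only with the number of pages actually written.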

Thanks,
Andrea

