linux-mm.kvack.org archive mirror
From: prakash sangappa <prakash.sangappa@oracle.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-api@vger.kernel.org, John Stultz <john.stultz@linaro.org>
Subject: Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
Date: Fri, 30 Jun 2017 17:55:08 -0700
Message-ID: <5956F2EC.1000805@oracle.com>
In-Reply-To: <20170630130813.GA5738@redhat.com>


On 6/30/2017 6:08 AM, Andrea Arcangeli wrote:
> On Fri, Jun 30, 2017 at 11:47:35AM +0200, Michal Hocko wrote:
[...]
>> As an aside, I remember that prior to MADV_FREE there was long
>> discussion about lazy freeing of memory from userspace. Some users
>> wanted to be signalled when their memory was freed by the system so that
>> they could rebuild the original content (e.g. uncompressed images in
>> memory). It seems like MADV_FREE + this signalling could be used for
>> that usecase. John would surely know more about those usecases.
> That would provide an equivalent API to the one volatile pages
> provided agreed. So it would allow to adapt code (if any?) more easily
> to drop the duplicate feature in volatile pages code (however it would
> be faster if the userland code using volatile pages lazy reclaim mode
> was converted to poll the uffd so the kernel talks directly to the
> monitor without involving a SIGBUS signal handler which will cause
> spurious enter/exit if compared to signal-less uffd API).
>
> The main benefit in my view is not volatile pages but that
> UFFD_FEATURE_SIGBUS would work equally well to enforce robustness on
> all kind of memory not only hugetlbfs (so one could run the database
> with robustness on THP over tmpfs) and the new cache can be injected
> in the filesystem using UFFDIO_COPY which is likely faster than
> fallocate as UFFDIO_COPY was already demonstrated to be faster even
> than a regular page fault.

Interesting that UFFDIO_COPY is faster than fallocate().  In the DB use case
the page does not need to be allocated at the time a process trips on the
hugetlbfs file hole and receives SIGBUS.  fallocate() is called on the
hugetlbfs file by a separate process when more memory needs to be allocated.
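
To make the allocation step concrete, here is a minimal sketch of how the
separate process could back a range of the file with pages.  The helper
names fill_file_range() and file_size() are made up for illustration, and
nothing here is specific to hugetlbfs: fd can be any file whose filesystem
supports fallocate().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate backing pages for [offset, offset + len)
 * of an already-open file.  Mode 0 is plain allocation, which also extends
 * the file size if the range ends past EOF.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int fill_file_range(int fd, off_t offset, off_t len)
{
	return fallocate(fd, 0, offset, len);
}

/* Hypothetical helper: current file size in bytes, or -1 on error. */
static off_t file_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	return st.st_size;
}
```

The allocating process would call fill_file_range(fd, offset, len) on its
descriptor for the shared file before handing the range out.  Processes
that touch a range which has not been filled yet are the ones that would
see SIGBUS.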

> It's also simpler to handle backwards compatibility with the
> UFFDIO_API call, that allows probing if UFFD_FEATURE_SIGBUS is
> supported by the running kernel regardless of kernel version (so it
> can be backported and enabled by the database, without the database
> noticing it's on an older kernel version).

Yes, this is useful as this change will need to be backported.

> So while this wasn't the intended way to use the userfault and I
> already pointed out the possibility to use a single monitor to do all
> this, I'm positive about UFFD_FEATURE_SIGBUS if the overhead of having
> a monitor is so concerning.
>
> Ultimately there are many pros and just a single cons: the branch in
> handle_userfault().
>
> I wonder if it would be possible to use static_branch_enable() in
> UFFDIO_API and static_branch_unlikely in handle_userfault() to
> eliminate that branch but perhaps it's overkill and UFFDIO_API is
> unprivileged and it would send an IPI to all CPUs. I don't think we
> normally expose the static_branch_enable() to unprivileged userland
> and making UFFD_FEATURE_SIGBUS a privileged op doesn't sound
> attractive (although the alternative of altering a hugetlbfs mount
> option would be a privileged op).

Regarding the hugetlbfs mount option, one consideration is to allow mounting
hugetlbfs inside a user namespace's mount namespace, which would allow
non-privileged processes to mount hugetlbfs for use inside a user namespace.
This may be needed even for the 'min_size' mount option, with which an
application could reserve huge pages and mount a filesystem for its own use
without needing privileges, provided the system has enough huge pages
configured.  If non-privileged processes are allowed to mount a hugetlbfs
filesystem, then min_size should be subject to some resource limits.

Mounting inside a user namespace will be proposed later as a separate patch.


>
> Thanks,
> Andrea


