Linux userland API discussions

* Re: [Qemu-devel] [PATCH 08/17] mm: madvise MADV_USERFAULT
From: Kirill A. Shutemov @ 2014-10-07 11:30 UTC
  To: Dr. David Alan Gilbert
  Cc: Robert Love, Dave Hansen, Jan Kara, kvm, Neil Brown,
	Stefan Hajnoczi, qemu-devel, linux-mm, KOSAKI Motohiro,
	Michel Lespinasse, Andrea Arcangeli, Taras Glek, Juan Quintela,
	Hugh Dickins, Isaku Yamahata, Mel Gorman, Sasha Levin,
	Android Kernel Team, Andrew Jones, Huangpeng (Peter),
	Andres Lagar-Cavilla, Christopher Covington, Anthony Liguori,
	Paolo Bonzini, Keith Packard
In-Reply-To: <20141007110102.GJ2404@work-vm>

On Tue, Oct 07, 2014 at 12:01:02PM +0100, Dr. David Alan Gilbert wrote:
> * Kirill A. Shutemov (kirill@shutemov.name) wrote:
> > On Tue, Oct 07, 2014 at 11:46:04AM +0100, Dr. David Alan Gilbert wrote:
> > > * Kirill A. Shutemov (kirill@shutemov.name) wrote:
> > > > On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> > > > > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> > > > > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> > > > > userland touches a still unmapped virtual address, a sigbus signal is
> > > > > sent instead of allocating a new page. The sigbus signal handler will
> > > > > then resolve the page fault in userland by calling the
> > > > > remap_anon_pages syscall.
> > > > 
> > > > Hm. I wonder if this functionality really fits the madvise(2) interface:
> > > > as far as I understand it, it provides a way to give a *hint* to the
> > > > kernel, which may or may not trigger an action on the kernel side. I
> > > > don't think an application will behave reasonably if the kernel ignores
> > > > the *advice* and allocates memory instead of sending SIGBUS.
> > > 
> > > Aren't DONTNEED and DONTDUMP similar cases of madvise operations that are
> > > expected to do what they say?
> > 
> > No. If the kernel ignored MADV_DONTNEED or MADV_DONTDUMP, correctness
> > would not be affected; behaviour would just be suboptimal: more memory
> > used than needed, or wasted space in the coredump.
> 
> That's not how the manpage reads for DONTNEED; it calls it out as a special
> case near the top, and explicitly says what will happen if you read the
> area marked as DONTNEED.

You are right. MADV_DONTNEED doesn't fit the interface either. That's bad
and we can't fix it. But it's not a reason to make this mistake again.

Read the next sentence: "The kernel is free to ignore the advice."

Note, POSIX_MADV_DONTNEED has totally different semantics.
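
For illustration, the Linux-specific MADV_DONTNEED behaviour that the manpage
calls out can be observed with something like the sketch below. It assumes
Linux and a private anonymous mapping; the helper name dontneed_zeroes_page()
is made up for this example and is not part of any existing API.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a private anonymous page reads back as zero after
 * madvise(MADV_DONTNEED), 0 if the old contents survived, -1 on error.
 */
int dontneed_zeroes_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int ret;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xAA, (size_t)page);	/* dirty the page */

	if (madvise(p, (size_t)page, MADV_DONTNEED) != 0) {
		munmap(p, (size_t)page);
		return -1;
	}

	/* On Linux the next access to this range is a zero-fill fault. */
	ret = (p[0] == 0 && p[page - 1] == 0);

	munmap(p, (size_t)page);
	return ret;
}
```

An application that relies on getting 1 back here already depends on the
"advice" being acted upon, which is exactly the problem.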

> It looks like there are openssl patches that use DONTDUMP to explicitly
> make sure keys etc don't land in cores.

That's nice to have. But openssl works on systems without the interface,
meaning it's not essential for functionality.
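
A minimal sketch of what "nice to have" means in practice, assuming a
page-aligned buffer. The helper name exclude_from_core_best_effort() is
hypothetical and is not taken from the openssl patches:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, for illustration only.
 * Asks the kernel to leave [addr, addr + len) out of core dumps.
 * Returns 1 if the request was accepted, 0 if it was not.
 * addr must be page-aligned, as madvise() requires.
 */
int exclude_from_core_best_effort(void *addr, size_t len)
{
#ifdef MADV_DONTDUMP
	if (madvise(addr, len, MADV_DONTDUMP) == 0)
		return 1;
#else
	(void)addr;
	(void)len;
#endif
	/* Flag missing or rejected: the caller carries on regardless. */
	return 0;
}
```

The caller can ignore the return value and still work correctly. There is no
equivalent fallback for MADV_USERFAULT: if the flag is ignored, the
application gets a freshly allocated page instead of SIGBUS.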

-- 
 Kirill A. Shutemov
