Date: Tue, 7 Oct 2014 18:21:50 +0300
From: "Kirill A. Shutemov"
To: Andrea Arcangeli
Cc: Robert Love, Dave Hansen, Jan Kara, kvm@vger.kernel.org, Neil Brown,
 Stefan Hajnoczi, qemu-devel@nongnu.org, linux-mm@kvack.org,
 KOSAKI Motohiro, Michel Lespinasse, Taras Glek, Andrew Jones,
 Juan Quintela, Hugh Dickins, Isaku Yamahata, Mel Gorman, Sasha Levin,
 Android Kernel Team, "Dr. David Alan Gilbert", "Huangpeng (Peter)",
 Andres Lagar-Cavilla, Christopher Covington, Anthony Liguori,
 Mike Hommey, Keith Packard, Wenchao Xia, linux-api@vger.kernel.org,
 linux-kernel@vger.kernel.org, Andy Lutomirski, Minchan Kim,
 Dmitry Adamushko, Johannes Weiner, Paolo Bonzini, Andrew Morton,
 Linus Torvalds, Peter Feiner
Subject: Re: [Qemu-devel] [PATCH 08/17] mm: madvise MADV_USERFAULT
Message-ID: <20141007152150.GA989@node.dhcp.inet.fi>
In-Reply-To: <20141007132458.GZ2342@redhat.com>
References: <1412356087-16115-1-git-send-email-aarcange@redhat.com>
 <1412356087-16115-9-git-send-email-aarcange@redhat.com>
 <20141007103645.GB30762@node.dhcp.inet.fi>
 <20141007132458.GZ2342@redhat.com>

On Tue, Oct 07, 2014 at 03:24:58PM +0200, Andrea Arcangeli wrote:
> Hi Kirill,
>
> On Tue, Oct 07, 2014 at 01:36:45PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> > > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> > > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> > > userland touches a still unmapped virtual address, a sigbus signal is
> > > sent instead of allocating a new page. The sigbus signal handler will
> > > then resolve the page fault in userland by calling the
> > > remap_anon_pages syscall.
> >
> > Hm. I wonder if this functionality really fits the madvise(2) interface:
> > as far as I understand it, madvise(2) gives the kernel a *hint*, which may
> > or may not trigger an action on the kernel side. I don't think an
> > application will behave reasonably if the kernel ignores the *advice* and,
> > instead of sending SIGBUS, allocates memory.
> >
> > I would suggest considering some other interface for this
> > functionality: a new syscall or, perhaps, mprotect().
>
> I didn't feel like adding PROT_USERFAULT to mprotect, which looks
> hardwired to just these flags:

PROT_NOALLOC, maybe?

>
>        PROT_NONE  The memory cannot be accessed at all.
>
>        PROT_READ  The memory can be read.
>
>        PROT_WRITE The memory can be modified.
>
>        PROT_EXEC  The memory can be executed.

To be complete: PROT_GROWSDOWN, PROT_GROWSUP and unused PROT_SEM.
> So here somebody should comment and choose between:
>
> 1) set VM_USERFAULT with mprotect(PROT_USERFAULT) instead of
>    the current madvise(MADV_USERFAULT)
>
> 2) drop MADV_USERFAULT and VM_USERFAULT and force the usage of the
>    userfaultfd protocol as the only way for userland to catch
>    userfaults (each userfaultfd must already register itself into its
>    own virtual memory ranges, so it's a trivial change for userfaultfd
>    users that deletes just 1 or 2 lines of userland code, but it would
>    prevent using the SIGBUS behavior with info->si_addr=faultaddr for
>    other users)
>
> 3) keep things as they are now: use MADV_USERFAULT for SIGBUS
>    userfaults, with optional intersection between the
>    vm_flags&VM_USERFAULT ranges and the userfaultfd registered ranges
>    with vma->vm_userfaultfd_ctx!=NULL to know whether to engage the
>    userfaultfd protocol instead of the plain SIGBUS

4) new syscall?

> I will update the code according to feedback, so please comment.

I don't have a strong opinion on this. I just *feel* it doesn't fit
advice semantics.

The only userspace interface I've designed did not prove good over time.
I would listen to what senior maintainers say. :)

-- 
Kirill A. Shutemov