Date: Wed, 10 Aug 2005 16:26:12 +0300
From: glebn@voltaire.com (Gleb Natapov)
To: Hugh Dickins
Cc: "Michael S. Tsirkin", Roland Dreier, linux-kernel@vger.kernel.org,
	openib-general@openib.org
Subject: Re: [openib-general] Re: [PATCH repost] PROT_DONTCOPY: ifiniband uverbs fork support
Message-ID: <20050810132611.GP16361@minantech.com>
References: <20050719165542.GB16028@mellanox.co.il>
	<20050725171928.GC12206@mellanox.co.il>
	<20050726133553.GA22276@mellanox.co.il>
	<20050810083943.GM16361@minantech.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 10, 2005 at 02:22:40PM +0100, Hugh Dickins wrote:
> On Wed, 10 Aug 2005, Gleb Natapov wrote:
> > On Tue, Aug 09, 2005 at 07:13:33PM +0100, Hugh Dickins wrote:
> > > Even more I'd prefer one of these two solutions below, which sidestep
> > > that uncleanliness - but both of these would be in mmap only, no clean
> > > way to change afterwards (except by munmap or mmap MAP_FIXED):
> > >
> > > 1. Use the standard mmap(NULL, len, PROT_READ|PROT_WRITE,
> > > MAP_SHARED|MAP_ANONYMOUS, -1, 0) which gives you a memory object
> > > shared with children, so write-protection and COW won't come into it.
> > >
> > > or if there's good reason why that's no good,
> > >
> > > 2. Define a MAP_DONTCOPY to mmap: we have a fine tradition of MAP_flags
> > > to achieve this or that effect, adding one more would be cleaner than
> > > now corrupting mprotect or madvise.
> >
> > They are both relying on the way user allocates memory for RDMA. The idea
> > behind Michael's propose it to let library (MPI for instance) to tell to the
> > kernel that the pages are used for RDMA and it is not safe to copy them now.
> > The pages may be anywhere in the process address space bss, text, stack
> > whatever.
>
> That's a nice aim, but I don't think it can quite be done in the face of
> the fork issue - one way or another, we have to change the behaviour of a
> forked RDMA area slightly, which might interfere with common assumptions.
>
> Your stack example is a good one: if we end up setting VM_DONTCOPY on
> the user stack, then I don't think fork's child will get very far without
> hitting a SIGSEGV.

I know, but I prefer a SIGSEGV in the child to silent data corruption.
In most cases the child will exec immediately after fork, so there is
no problem in that case.

--
			Gleb.
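
A minimal C sketch of the two behaviours discussed above, under stated
assumptions. The first function uses the mmap() call from option 1 as
quoted. The second uses madvise(MADV_DONTFORK), the interface found in
current mainline Linux; that is an assumption standing in for the
PROT_DONTCOPY and MAP_DONTCOPY names proposed in this thread, which it
does not implement. The function names are hypothetical and the sketch
is Linux-only.

```c
#define _DEFAULT_SOURCE
#include <assert.h>	/* for callers that assert on the results */
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Option 1: a MAP_SHARED|MAP_ANONYMOUS mapping is shared with children,
 * so a store made by the child is visible to the parent (no COW).
 * Returns the value the parent reads back, or -1 on error.
 */
int shared_anon_value_after_fork(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	volatile int *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	int status, value = -1;
	pid_t pid;

	if (p == MAP_FAILED)
		return -1;
	*p = 1;

	pid = fork();
	if (pid == 0) {
		*p = 42;	/* child writes into the shared object */
		_exit(0);
	}
	if (pid > 0 && waitpid(pid, &status, 0) == pid)
		value = *p;	/* parent observes the child's store */

	munmap((void *)p, len);
	return value;
}

/*
 * Don't-copy region (assumption: MADV_DONTFORK): the range is absent
 * from the child's address space, so touching it faults.
 * Returns the signal that killed the child, 0 if it exited normally,
 * or -1 on error.
 */
int dontfork_child_signal(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int status, sig = -1;
	pid_t pid;

	if (p == MAP_FAILED)
		return -1;
	p[0] = 1;
	if (madvise(p, len, MADV_DONTFORK) != 0) {
		munmap(p, len);
		return -1;
	}

	pid = fork();
	if (pid == 0) {
		volatile char *vp = p;
		char c = vp[0];	/* range is not mapped in the child */
		(void)c;
		_exit(0);
	}
	if (pid > 0 && waitpid(pid, &status, 0) == pid)
		sig = WIFSIGNALED(status) ? WTERMSIG(status) : 0;

	munmap(p, len);
	return sig;
}
```

A child that calls exec right after fork never touches the excluded
range, which is the common case mentioned above; only a child that
keeps running on the parent's data hits the fault.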