From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757540Ab3K0T6b (ORCPT ); Wed, 27 Nov 2013 14:58:31 -0500 Received: from terminus.zytor.com ([198.137.202.10]:51816 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751648Ab3K0T63 (ORCPT ); Wed, 27 Nov 2013 14:58:29 -0500 Message-ID: <52964EC7.9000504@zytor.com> Date: Wed, 27 Nov 2013 11:57:59 -0800 From: "H. Peter Anvin" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: Linus Torvalds , Andy Lutomirski CC: Andi Kleen , Linux Kernel Mailing List , the arch/x86 maintainers , Andi Kleen Subject: Re: [PATCH] Add a text_poke syscall v2 References: <1385426236-14960-1-git-send-email-andi@firstfloor.org> <5294F10C.8060901@amacapital.net> In-Reply-To: X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/26/2013 12:03 PM, Linus Torvalds wrote: > On Tue, Nov 26, 2013 at 11:05 AM, Andy Lutomirski wrote: >> >> IIRC someone proposed that, rather than specifying a "handler", that any >> user thread that traps just wait until the poke completes. This would >> complicate the kernel implementation a bit, but it would make the user >> code a good deal simpler. Is there any reason that this is a bad idea? > > Please do it this way instead, because user space will get the > callback version wrong and then - because it never actually triggers > in practice in normal situations - it will cause very *very* subtle > bugs that we can't fix. > > Making the kernel serialize the accesses is the right thing to do. > Just a new per-mm mutex should trivially do it, then you don't even > have to check the "current->mm == bp_target_mm" thing at all, you just > make the bp handler do a simple > > if (mutex_lock_interruptible(&mm->text_poke_mutex) >= 0) > mutex_unlock(&mm->text_poke_mutex); > > and return. All done. > > Plus the callback thing is pointless if we can do the instruction > switch atomically (which would be true for UP, for single-thread, and > potentially for certain sizes/alignments coupled with known rules for > particular micro-architectures). So it's not a particularly good > interface anyway. > > Btw, I also think that there's a separate problem wrt shared pages. > Should we perhaps only allow this in private mappings? Because right > now it has that "current->mm == bp_target_mm" thing, and generally it > only works on one particular mm, but by using "get_user_pages_fast(, > 1,..)" it really only requires write permissions on the page. So it > could be shared mapping, and I could easily see people doing that on > purpose ("open executable file, then use text_poke() to change it for > this architecture") and THAT DOES NOT WORK with the current patch if > somebody else is also running that app.. > I have already said I don't think this is a good interface as it doesn't provide sane semantics for batching changes -- the whole handler and timeout mechanism (which isn't even implemented yet, and as a result probably will never be used by userspace) is basically a hack for that. The other alternative is that we only expose the "synchronize all cores" operation as a system call, and let user space do the rest of the work. Anything else can be done in user space, and a lot of the subtleties with locking pages, user space access and so on simply go away... user space will be responsible or will get protection faults. -hpa