public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "H. Peter Anvin" <hpa@zytor.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Andy Lutomirski <luto@amacapital.net>
Cc: Andi Kleen <andi@firstfloor.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Andi Kleen <ak@linux.intel.com>
Subject: Re: [PATCH] Add a text_poke syscall v2
Date: Wed, 27 Nov 2013 11:57:59 -0800	[thread overview]
Message-ID: <52964EC7.9000504@zytor.com> (raw)
In-Reply-To: <CA+55aFzBXzQS90XYZKHRr0N66uBrDmS=o_U9u=e4qq2CzTg8ww@mail.gmail.com>

On 11/26/2013 12:03 PM, Linus Torvalds wrote:
> On Tue, Nov 26, 2013 at 11:05 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> IIRC someone proposed that, rather than specifying a "handler", that any
>> user thread that traps just wait until the poke completes.  This would
>> complicate the kernel implementation a bit, but it would make the user
>> code a good deal simpler.  Is there any reason that this is a bad idea?
> 
> Please do it this way instead, because user space will get the
> callback version wrong and then - because it never actually triggers
> in practice in normal situations - it will cause very *very* subtle
> bugs that we can't fix.
> 
> Making the kernel serialize the accesses is the right thing to do.
> Just a new per-mm mutex should trivially do it, then you don't even
> have to check the "current->mm == bp_target_mm" thing at all, you just
> make the bp handler do a simple
> 
>     if (mutex_lock_interruptible(&mm->text_poke_mutex) >= 0)
>         mutex_unlock(&mm->text_poke_mutex);
> 
> and return. All done.
> 
> Plus the callback thing is pointless if we can do the instruction
> switch atomically (which would be true for UP, for single-thread, and
> potentially for certain sizes/alignments coupled with known rules for
> particular micro-architectures). So it's not a particularly good
> interface anyway.
> 
> Btw, I also think that there's a separate problem wrt shared pages.
> Should we perhaps only allow this in private mappings? Because right
> now it has that "current->mm == bp_target_mm" thing, and generally it
> only works on one particular mm, but by using "get_user_pages_fast(,
> 1,..)" it really only requires write permissions on the page. So it
> could be shared mapping, and I could easily see people doing that on
> purpose ("open executable file, then use text_poke() to change it for
> this architecture") and THAT DOES NOT WORK with the current patch if
> somebody else is also running that app..
> 

I have already said I don't think this is a good interface as it doesn't
provide sane semantics for batching changes -- the whole handler and
timeout mechanism (which isn't even implemented yet, and as a result
probably will never be used by userspace) is basically a hack for that.

The other alternative is that we only expose the "synchronize all cores"
operation as a system call, and let user space do the rest of the work.
 Anything else can be done in user space, and a lot of the subtleties
with locking pages, user space access and so on simply go away... user
space will be responsible or will get protection faults.

	-hpa



  reply	other threads:[~2013-11-27 19:58 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-26  0:37 [PATCH] Add a text_poke syscall v2 Andi Kleen
2013-11-26 19:05 ` Andy Lutomirski
2013-11-26 19:11   ` Andi Kleen
2013-11-26 20:03   ` Linus Torvalds
2013-11-27 19:57     ` H. Peter Anvin [this message]
2013-11-27 22:02       ` H. Peter Anvin
2013-11-27 22:21         ` Andy Lutomirski
2013-11-27 22:21         ` Borislav Petkov
2013-11-27 22:24           ` H. Peter Anvin
2013-11-27 22:25           ` H. Peter Anvin
2013-11-27 22:29             ` Borislav Petkov
2013-11-27 22:31               ` H. Peter Anvin
2013-11-27 23:04                 ` Linus Torvalds
2013-11-27 23:13                   ` Borislav Petkov
2013-11-27 22:40               ` H. Peter Anvin
2013-11-27 23:10                 ` Borislav Petkov
2013-11-27 23:20                   ` H. Peter Anvin
2013-11-27 23:40                     ` Borislav Petkov
2013-11-27 23:47                       ` H. Peter Anvin
2013-11-27 22:41         ` Linus Torvalds
2013-11-27 22:53           ` H. Peter Anvin
2013-11-27 23:15             ` Linus Torvalds
2013-11-27 23:28               ` H. Peter Anvin
2013-11-28  2:01                 ` Linus Torvalds
2013-11-28  2:10                   ` H. Peter Anvin
2013-11-28  9:12                   ` Jiri Kosina
2013-11-27 23:44               ` Andi Kleen
2013-11-29 18:35 ` Oleg Nesterov
2013-11-29 19:54   ` Andi Kleen
2013-11-29 20:05     ` Oleg Nesterov
2013-11-29 20:17       ` H. Peter Anvin
2013-11-29 20:35         ` Oleg Nesterov
2013-11-29 21:24           ` H. Peter Anvin
2013-11-30 14:56             ` Oleg Nesterov
2013-11-29 23:24       ` Jiri Kosina
2013-11-30  0:22         ` Linus Torvalds
2013-12-03 18:49           ` [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly Oleg Nesterov
2013-12-03 19:00             ` Linus Torvalds
2013-12-03 19:20               ` H. Peter Anvin
2013-12-03 20:01                 ` Oleg Nesterov
2013-12-03 20:21                   ` H. Peter Anvin
2013-12-03 20:38                     ` Oleg Nesterov
2013-12-03 20:43                       ` H. Peter Anvin
2013-12-03 20:54                         ` Oleg Nesterov
2013-12-03 22:01                           ` Linus Torvalds
2013-12-03 23:47                             ` H. Peter Anvin
2013-12-04 11:30                               ` Oleg Nesterov
2013-12-04 11:11                             ` Oleg Nesterov
2013-12-04 16:01                               ` H. Peter Anvin
2013-12-04 16:48                                 ` Oleg Nesterov
2013-12-04 16:54                                   ` H. Peter Anvin
2013-12-04 17:15                                     ` Linus Torvalds
2013-12-04 17:43                                       ` Oleg Nesterov
2013-12-05 17:23                                         ` Oleg Nesterov
2013-12-05 17:49                                           ` Borislav Petkov
2013-12-05 18:45                                             ` Oleg Nesterov
2013-12-04 18:32                                       ` H. Peter Anvin
2013-12-05  8:28                                       ` Jon Medhurst (Tixy)
2013-12-03 22:42                           ` H. Peter Anvin
2013-12-03 19:53               ` Oleg Nesterov
2013-11-30 15:20         ` [PATCH] Add a text_poke syscall v2 Oleg Nesterov
2013-11-30 16:51         ` Oleg Nesterov
2013-11-30 17:31           ` Oleg Nesterov
2013-11-30  5:16       ` H. Peter Anvin
2013-11-30 14:52         ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52964EC7.9000504@zytor.com \
    --to=hpa@zytor.com \
    --cc=ak@linux.intel.com \
    --cc=andi@firstfloor.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox