All of lore.kernel.org
 help / color / mirror / Atom feed
From: "H. Peter Anvin" <hpa@zytor.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Andy Lutomirski <luto@amacapital.net>
Cc: Andi Kleen <andi@firstfloor.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Andi Kleen <ak@linux.intel.com>
Subject: Re: [PATCH] Add a text_poke syscall v2
Date: Wed, 27 Nov 2013 11:57:59 -0800	[thread overview]
Message-ID: <52964EC7.9000504@zytor.com> (raw)
In-Reply-To: <CA+55aFzBXzQS90XYZKHRr0N66uBrDmS=o_U9u=e4qq2CzTg8ww@mail.gmail.com>

On 11/26/2013 12:03 PM, Linus Torvalds wrote:
> On Tue, Nov 26, 2013 at 11:05 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> IIRC someone proposed that, rather than specifying a "handler", that any
>> user thread that traps just wait until the poke completes.  This would
>> complicate the kernel implementation a bit, but it would make the user
>> code a good deal simpler.  Is there any reason that this is a bad idea?
> 
> Please do it this way instead, because user space will get the
> callback version wrong and then - because it never actually triggers
> in practice in normal situations - it will cause very *very* subtle
> bugs that we can't fix.
> 
> Making the kernel serialize the accesses is the right thing to do.
> Just a new per-mm mutex should trivially do it, then you don't even
> have to check the "current->mm == bp_target_mm" thing at all, you just
> make the bp handler do a simple
> 
>     if (mutex_lock_interruptible(&mm->text_poke_mutex) >= 0)
>         mutex_unlock(&mm->text_poke_mutex);
> 
> and return. All done.
> 
> Plus the callback thing is pointless if we can do the instruction
> switch atomically (which would be true for UP, for single-thread, and
> potentially for certain sizes/alignments coupled with known rules for
> particular micro-architectures). So it's not a particularly good
> interface anyway.
> 
> Btw, I also think that there's a separate problem wrt shared pages.
> Should we perhaps only allow this in private mappings? Because right
> now it has that "current->mm == bp_target_mm" thing, and generally it
> only works on one particular mm, but by using "get_user_pages_fast(,
> 1,..)" it really only requires write permissions on the page. So it
> could be shared mapping, and I could easily see people doing that on
> purpose ("open executable file, then use text_poke() to change it for
> this architecture") and THAT DOES NOT WORK with the current patch if
> somebody else is also running that app..
> 

I have already said I don't think this is a good interface as it doesn't
provide sane semantics for batching changes -- the whole handler and
timeout mechanism (which isn't even implemented yet, and as a result
probably will never be used by userspace) is basically a hack for that.

The other alternative is that we only expose the "synchronize all cores"
operation as a system call, and let user space do the rest of the work.
 Anything else can be done in user space, and a lot of the subtleties
with locking pages, user space access and so on simply go away... user
space will be responsible or will get protection faults.

	-hpa



  reply	other threads:[~2013-11-27 19:58 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-26  0:37 [PATCH] Add a text_poke syscall v2 Andi Kleen
2013-11-26 19:05 ` Andy Lutomirski
2013-11-26 19:11   ` Andi Kleen
2013-11-26 20:03   ` Linus Torvalds
2013-11-27 19:57     ` H. Peter Anvin [this message]
2013-11-27 22:02       ` H. Peter Anvin
2013-11-27 22:21         ` Andy Lutomirski
2013-11-27 22:21         ` Borislav Petkov
2013-11-27 22:24           ` H. Peter Anvin
2013-11-27 22:25           ` H. Peter Anvin
2013-11-27 22:29             ` Borislav Petkov
2013-11-27 22:31               ` H. Peter Anvin
2013-11-27 23:04                 ` Linus Torvalds
2013-11-27 23:13                   ` Borislav Petkov
2013-11-27 22:40               ` H. Peter Anvin
2013-11-27 23:10                 ` Borislav Petkov
2013-11-27 23:20                   ` H. Peter Anvin
2013-11-27 23:40                     ` Borislav Petkov
2013-11-27 23:47                       ` H. Peter Anvin
2013-11-27 22:41         ` Linus Torvalds
2013-11-27 22:53           ` H. Peter Anvin
2013-11-27 23:15             ` Linus Torvalds
2013-11-27 23:28               ` H. Peter Anvin
2013-11-28  2:01                 ` Linus Torvalds
2013-11-28  2:10                   ` H. Peter Anvin
2013-11-28  9:12                   ` Jiri Kosina
2013-11-27 23:44               ` Andi Kleen
2013-11-29 18:35 ` Oleg Nesterov
2013-11-29 19:54   ` Andi Kleen
2013-11-29 20:05     ` Oleg Nesterov
2013-11-29 20:17       ` H. Peter Anvin
2013-11-29 20:35         ` Oleg Nesterov
2013-11-29 21:24           ` H. Peter Anvin
2013-11-30 14:56             ` Oleg Nesterov
2013-11-29 23:24       ` Jiri Kosina
2013-11-30  0:22         ` Linus Torvalds
2013-12-03 18:49           ` [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly Oleg Nesterov
2013-12-03 19:00             ` Linus Torvalds
2013-12-03 19:20               ` H. Peter Anvin
2013-12-03 20:01                 ` Oleg Nesterov
2013-12-03 20:21                   ` H. Peter Anvin
2013-12-03 20:38                     ` Oleg Nesterov
2013-12-03 20:43                       ` H. Peter Anvin
2013-12-03 20:54                         ` Oleg Nesterov
2013-12-03 22:01                           ` Linus Torvalds
2013-12-03 23:47                             ` H. Peter Anvin
2013-12-04 11:30                               ` Oleg Nesterov
2013-12-04 11:11                             ` Oleg Nesterov
2013-12-04 16:01                               ` H. Peter Anvin
2013-12-04 16:48                                 ` Oleg Nesterov
2013-12-04 16:54                                   ` H. Peter Anvin
2013-12-04 17:15                                     ` Linus Torvalds
2013-12-04 17:43                                       ` Oleg Nesterov
2013-12-05 17:23                                         ` Oleg Nesterov
2013-12-05 17:49                                           ` Borislav Petkov
2013-12-05 18:45                                             ` Oleg Nesterov
2013-12-04 18:32                                       ` H. Peter Anvin
2013-12-05  8:28                                       ` Jon Medhurst (Tixy)
2013-12-03 22:42                           ` H. Peter Anvin
2013-12-03 19:53               ` Oleg Nesterov
2013-11-30 15:20         ` [PATCH] Add a text_poke syscall v2 Oleg Nesterov
2013-11-30 16:51         ` Oleg Nesterov
2013-11-30 17:31           ` Oleg Nesterov
2013-11-30  5:16       ` H. Peter Anvin
2013-11-30 14:52         ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52964EC7.9000504@zytor.com \
    --to=hpa@zytor.com \
    --cc=ak@linux.intel.com \
    --cc=andi@firstfloor.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.