From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Hugh Dickins <hughd@google.com>,
Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jkosina@suse.cz>, Andi Kleen <andi@firstfloor.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
the arch/x86 maintainers <x86@kernel.org>,
Andi Kleen <ak@linux.intel.com>, "H. Peter Anvin" <hpa@zytor.com>,
Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@alien8.de>
Subject: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
Date: Tue, 3 Dec 2013 19:49:09 +0100 [thread overview]
Message-ID: <20131203184909.GA17571@redhat.com> (raw)
In-Reply-To: <CA+55aFxJGidV4=ot0z_Va8+y2QWTDYghTm1Gin84StZ+R1SfXg@mail.gmail.com>
On 11/29, Linus Torvalds wrote:
>
> On Fri, Nov 29, 2013 at 3:24 PM, Jiri Kosina <jkosina@suse.cz> wrote:
> >
> > Do you think this'd be faster than the int3-based aproach?
>
> Unlikely to be faster, but perhaps more robust and more portable. Maybe.
Can't we at least change uprobe_write_opcode() and kill the
nontrivial __replace_page() logic?
See the patch below. For review only, I can't understand if
it needs mmu_notifier_invalidate* and set_pte_notify(), and
of course I can easily miss something else.
Currently uprobe_write_opcode() always allocs/installs the new
page, even if the previous uprobe_write_opcode() has already
created the cowed anonymous page.
With this patch uprobe_write_opcode() calls gup(write, force),
then invalidates pte, then modifies the page, and restores the
old pte.
The patch is hardly readable, this is how the code looks:
int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
uprobe_opcode_t opcode)
{
struct page *page;
struct vm_area_struct *vma;
pte_t *ptep, entry;
spinlock_t *ptlp;
int ret;
/* Read the page with vaddr into memory */
ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &page, NULL);
if (ret <= 0)
return ret;
ret = verify_opcode(page, vaddr, &opcode);
if (ret <= 0)
goto put;
retry:
put_page(page);
/* COW this page if not writable */
ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, &page, &vma);
if (ret <= 0)
goto put;
ptep = page_check_address(page, mm, vaddr, &ptlp, 0);
if (!ptep)
goto retry;
/* Unmap this page to ensure that nobody can execute it */
flush_cache_page(vma, vaddr, pte_pfn(*ptep));
entry = ptep_clear_flush(vma, vaddr, ptep);
/* Nobody can fault in this page, modify it */
copy_to_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
/* Restore the old mapping */
entry = pte_mkdirty(entry);
set_pte_at(mm, vaddr, ptep, entry);
update_mmu_cache(vma, vaddr, ptep);
pte_unmap_unlock(ptep, ptlp);
put:
put_page(page);
return ret;
}
you can safely ignore the fist get_user_pages() and verify_opcode().
I'll appretiate any review, thanks in advance.
Oleg.
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -114,65 +114,6 @@ static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
}
/**
- * __replace_page - replace page in vma by new page.
- * based on replace_page in mm/ksm.c
- *
- * @vma: vma that holds the pte pointing to page
- * @addr: address the old @page is mapped at
- * @page: the cowed page we are replacing by kpage
- * @kpage: the modified page we replace page by
- *
- * Returns 0 on success, -EFAULT on failure.
- */
-static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
- struct page *page, struct page *kpage)
-{
- struct mm_struct *mm = vma->vm_mm;
- spinlock_t *ptl;
- pte_t *ptep;
- int err;
- /* For mmu_notifiers */
- const unsigned long mmun_start = addr;
- const unsigned long mmun_end = addr + PAGE_SIZE;
-
- /* For try_to_free_swap() and munlock_vma_page() below */
- lock_page(page);
-
- mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- err = -EAGAIN;
- ptep = page_check_address(page, mm, addr, &ptl, 0);
- if (!ptep)
- goto unlock;
-
- get_page(kpage);
- page_add_new_anon_rmap(kpage, vma, addr);
-
- if (!PageAnon(page)) {
- dec_mm_counter(mm, MM_FILEPAGES);
- inc_mm_counter(mm, MM_ANONPAGES);
- }
-
- flush_cache_page(vma, addr, pte_pfn(*ptep));
- ptep_clear_flush(vma, addr, ptep);
- set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot));
-
- page_remove_rmap(page);
- if (!page_mapped(page))
- try_to_free_swap(page);
- pte_unmap_unlock(ptep, ptl);
-
- if (vma->vm_flags & VM_LOCKED)
- munlock_vma_page(page);
- put_page(page);
-
- err = 0;
- unlock:
- mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
- unlock_page(page);
- return err;
-}
-
-/**
* is_swbp_insn - check if instruction is breakpoint instruction.
* @insn: instruction to be checked.
* Default implementation of is_swbp_insn
@@ -264,43 +205,46 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
uprobe_opcode_t opcode)
{
- struct page *old_page, *new_page;
+ struct page *page;
struct vm_area_struct *vma;
+ pte_t *ptep, entry;
+ spinlock_t *ptlp;
int ret;
-retry:
/* Read the page with vaddr into memory */
- ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &old_page, &vma);
+ ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &page, NULL);
if (ret <= 0)
return ret;
- ret = verify_opcode(old_page, vaddr, &opcode);
+ ret = verify_opcode(page, vaddr, &opcode);
if (ret <= 0)
- goto put_old;
-
- ret = -ENOMEM;
- new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
- if (!new_page)
- goto put_old;
+ goto put;
- __SetPageUptodate(new_page);
-
- copy_highpage(new_page, old_page);
- copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
+ retry:
+ put_page(page);
+ /* COW this page if not writable */
+ ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, &page, &vma);
+ if (ret <= 0)
+ goto put;
- ret = anon_vma_prepare(vma);
- if (ret)
- goto put_new;
+ ptep = page_check_address(page, mm, vaddr, &ptlp, 0);
+ if (!ptep)
+ goto retry;
- ret = __replace_page(vma, vaddr, old_page, new_page);
+ /* Unmap this page to ensure that nobody can execute it */
+ flush_cache_page(vma, vaddr, pte_pfn(*ptep));
+ entry = ptep_clear_flush(vma, vaddr, ptep);
-put_new:
- page_cache_release(new_page);
-put_old:
- put_page(old_page);
+ /* Nobody can fault in this page, modify it */
+ copy_to_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
- if (unlikely(ret == -EAGAIN))
- goto retry;
+ /* Restore the old mapping */
+ entry = pte_mkdirty(entry);
+ set_pte_at(mm, vaddr, ptep, entry);
+ update_mmu_cache(vma, vaddr, ptep);
+ pte_unmap_unlock(ptep, ptlp);
+ put:
+ put_page(page);
return ret;
}
next prev parent reply other threads:[~2013-12-03 18:49 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-11-26 0:37 [PATCH] Add a text_poke syscall v2 Andi Kleen
2013-11-26 19:05 ` Andy Lutomirski
2013-11-26 19:11 ` Andi Kleen
2013-11-26 20:03 ` Linus Torvalds
2013-11-27 19:57 ` H. Peter Anvin
2013-11-27 22:02 ` H. Peter Anvin
2013-11-27 22:21 ` Andy Lutomirski
2013-11-27 22:21 ` Borislav Petkov
2013-11-27 22:24 ` H. Peter Anvin
2013-11-27 22:25 ` H. Peter Anvin
2013-11-27 22:29 ` Borislav Petkov
2013-11-27 22:31 ` H. Peter Anvin
2013-11-27 23:04 ` Linus Torvalds
2013-11-27 23:13 ` Borislav Petkov
2013-11-27 22:40 ` H. Peter Anvin
2013-11-27 23:10 ` Borislav Petkov
2013-11-27 23:20 ` H. Peter Anvin
2013-11-27 23:40 ` Borislav Petkov
2013-11-27 23:47 ` H. Peter Anvin
2013-11-27 22:41 ` Linus Torvalds
2013-11-27 22:53 ` H. Peter Anvin
2013-11-27 23:15 ` Linus Torvalds
2013-11-27 23:28 ` H. Peter Anvin
2013-11-28 2:01 ` Linus Torvalds
2013-11-28 2:10 ` H. Peter Anvin
2013-11-28 9:12 ` Jiri Kosina
2013-11-27 23:44 ` Andi Kleen
2013-11-29 18:35 ` Oleg Nesterov
2013-11-29 19:54 ` Andi Kleen
2013-11-29 20:05 ` Oleg Nesterov
2013-11-29 20:17 ` H. Peter Anvin
2013-11-29 20:35 ` Oleg Nesterov
2013-11-29 21:24 ` H. Peter Anvin
2013-11-30 14:56 ` Oleg Nesterov
2013-11-29 23:24 ` Jiri Kosina
2013-11-30 0:22 ` Linus Torvalds
2013-12-03 18:49 ` Oleg Nesterov [this message]
2013-12-03 19:00 ` [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly Linus Torvalds
2013-12-03 19:20 ` H. Peter Anvin
2013-12-03 20:01 ` Oleg Nesterov
2013-12-03 20:21 ` H. Peter Anvin
2013-12-03 20:38 ` Oleg Nesterov
2013-12-03 20:43 ` H. Peter Anvin
2013-12-03 20:54 ` Oleg Nesterov
2013-12-03 22:01 ` Linus Torvalds
2013-12-03 23:47 ` H. Peter Anvin
2013-12-04 11:30 ` Oleg Nesterov
2013-12-04 11:11 ` Oleg Nesterov
2013-12-04 16:01 ` H. Peter Anvin
2013-12-04 16:48 ` Oleg Nesterov
2013-12-04 16:54 ` H. Peter Anvin
2013-12-04 17:15 ` Linus Torvalds
2013-12-04 17:43 ` Oleg Nesterov
2013-12-05 17:23 ` Oleg Nesterov
2013-12-05 17:49 ` Borislav Petkov
2013-12-05 18:45 ` Oleg Nesterov
2013-12-04 18:32 ` H. Peter Anvin
2013-12-05 8:28 ` Jon Medhurst (Tixy)
2013-12-03 22:42 ` H. Peter Anvin
2013-12-03 19:53 ` Oleg Nesterov
2013-11-30 15:20 ` [PATCH] Add a text_poke syscall v2 Oleg Nesterov
2013-11-30 16:51 ` Oleg Nesterov
2013-11-30 17:31 ` Oleg Nesterov
2013-11-30 5:16 ` H. Peter Anvin
2013-11-30 14:52 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131203184909.GA17571@redhat.com \
--to=oleg@redhat.com \
--cc=ak@linux.intel.com \
--cc=andi@firstfloor.org \
--cc=bp@alien8.de \
--cc=hpa@zytor.com \
--cc=hughd@google.com \
--cc=jkosina@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.