From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Hugh Dickins <hughd@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Jiri Kosina <jkosina@suse.cz>, Andi Kleen <andi@firstfloor.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
the arch/x86 maintainers <x86@kernel.org>,
Andi Kleen <ak@linux.intel.com>, Ingo Molnar <mingo@kernel.org>,
Borislav Petkov <bp@alien8.de>,
Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Subject: Re: [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly
Date: Wed, 4 Dec 2013 18:43:28 +0100 [thread overview]
Message-ID: <20131204174328.GA8082@redhat.com> (raw)
In-Reply-To: <CA+55aFyMx74jiKgS-o0axuuTBw7295UEMSf_vYDQCAghk3Vr=w@mail.gmail.com>
Peter, Linus, I got lost.
So what do you finally think about this change? Please see v2 below:
- update the comment above gup(write, force)
- add flush_icache_page() before set_pte_at() (nop on x86
and powerpc)
- fix the returned value, and with this change it seems
to work although I do not trust my testing.
I am attaching the code:
int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
uprobe_opcode_t opcode)
{
struct page *page;
struct vm_area_struct *vma;
pte_t *ptep, entry;
spinlock_t *ptlp;
int ret;
/* Read the page with vaddr into memory */
ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &page, NULL);
if (ret < 0)
return ret;
ret = verify_opcode(page, vaddr, &opcode);
if (ret < 0)
goto put;
retry:
put_page(page);
/* Break the mapping unless the page is already anonymous */
ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, &page, &vma);
if (ret <= 0)
goto put;
ptep = page_check_address(page, mm, vaddr, &ptlp, 0);
if (!ptep)
goto retry;
/* Unmap this page to ensure that nobody can execute it */
flush_cache_page(vma, vaddr, pte_pfn(*ptep));
entry = ptep_clear_flush(vma, vaddr, ptep);
/* Nobody can fault in this page, modify it */
copy_to_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
/* Restore the old mapping */
entry = pte_mkdirty(entry);
flush_icache_page(vma, page);
set_pte_at(mm, vaddr, ptep, entry);
update_mmu_cache(vma, vaddr, ptep);
pte_unmap_unlock(ptep, ptlp);
ret = 0;
put:
put_page(page);
return ret;
}
And/Or. Are you still saying that on x86 (and powerpc?) we do not need
these pte games at all and uprobe_write_opcode() can simply do:
/* Break the mapping unless the page is already anonymous */
ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, &page, &vma);
// this page can't be swapped out due to page_freeze_refs()
copy_to_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
set_page_dirty_lock(page);
Just in case... I have no idea if this matters or not, but
uprobe_write_opcode() is also used to restore the original insn which
can even cross the page. We still write a single (1st) byte in this
case, the byte which was previously replaced by int3.
Oleg.
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -114,65 +114,6 @@ static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
}
/**
- * __replace_page - replace page in vma by new page.
- * based on replace_page in mm/ksm.c
- *
- * @vma: vma that holds the pte pointing to page
- * @addr: address the old @page is mapped at
- * @page: the cowed page we are replacing by kpage
- * @kpage: the modified page we replace page by
- *
- * Returns 0 on success, -EFAULT on failure.
- */
-static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
- struct page *page, struct page *kpage)
-{
- struct mm_struct *mm = vma->vm_mm;
- spinlock_t *ptl;
- pte_t *ptep;
- int err;
- /* For mmu_notifiers */
- const unsigned long mmun_start = addr;
- const unsigned long mmun_end = addr + PAGE_SIZE;
-
- /* For try_to_free_swap() and munlock_vma_page() below */
- lock_page(page);
-
- mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- err = -EAGAIN;
- ptep = page_check_address(page, mm, addr, &ptl, 0);
- if (!ptep)
- goto unlock;
-
- get_page(kpage);
- page_add_new_anon_rmap(kpage, vma, addr);
-
- if (!PageAnon(page)) {
- dec_mm_counter(mm, MM_FILEPAGES);
- inc_mm_counter(mm, MM_ANONPAGES);
- }
-
- flush_cache_page(vma, addr, pte_pfn(*ptep));
- ptep_clear_flush(vma, addr, ptep);
- set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot));
-
- page_remove_rmap(page);
- if (!page_mapped(page))
- try_to_free_swap(page);
- pte_unmap_unlock(ptep, ptl);
-
- if (vma->vm_flags & VM_LOCKED)
- munlock_vma_page(page);
- put_page(page);
-
- err = 0;
- unlock:
- mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
- unlock_page(page);
- return err;
-}
-
-/**
* is_swbp_insn - check if instruction is breakpoint instruction.
* @insn: instruction to be checked.
* Default implementation of is_swbp_insn
@@ -264,43 +205,48 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
uprobe_opcode_t opcode)
{
- struct page *old_page, *new_page;
+ struct page *page;
struct vm_area_struct *vma;
+ pte_t *ptep, entry;
+ spinlock_t *ptlp;
int ret;
-retry:
/* Read the page with vaddr into memory */
- ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &old_page, &vma);
- if (ret <= 0)
+ ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &page, NULL);
+ if (ret < 0)
return ret;
- ret = verify_opcode(old_page, vaddr, &opcode);
- if (ret <= 0)
- goto put_old;
-
- ret = -ENOMEM;
- new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
- if (!new_page)
- goto put_old;
-
- __SetPageUptodate(new_page);
-
- copy_highpage(new_page, old_page);
- copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
-
- ret = anon_vma_prepare(vma);
- if (ret)
- goto put_new;
-
- ret = __replace_page(vma, vaddr, old_page, new_page);
+ ret = verify_opcode(page, vaddr, &opcode);
+ if (ret < 0)
+ goto put;
-put_new:
- page_cache_release(new_page);
-put_old:
- put_page(old_page);
+ retry:
+ put_page(page);
+ /* Break the mapping unless the page is already anonymous */
+ ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, &page, &vma);
+ if (ret <= 0)
+ goto put;
- if (unlikely(ret == -EAGAIN))
+ ptep = page_check_address(page, mm, vaddr, &ptlp, 0);
+ if (!ptep)
goto retry;
+
+ /* Unmap this page to ensure that nobody can execute it */
+ flush_cache_page(vma, vaddr, pte_pfn(*ptep));
+ entry = ptep_clear_flush(vma, vaddr, ptep);
+
+ /* Nobody can fault in this page, modify it */
+ copy_to_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
+
+ /* Restore the old mapping */
+ entry = pte_mkdirty(entry);
+ flush_icache_page(vma, page);
+ set_pte_at(mm, vaddr, ptep, entry);
+ update_mmu_cache(vma, vaddr, ptep);
+ pte_unmap_unlock(ptep, ptlp);
+ ret = 0;
+ put:
+ put_page(page);
return ret;
}
next prev parent reply other threads:[~2013-12-04 17:43 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-11-26 0:37 [PATCH] Add a text_poke syscall v2 Andi Kleen
2013-11-26 19:05 ` Andy Lutomirski
2013-11-26 19:11 ` Andi Kleen
2013-11-26 20:03 ` Linus Torvalds
2013-11-27 19:57 ` H. Peter Anvin
2013-11-27 22:02 ` H. Peter Anvin
2013-11-27 22:21 ` Andy Lutomirski
2013-11-27 22:21 ` Borislav Petkov
2013-11-27 22:24 ` H. Peter Anvin
2013-11-27 22:25 ` H. Peter Anvin
2013-11-27 22:29 ` Borislav Petkov
2013-11-27 22:31 ` H. Peter Anvin
2013-11-27 23:04 ` Linus Torvalds
2013-11-27 23:13 ` Borislav Petkov
2013-11-27 22:40 ` H. Peter Anvin
2013-11-27 23:10 ` Borislav Petkov
2013-11-27 23:20 ` H. Peter Anvin
2013-11-27 23:40 ` Borislav Petkov
2013-11-27 23:47 ` H. Peter Anvin
2013-11-27 22:41 ` Linus Torvalds
2013-11-27 22:53 ` H. Peter Anvin
2013-11-27 23:15 ` Linus Torvalds
2013-11-27 23:28 ` H. Peter Anvin
2013-11-28 2:01 ` Linus Torvalds
2013-11-28 2:10 ` H. Peter Anvin
2013-11-28 9:12 ` Jiri Kosina
2013-11-27 23:44 ` Andi Kleen
2013-11-29 18:35 ` Oleg Nesterov
2013-11-29 19:54 ` Andi Kleen
2013-11-29 20:05 ` Oleg Nesterov
2013-11-29 20:17 ` H. Peter Anvin
2013-11-29 20:35 ` Oleg Nesterov
2013-11-29 21:24 ` H. Peter Anvin
2013-11-30 14:56 ` Oleg Nesterov
2013-11-29 23:24 ` Jiri Kosina
2013-11-30 0:22 ` Linus Torvalds
2013-12-03 18:49 ` [PATCH?] uprobes: change uprobe_write_opcode() to modify the page directly Oleg Nesterov
2013-12-03 19:00 ` Linus Torvalds
2013-12-03 19:20 ` H. Peter Anvin
2013-12-03 20:01 ` Oleg Nesterov
2013-12-03 20:21 ` H. Peter Anvin
2013-12-03 20:38 ` Oleg Nesterov
2013-12-03 20:43 ` H. Peter Anvin
2013-12-03 20:54 ` Oleg Nesterov
2013-12-03 22:01 ` Linus Torvalds
2013-12-03 23:47 ` H. Peter Anvin
2013-12-04 11:30 ` Oleg Nesterov
2013-12-04 11:11 ` Oleg Nesterov
2013-12-04 16:01 ` H. Peter Anvin
2013-12-04 16:48 ` Oleg Nesterov
2013-12-04 16:54 ` H. Peter Anvin
2013-12-04 17:15 ` Linus Torvalds
2013-12-04 17:43 ` Oleg Nesterov [this message]
2013-12-05 17:23 ` Oleg Nesterov
2013-12-05 17:49 ` Borislav Petkov
2013-12-05 18:45 ` Oleg Nesterov
2013-12-04 18:32 ` H. Peter Anvin
2013-12-05 8:28 ` Jon Medhurst (Tixy)
2013-12-03 22:42 ` H. Peter Anvin
2013-12-03 19:53 ` Oleg Nesterov
2013-11-30 15:20 ` [PATCH] Add a text_poke syscall v2 Oleg Nesterov
2013-11-30 16:51 ` Oleg Nesterov
2013-11-30 17:31 ` Oleg Nesterov
2013-11-30 5:16 ` H. Peter Anvin
2013-11-30 14:52 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131204174328.GA8082@redhat.com \
--to=oleg@redhat.com \
--cc=ak@linux.intel.com \
--cc=ananth@in.ibm.com \
--cc=andi@firstfloor.org \
--cc=bp@alien8.de \
--cc=hpa@zytor.com \
--cc=hughd@google.com \
--cc=jkosina@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=srikar@linux.vnet.ibm.com \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.