* [PATCH 0/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE)
@ 2013-12-09 21:18 Oleg Nesterov
2013-12-09 21:18 ` [PATCH 1/1] " Oleg Nesterov
2013-12-10 2:08 ` [PATCH 0/1] " Linus Torvalds
0 siblings, 2 replies; 10+ messages in thread
From: Oleg Nesterov @ 2013-12-09 21:18 UTC (permalink / raw)
To: Linus Torvalds, H. Peter Anvin
Cc: Ananth N Mavinakayanahalli, Andi Kleen, Borislav Petkov,
Hugh Dickins, Ingo Molnar, Jiri Kosina, Peter Zijlstra,
Srikar Dronamraju, linux-kernel
Hello.
It is not clear to me if Linus still dislikes this change or not.
Let me send the patch "officially" so that it can be nacked if I
misunderstood the result of discussion.
Changes:
- add a huge comment above gup(WRITE | FORCE)
- add WARN_ON(!(PageAnon() && page_mapcount() == 1))
to ensure it works as expected
If (say, on x86) we can avoid the pte games, we can simply add
if (IS_ENABLED(CONFIG_WHATEVER)) {
copy_to_page(...);
set_page_dirty_locked(page);
goto put;
}
right after the 2nd get_user_pages().
In any case I believe it would be very nice to kill __replace_page(),
and even the fact this patch removes include(mm/internal.h) makes me
think this patch makes sense. Assuming it is correct.
Oleg.
---
int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
uprobe_opcode_t opcode)
{
struct page *page;
struct vm_area_struct *vma;
pte_t *ptep, entry;
spinlock_t *ptlp;
int ret;
/* Read the page with vaddr into memory */
ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &page, NULL);
if (ret < 0)
return ret;
ret = verify_opcode(page, vaddr, &opcode);
if (ret <= 0)
goto put;
retry:
put_page(page);
/*
* Break the mapping unless the page is already anonymous and
* unshare the page, see the WARN_ON() below.
*
* We never write to the VM_SHARED vma, every caller must check
* valid_vma(). FOLL_WRITE | FOLL_FORCE should anonymize this
* page unless uprobe_write_opcode() was already called in the
* past or the application itself did mprotect(PROT_WRITE) and
* wrote into this page.
*
* If it was already anonymous it can be shared due to dup_mm(),
* in this case do_wp_page() or do_swap_page() will do another
* cow to unshare, so we can safely modify it.
*/
ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, &page, &vma);
if (ret < 0)
return ret;
ptep = page_check_address(page, mm, vaddr, &ptlp, 0);
if (!ptep)
goto retry;
ret = 0;
if (WARN_ON(!PageAnon(page) || page_mapcount(page) != 1)) {
dump_page(page);
ret = -EFAULT;
goto unlock;
}
/* Unmap this page to ensure that nobody can execute it */
flush_cache_page(vma, vaddr, pte_pfn(*ptep));
entry = ptep_clear_flush(vma, vaddr, ptep);
/* Nobody can fault in this page, modify it */
copy_to_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
/* Restore the old mapping */
entry = pte_mkdirty(entry);
flush_icache_page(vma, page);
set_pte_at(mm, vaddr, ptep, entry);
update_mmu_cache(vma, vaddr, ptep);
unlock:
pte_unmap_unlock(ptep, ptlp);
put:
put_page(page);
return ret;
}
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH 1/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE) 2013-12-09 21:18 [PATCH 0/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE) Oleg Nesterov @ 2013-12-09 21:18 ` Oleg Nesterov 2013-12-10 2:08 ` [PATCH 0/1] " Linus Torvalds 1 sibling, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2013-12-09 21:18 UTC (permalink / raw) To: Linus Torvalds, H. Peter Anvin Cc: Ananth N Mavinakayanahalli, Andi Kleen, Borislav Petkov, Hugh Dickins, Ingo Molnar, Jiri Kosina, Peter Zijlstra, Srikar Dronamraju, linux-kernel __replace_page() logic is nontrivial, it plays with vm internals. uprobe_write_opcode() can rely on gup(WRITE|FORCE) which should break the mapping and unshare the page. After that we can unmap this page temporary to ensure that nobody can access it, modify its content, and restore the same pte + _PAGE_DIRTY. This greatly simplifies the code and avoids the extra alloc_page() if uprobe_write_opcode() was already called in the past. Note: perhaps we do not need pte games at least on x86 which always updates a single byte, we can trivially change this code later to do copy_to_page() + set_page_dirty_lock() after the 2nd get_user_pages() if CONFIG_X86. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- kernel/events/uprobes.c | 133 +++++++++++++++++----------------------------- 1 files changed, 49 insertions(+), 84 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 24b7d6c..1bae429 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -33,7 +33,6 @@ #include <linux/swap.h> /* try_to_free_swap */ #include <linux/ptrace.h> /* user_enable_single_step */ #include <linux/kdebug.h> /* notifier mechanism */ -#include "../../mm/internal.h" /* munlock_vma_page */ #include <linux/percpu-rwsem.h> #include <linux/task_work.h> @@ -114,65 +113,6 @@ static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr) } /** - * __replace_page - replace page in vma by new page. - * based on replace_page in mm/ksm.c - * - * @vma: vma that holds the pte pointing to page - * @addr: address the old @page is mapped at - * @page: the cowed page we are replacing by kpage - * @kpage: the modified page we replace page by - * - * Returns 0 on success, -EFAULT on failure. - */ -static int __replace_page(struct vm_area_struct *vma, unsigned long addr, - struct page *page, struct page *kpage) -{ - struct mm_struct *mm = vma->vm_mm; - spinlock_t *ptl; - pte_t *ptep; - int err; - /* For mmu_notifiers */ - const unsigned long mmun_start = addr; - const unsigned long mmun_end = addr + PAGE_SIZE; - - /* For try_to_free_swap() and munlock_vma_page() below */ - lock_page(page); - - mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end); - err = -EAGAIN; - ptep = page_check_address(page, mm, addr, &ptl, 0); - if (!ptep) - goto unlock; - - get_page(kpage); - page_add_new_anon_rmap(kpage, vma, addr); - - if (!PageAnon(page)) { - dec_mm_counter(mm, MM_FILEPAGES); - inc_mm_counter(mm, MM_ANONPAGES); - } - - flush_cache_page(vma, addr, pte_pfn(*ptep)); - ptep_clear_flush(vma, addr, ptep); - set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot)); - - page_remove_rmap(page); - if (!page_mapped(page)) - try_to_free_swap(page); - pte_unmap_unlock(ptep, ptl); - - if (vma->vm_flags & VM_LOCKED) - munlock_vma_page(page); - put_page(page); - - err = 0; - unlock: - mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); - unlock_page(page); - return err; -} - -/** * is_swbp_insn - check if instruction is breakpoint instruction. * @insn: instruction to be checked. * Default implementation of is_swbp_insn @@ -264,43 +204,68 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t opcode) { - struct page *old_page, *new_page; + struct page *page; struct vm_area_struct *vma; + pte_t *ptep, entry; + spinlock_t *ptlp; int ret; -retry: /* Read the page with vaddr into memory */ - ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &old_page, &vma); - if (ret <= 0) + ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &page, NULL); + if (ret < 0) return ret; - ret = verify_opcode(old_page, vaddr, &opcode); + ret = verify_opcode(page, vaddr, &opcode); if (ret <= 0) - goto put_old; + goto put; - ret = -ENOMEM; - new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr); - if (!new_page) - goto put_old; - - __SetPageUptodate(new_page); + retry: + put_page(page); + /* + * Break the mapping unless the page is already anonymous and + * unshare the page, see the WARN_ON() below. + * + * We never write to the VM_SHARED vma, every caller must check + * valid_vma(). FOLL_WRITE | FOLL_FORCE should anonymize this + * page unless uprobe_write_opcode() was already called in the + * past or the application itself did mprotect(PROT_WRITE) and + * wrote into this page. + * + * If it was already anonymous it can be shared due to dup_mm(), + * in this case do_wp_page() or do_swap_page() will do another + * cow to unshare, so we can safely modify it. + */ + ret = get_user_pages(NULL, mm, vaddr, 1, 1, 1, &page, &vma); + if (ret < 0) + return ret; - copy_highpage(new_page, old_page); - copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE); + ptep = page_check_address(page, mm, vaddr, &ptlp, 0); + if (!ptep) + goto retry; - ret = anon_vma_prepare(vma); - if (ret) - goto put_new; + ret = 0; + if (WARN_ON(!PageAnon(page) || page_mapcount(page) != 1)) { + dump_page(page); + ret = -EFAULT; + goto unlock; + } - ret = __replace_page(vma, vaddr, old_page, new_page); + /* Unmap this page to ensure that nobody can execute it */ + flush_cache_page(vma, vaddr, pte_pfn(*ptep)); + entry = ptep_clear_flush(vma, vaddr, ptep); -put_new: - page_cache_release(new_page); -put_old: - put_page(old_page); + /* Nobody can fault in this page, modify it */ + copy_to_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE); - if (unlikely(ret == -EAGAIN)) - goto retry; + /* Restore the old mapping */ + entry = pte_mkdirty(entry); + flush_icache_page(vma, page); + set_pte_at(mm, vaddr, ptep, entry); + update_mmu_cache(vma, vaddr, ptep); + unlock: + pte_unmap_unlock(ptep, ptlp); + put: + put_page(page); return ret; } -- 1.5.5.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE) 2013-12-09 21:18 [PATCH 0/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE) Oleg Nesterov 2013-12-09 21:18 ` [PATCH 1/1] " Oleg Nesterov @ 2013-12-10 2:08 ` Linus Torvalds 2013-12-10 19:18 ` Oleg Nesterov 1 sibling, 1 reply; 10+ messages in thread From: Linus Torvalds @ 2013-12-10 2:08 UTC (permalink / raw) To: Oleg Nesterov Cc: H. Peter Anvin, Ananth N Mavinakayanahalli, Andi Kleen, Borislav Petkov, Hugh Dickins, Ingo Molnar, Jiri Kosina, Peter Zijlstra, Srikar Dronamraju, Linux Kernel Mailing List On Mon, Dec 9, 2013 at 1:18 PM, Oleg Nesterov <oleg@redhat.com> wrote: > > It is not clear to me if Linus still dislikes this change or not. > Let me send the patch "officially" so that it can be nacked if I > misunderstood the result of discussion. So quite frankly, if the caller guarantees that it has looked up the vma, and verified it, then I'm ok with it. But in that case, I think you should pass in the vma as an argument, and verify it for now. Because quite frankly, the reason I reacted to this all is that it is NOT AT ALL obvious that the callers actually do that. That uprobe_write_opcode() also only works if the instruction is within a single page, another thing that it doesn't actually check. And again, it is not at all obvious by looking at the callers that that check has ever been done. The interface looks like you can just rewrite an arbitrary byte range using that function. In other words: looking at that function, my immediate reaction is still "it's buggy". Why? Because it seems to make assumptions about the callers that are never checked and are not at all obvious that they have ever *been* checked. In fact, callers can clearly call that thing without ever even looking up the vma (which it must to do check that it's not shared), and we look it up *again* by using that slow nasty get_user_pages() thing. Also, quite frankly, I think the routine is still just horrible. If I read it right, even after that cleanup you do a page table walk *THREE* times: - 2x get_user_pages() - page_check_address() and that's without the whole "retry" thing. It all seems pretty damn pointless. Wouldn't it be better to do it *once*, and then the retry logic is for "oops, we need to cow the page we looked up, so we need to drop the page table lock and allocate a new page". So I'd actually prefer this to (a) pass in the vma (to make it obvious that the caller must look it up!) (b) add a few sanity checks (not page-crossing) and I think possibly (c) make it a bit smarter still. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE) 2013-12-10 2:08 ` [PATCH 0/1] " Linus Torvalds @ 2013-12-10 19:18 ` Oleg Nesterov 2013-12-10 19:38 ` Linus Torvalds 0 siblings, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2013-12-10 19:18 UTC (permalink / raw) To: Linus Torvalds Cc: H. Peter Anvin, Ananth N Mavinakayanahalli, Andi Kleen, Borislav Petkov, Hugh Dickins, Ingo Molnar, Jiri Kosina, Peter Zijlstra, Srikar Dronamraju, Linux Kernel Mailing List On 12/09, Linus Torvalds wrote: > > On Mon, Dec 9, 2013 at 1:18 PM, Oleg Nesterov <oleg@redhat.com> wrote: > > > > It is not clear to me if Linus still dislikes this change or not. > > Let me send the patch "officially" so that it can be nacked if I > > misunderstood the result of discussion. > > So quite frankly, if the caller guarantees that it has looked up the > vma, and verified it, then I'm ok with it. > > But in that case, I think you should pass in the vma as an argument, > and verify it for now. Because quite frankly, the reason I reacted to > this all is that it is NOT AT ALL obvious that the callers actually do > that. > > That uprobe_write_opcode() also only works if the instruction is > within a single page, another thing that it doesn't actually check. > And again, it is not at all obvious by looking at the callers that > that check has ever been done. The interface looks like you can just > rewrite an arbitrary byte range using that function. OK. I'll update the comment above uprobe_write_opcode() to make this all clear. It is wrong anyway, ->mmap_sem is held for writing. > In fact, callers can clearly call that > thing without ever even looking up the vma (which it must to do check > that it's not shared), and we look it up *again* by using that slow > nasty get_user_pages() thing. Well, if we use get_user_pages() we have vma for free. And this is the same vma which was verified by the caller who must check valid_vma(), the caller doesn't drop mmap_sem. I agree, this should be documented. And yes, we can pass vma instead of mm, but this needs a separate change. We need to change the signature of set_swbp() and set_orig_insn(). And the current code also relies on get_user_pages(&vma) in this sense. But I am not sure about "and verify it for now" above. Do you mean that uprobe_write_opcode() should do another valid_vma() ? I don't think this will look good, unless we put this check into WARN_ON(). Or just verify that the argument matches the vma found by gup() ? And perhaps we can do this later? This change will conflict with the pending arm patches (arm reimplements set_swbp() because it uses another insn != UPROBE_SWBP_INSN depending on original insn). This doesn't look like a serious problem to me. > Also, quite frankly, I think the routine is still just horrible. I agree. I do think it becomes better (if the patch is correct), just because we do not play with vm internals, at least that much. > If I > read it right, even after that cleanup you do a page table walk > *THREE* times: > > - 2x get_user_pages() Yes. The 1st get_user_pages() + verify_opcode() is only needed to avoid the unnecessary write (and cow). Without this check, say, uprobe_unregister() can create the anonymous cow'ed page even if this task was not probed. This makes sense, but perhaps we can optimize this somehow, not sure. But this was not added by the patch. > - page_check_address() Yes. The patch adds another page table walk, this is unfortunate. > and that's without the whole "retry" thing. It all seems pretty damn > pointless. Wouldn't it be better to do it *once*, and then the retry > logic is for "oops, we need to cow the page we looked up, so we need > to drop the page table lock and allocate a new page". Hmm, OK. I guess we can rely on the same PageAnon() + page_mapcount() check, just we should not WARN() the first time. Plus we should probably also check page_swapcount(). But I need to recheck. Thanks. I'll try to make v2. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE) 2013-12-10 19:18 ` Oleg Nesterov @ 2013-12-10 19:38 ` Linus Torvalds 2013-12-10 20:04 ` Oleg Nesterov 0 siblings, 1 reply; 10+ messages in thread From: Linus Torvalds @ 2013-12-10 19:38 UTC (permalink / raw) To: Oleg Nesterov Cc: H. Peter Anvin, Ananth N Mavinakayanahalli, Andi Kleen, Borislav Petkov, Hugh Dickins, Ingo Molnar, Jiri Kosina, Peter Zijlstra, Srikar Dronamraju, Linux Kernel Mailing List On Tue, Dec 10, 2013 at 11:18 AM, Oleg Nesterov <oleg@redhat.com> wrote: > > Well, if we use get_user_pages() we have vma for free. No we don't. get_user_pages() is expensive as hell. We'd be *much* better off using get_user_pages_fast() if possible - and I bet _is_ possible in 99% of all cases. So saying that we get vma "for free" is complete BS. We get it "for very expensive." > But I am not sure about "and verify it for now" above. Do you mean > that uprobe_write_opcode() should do another valid_vma() ? No. I think we should just do something like if (vma->vm_flags & (VM_SHARED | VM_HUGETLB)) return -EINVAL; and actually make the logic obvious. We should *also* make sure we do the right thing for THP, because the VM_HUGETLB check isn't obviously sufficient. Does the code really do the right thing for a writable hugepage? I'd like that to be *obvious*. Quite frankly, I think that adding a few of these kinds of *obvious* checks and then just looking up the page table _once_ - instead of three times - would make the code safer and faster. As it is, it is neither safe nor fast. It *may* be safe if callers do all the right black magic, but it's by no means obvious. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE) 2013-12-10 19:38 ` Linus Torvalds @ 2013-12-10 20:04 ` Oleg Nesterov 2013-12-10 20:16 ` Linus Torvalds 2013-12-10 20:16 ` Oleg Nesterov 0 siblings, 2 replies; 10+ messages in thread From: Oleg Nesterov @ 2013-12-10 20:04 UTC (permalink / raw) To: Linus Torvalds Cc: H. Peter Anvin, Ananth N Mavinakayanahalli, Andi Kleen, Borislav Petkov, Hugh Dickins, Ingo Molnar, Jiri Kosina, Peter Zijlstra, Srikar Dronamraju, Linux Kernel Mailing List On 12/10, Linus Torvalds wrote: > > On Tue, Dec 10, 2013 at 11:18 AM, Oleg Nesterov <oleg@redhat.com> wrote: > > > > Well, if we use get_user_pages() we have vma for free. > > No we don't. get_user_pages() is expensive as hell. I understand, I said "if we use get_user_pages()". > We'd be *much* better off using get_user_pages_fast() if possible - > and I bet _is_ possible in 99% of all cases. We can't. get_user_pages_fast() takes mmap_sem. Perhaps we can do something, say add FOLL_FAST_ONLY, but until then we need the 1st get_user_pages(write => 0) anyway. > > But I am not sure about "and verify it for now" above. Do you mean > > that uprobe_write_opcode() should do another valid_vma() ? > > No. I think we should just do something like > > if (vma->vm_flags & (VM_SHARED | VM_HUGETLB)) > return -EINVAL; OK, but WARN_ON(VM_SHARED | VM_HUGETLB) looks better imho. Because this should not happen. > then just looking up the page table _once_ - instead of > three times - would make the code safer and faster. Unlikely we can do this only once... At least I do not see how. > It *may* be safe if callers do > all the right black magic, but it's by no means obvious. I agree, this needs more documentation in any case. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE) 2013-12-10 20:04 ` Oleg Nesterov @ 2013-12-10 20:16 ` Linus Torvalds 2013-12-10 21:24 ` Oleg Nesterov 2013-12-10 20:16 ` Oleg Nesterov 1 sibling, 1 reply; 10+ messages in thread From: Linus Torvalds @ 2013-12-10 20:16 UTC (permalink / raw) To: Oleg Nesterov Cc: H. Peter Anvin, Ananth N Mavinakayanahalli, Andi Kleen, Borislav Petkov, Hugh Dickins, Ingo Molnar, Jiri Kosina, Peter Zijlstra, Srikar Dronamraju, Linux Kernel Mailing List On Tue, Dec 10, 2013 at 12:04 PM, Oleg Nesterov <oleg@redhat.com> wrote: >> We'd be *much* better off using get_user_pages_fast() if possible - >> and I bet _is_ possible in 99% of all cases. > > We can't. get_user_pages_fast() takes mmap_sem. Yeah, and we need to look up the page table entry anyway, so what we actually want here is just the page table walker, none of the "get page" crap at all. So the core function should (I think) just do something like: - new_page = NULL - get page table lock - look up page tables - if it's dirty and private, or we have a new page for it, replace the instruction - drop page table lock - we're done, free the new page if we didn't use it. with a retry for the "uhhuh, it's not dirty and private, and we don't have a new page", so we do - drop the lock - use a "get_user()" or something to page it in. - allocate a new page if necessary - goto retry which gets us a single page table walk for the common case, and a retry for the "uhhuh, we needed to page it in or allocate a new page" condition. Put another way: I actually think the existing "__replace_page()" code is closer to being good than that disgusting uprobe_write_opcode() function. I think you may be getting rid of the wrong ugly function. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE) 2013-12-10 20:16 ` Linus Torvalds @ 2013-12-10 21:24 ` Oleg Nesterov 0 siblings, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2013-12-10 21:24 UTC (permalink / raw) To: Linus Torvalds Cc: H. Peter Anvin, Ananth N Mavinakayanahalli, Andi Kleen, Borislav Petkov, Hugh Dickins, Ingo Molnar, Jiri Kosina, Peter Zijlstra, Srikar Dronamraju, Linux Kernel Mailing List On 12/10, Linus Torvalds wrote: > > On Tue, Dec 10, 2013 at 12:04 PM, Oleg Nesterov <oleg@redhat.com> wrote: > >> We'd be *much* better off using get_user_pages_fast() if possible - > >> and I bet _is_ possible in 99% of all cases. > > > > We can't. get_user_pages_fast() takes mmap_sem. > > Yeah, and we need to look up the page table entry anyway, so what we > actually want here is just the page table walker, Yes, this is clear. And damn, somehow I forgot that _fast() can (obviously) only use current->mm. > none of the "get > page" crap at all. > > So the core function should (I think) just do something like: I'll try to think, but this is what I actually tried to avoid. I mean, > Put another way: I actually think the existing "__replace_page()" code > is closer to being good than that disgusting uprobe_write_opcode() > function. I think you may be getting rid of the wrong ugly function. And perhaps you are right. But my only motivation was: rely on gup() to simplify this code. Yes, gup() is slow, but it works and we can avoid the "nontrivial" things like page_remove_rmap/munlock_vma_page. If we want to optimize this code, then this patch obviously goes to the wrong direction, I agree. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE) 2013-12-10 20:04 ` Oleg Nesterov 2013-12-10 20:16 ` Linus Torvalds @ 2013-12-10 20:16 ` Oleg Nesterov 2013-12-10 20:19 ` Linus Torvalds 1 sibling, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2013-12-10 20:16 UTC (permalink / raw) To: Linus Torvalds Cc: H. Peter Anvin, Ananth N Mavinakayanahalli, Andi Kleen, Borislav Petkov, Hugh Dickins, Ingo Molnar, Jiri Kosina, Peter Zijlstra, Srikar Dronamraju, Linux Kernel Mailing List On 12/10, Oleg Nesterov wrote: > > On 12/10, Linus Torvalds wrote: > > > > We'd be *much* better off using get_user_pages_fast() if possible - > > and I bet _is_ possible in 99% of all cases. > > We can't. get_user_pages_fast() takes mmap_sem. Hmm. I am stupid, there is __get_user_pages_fast(). OK, I'll try to think more. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE) 2013-12-10 20:16 ` Oleg Nesterov @ 2013-12-10 20:19 ` Linus Torvalds 0 siblings, 0 replies; 10+ messages in thread From: Linus Torvalds @ 2013-12-10 20:19 UTC (permalink / raw) To: Oleg Nesterov Cc: H. Peter Anvin, Ananth N Mavinakayanahalli, Andi Kleen, Borislav Petkov, Hugh Dickins, Ingo Molnar, Jiri Kosina, Peter Zijlstra, Srikar Dronamraju, Linux Kernel Mailing List On Tue, Dec 10, 2013 at 12:16 PM, Oleg Nesterov <oleg@redhat.com> wrote: > On 12/10, Oleg Nesterov wrote: >> >> On 12/10, Linus Torvalds wrote: >> > >> > We'd be *much* better off using get_user_pages_fast() if possible - >> > and I bet _is_ possible in 99% of all cases. >> >> We can't. get_user_pages_fast() takes mmap_sem. > > Hmm. I am stupid, there is __get_user_pages_fast(). OK, I'll try > to think more. There is indeed the non-locking version, however, since we want to get the pte-pointer anyway, I really think you were right in complaining about get_user_pages_fast() in the first place, and we'd be much better off just doing the page table lookup ourselves. Which we have to do later *anyway*. The whole [__]get_user_pages[_fast]() family of functions are for people who want the *page*. We want more than that. We really want the page table entry, since we have to potentially change it. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2013-12-10 21:24 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-12-09 21:18 [PATCH 0/1] uprobes: Kill __replace_page(), change uprobe_write_opcode() to rely on gup(WRITE) Oleg Nesterov 2013-12-09 21:18 ` [PATCH 1/1] " Oleg Nesterov 2013-12-10 2:08 ` [PATCH 0/1] " Linus Torvalds 2013-12-10 19:18 ` Oleg Nesterov 2013-12-10 19:38 ` Linus Torvalds 2013-12-10 20:04 ` Oleg Nesterov 2013-12-10 20:16 ` Linus Torvalds 2013-12-10 21:24 ` Oleg Nesterov 2013-12-10 20:16 ` Oleg Nesterov 2013-12-10 20:19 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox