All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Masami Hiramatsu <mhiramat@redhat.com>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Jim Keniston <jkenisto@linux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Roland McGrath <roland@redhat.com>, Mel Gorman <mel@csn.ul.ie>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH v2 7/11] Uprobes Implementation
Date: Fri, 23 Apr 2010 20:53:41 +0200	[thread overview]
Message-ID: <20100423185341.GA16129@redhat.com> (raw)
In-Reply-To: <20100423145813.GA5503@linux.vnet.ibm.com>

On 04/23, Srikar Dronamraju wrote:
>
> * Oleg Nesterov <oleg@redhat.com> [2010-04-22 17:40:59]:
>
> > On 04/22, Srikar Dronamraju wrote:
> > >
> > > I still need to verify this. I shall get back to you on this.
> > > However are there applications that mprotect(PROT_WRITE) text pages?
> >
> > Well, I think the kernel should assume that the user-space can do
> > anything.
> >
> > Hmm. And if this vma is VM_SHARED, then this bp could be actually
> > written to vm_file after mprotect().
>
> When I look through the load_.*_binary and load_.*_library functions,
> they seem to map the text regions MAP_PRIVATE|MAP_DENY_WRITE.

Sure, I didn't mean exec can use MAP_SHARED or mprotect().

> Also if vma are marked VM_SHARED and bp are inserted through ptrace,
> i.e(access_process_vm/get_user_pages), then we would still be writing to
> vm_file after mprotect?

Yes, that is why I mentioned register_uprobe() should check SHARED/MAYWRITE.

> Again, I am not sure if executable pages should be marked VM_SHARED.

Again, I didn't mean they should. But they can.

Not only VM_SHARED, the application can create the anonymous PROT_EXEC region,
in this case write_opcode() looks wrong, please see below.

> > 	@@ -2617,7 +2617,10 @@ int replace_page(struct vm_area_struct *vma, struct page *page,
> > 		}
> >
> > 		get_page(kpage);
> > 	-	page_add_anon_rmap(kpage, vma, addr);
> > 	+	if (PageAnon(kpage))
> > 	+		page_add_anon_rmap(kpage, vma, addr);
> > 	+	else
> > 	+		page_add_file_rmap(kpage);
> >
> > 		flush_cache_page(vma, addr, pte_pfn(*ptep));
> > 		ptep_clear_flush(vma, addr, ptep);
> >
> > I see no point in this patch, please see below.
> >
> > The next 4/11 patch introduces write_opcode() which roughly does:
> >
> > 	int write_opcode(unsigned long vaddr, user_bkpt_opcode_t opcode)
> > 	{
> > 		get_user_pages(write => false, &old_page);
> >
> > 		new_page = alloc_page_vma(...);
> >
> > 		... insert the bp into the new_page ...
> >
> > 		new_page->mapping = old_page->mapping;
> > 		new_page->index = old_page->index;
> >
> > 		replace_page(old_page, new_page);
> > 	}
> >
> > This doesn't look right at all to me.
> >
> > 	IF PageAnon(old_page):
> 		   ^^^ newpage

Yes,

> > 		in this case replace_page() calls page_add_anon_rmap() which
> > 		needs the locked page.
> >
> > 	ELSE:
> >
> > 		I don't think the new page should evere preserve the mapping,
> > 		this looks just wrong. It should be always anonymous.
>
> I did verify that page_add_file_rmap gets called from replace_page when
> we insert or remove a probe.

Of course! but see above, PageAnon() case is possible too. I think the
code should handle this case correctly anyway, but it seems it doesn't.
Not only page_add_anon_rmap() needs the locked page, I am not not sure
page_add_anon_rmap() is fine for write_opcode() which allocates the new
page. LRU? SetPageSwapBacked?

And you seem to miss my point. I think page_add_file_rmap() is always wrong.
I mean, no matter what is the page_mapping(old_page), the new page should be
mapped anonymously.

> I would leave it for vm experts to decide what the right thing to do.

Sure.

> > And in fact, I do not understand why write_opcode() needs replace_page().
> > It could just use get_user_pages(FOLL_WRITE | FOLL_FORCE), no? It should
> > create the anonymous page correctly.
>
> We were earlier doing access_process_vm that would inturn call
> get_user_pages to COW the page. However that needed that the threads of
> the target process be stopped.

OK, I missed this, thanks.

> Background page replacement was suggested by Linus and Peter.
> In this method.
> 1. we get a copy of the page.
> 2. modify the page
> 3. flush the tlbs.

OK.

I must admit, I don't understand the usage of the lockless get_pte() in
write_opcode(). replace_page() checks orig_pte, yes. But how this check
can help write_opcode and why it is needed? I do not think it can prevent
any race, pte can be changed even before write_opcode() calls get_pte().
I guess this is only done because replace_page() requires this argument?

Oleg.


  reply	other threads:[~2010-04-23 18:56 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-31 15:51 [PATCH v2 0/11] Uprobes patches Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 1/11] Move Macro W to insn.h Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 2/11] Move replace_page() to mm/memory.c Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 3/11] Enhance replace_page() to support pagecache Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 4/11] User Space Breakpoint Assistance Layer Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 5/11] X86 details for user space breakpoint assistance Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 6/11] Slot allocation for Execution out of line Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 7/11] Uprobes Implementation Srikar Dronamraju
2010-04-13 18:35   ` Oleg Nesterov
2010-04-15  9:35     ` Srikar Dronamraju
2010-04-19 19:31       ` Oleg Nesterov
2010-04-20 12:43         ` Srikar Dronamraju
2010-04-20 15:30           ` Oleg Nesterov
2010-04-21  6:59             ` Srikar Dronamraju
2010-04-21 16:05               ` Oleg Nesterov
2010-04-22 13:31                 ` Srikar Dronamraju
2010-04-22 15:40                   ` Oleg Nesterov
2010-04-23 14:58                     ` Srikar Dronamraju
2010-04-23 18:53                       ` Oleg Nesterov [this message]
2010-05-11 20:47                       ` Peter Zijlstra
2010-05-11 20:44                     ` Peter Zijlstra
2010-05-11 20:45                     ` Peter Zijlstra
2010-05-12 10:31                       ` Srikar Dronamraju
2010-05-13 19:40                       ` Oleg Nesterov
2010-05-13 19:59                         ` Linus Torvalds
2010-05-13 22:12                           ` Andi Kleen
2010-05-13 22:25                             ` Linus Torvalds
2010-05-14  0:56                           ` Roland McGrath
2010-05-14  5:42                           ` Srikar Dronamraju
2010-05-11 20:43                   ` Peter Zijlstra
2010-05-12 10:41                     ` Srikar Dronamraju
2010-05-12 11:12                       ` Peter Zijlstra
2010-05-12 14:24                         ` Srikar Dronamraju
2010-05-11 20:32             ` Peter Zijlstra
2010-05-11 20:57               ` Frank Ch. Eigler
2010-05-11 21:01                 ` Peter Zijlstra
2010-03-31 15:52 ` [PATCH v2 8/11] X86 details for uprobes Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 9/11] Uprobes Documentation patch Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 10/11] Uprobes samples Srikar Dronamraju
2010-03-31 15:53 ` [PATCH v2 11/11] Uprobes traceevents patch Srikar Dronamraju
2010-03-31 21:24   ` Steven Rostedt
2010-04-01  4:16     ` Masami Hiramatsu
2010-05-12 14:57       ` Frederic Weisbecker
2010-05-12 11:02   ` Frederic Weisbecker
2010-05-12 14:34     ` Srikar Dronamraju
2010-05-12 15:15   ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100423185341.GA16129@redhat.com \
    --to=oleg@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=ananth@in.ibm.com \
    --cc=fche@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=jkenisto@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mel@csn.ul.ie \
    --cc=mhiramat@redhat.com \
    --cc=mingo@elte.hu \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@xenotime.net \
    --cc=riel@redhat.com \
    --cc=roland@redhat.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.