* [PATCH v2 0/5] uprobes: write_opcode() cleanups
@ 2012-06-24 14:59 Oleg Nesterov
2012-06-24 15:00 ` [PATCH v2 1/5] uprobes: don't recheck vma/f_mapping in write_opcode() Oleg Nesterov
` (5 more replies)
0 siblings, 6 replies; 19+ messages in thread
From: Oleg Nesterov @ 2012-06-24 14:59 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
Cc: Ananth N Mavinakayanahalli, Anton Arapov, Hugh Dickins,
linux-kernel
Hello,
write_opcode() cleanups resend + new minor fix.
Changes:
- document the new argument in 2/5.
- drop the buggy 5/5, thanks Anton for your quick nack.
Probably I'll return to this later, I have another reason
for this change.
- so this 5/5 is new.
Srikar, please add your acks unless you have some objections.
Oleg.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2 1/5] uprobes: don't recheck vma/f_mapping in write_opcode()
2012-06-24 14:59 [PATCH v2 0/5] uprobes: write_opcode() cleanups Oleg Nesterov
@ 2012-06-24 15:00 ` Oleg Nesterov
2012-07-09 10:30 ` Srikar Dronamraju
2012-06-24 15:00 ` [PATCH v2 2/5] uprobes: __replace_page() should not use page_address_in_vma() Oleg Nesterov
` (4 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2012-06-24 15:00 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
Cc: Ananth N Mavinakayanahalli, Anton Arapov, Hugh Dickins,
linux-kernel
write_opcode() rechecks valid_vma() and ->f_mapping, this is pointless.
The caller, register_for_each_vma() or uprobe_mmap(), has already done
these checks under mmap_sem.
To clarify, uprobe_mmap() checks valid_vma() only, but we can rely on
build_probe_list(vm_file->f_mapping->host).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/events/uprobes.c | 19 +------------------
1 files changed, 1 insertions(+), 18 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index f935327..a2b32a5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -206,33 +206,16 @@ static int write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
unsigned long vaddr, uprobe_opcode_t opcode)
{
struct page *old_page, *new_page;
- struct address_space *mapping;
void *vaddr_old, *vaddr_new;
struct vm_area_struct *vma;
- struct uprobe *uprobe;
int ret;
+
retry:
/* Read the page with vaddr into memory */
ret = get_user_pages(NULL, mm, vaddr, 1, 0, 0, &old_page, &vma);
if (ret <= 0)
return ret;
- ret = -EINVAL;
-
- /*
- * We are interested in text pages only. Our pages of interest
- * should be mapped for read and execute only. We desist from
- * adding probes in write mapped pages since the breakpoints
- * might end up in the file copy.
- */
- if (!valid_vma(vma, is_swbp_insn(&opcode)))
- goto put_out;
-
- uprobe = container_of(auprobe, struct uprobe, arch);
- mapping = uprobe->inode->i_mapping;
- if (mapping != vma->vm_file->f_mapping)
- goto put_out;
-
ret = -ENOMEM;
new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
if (!new_page)
--
1.5.5.1
* [PATCH v2 2/5] uprobes: __replace_page() should not use page_address_in_vma()
2012-06-24 14:59 [PATCH v2 0/5] uprobes: write_opcode() cleanups Oleg Nesterov
2012-06-24 15:00 ` [PATCH v2 1/5] uprobes: don't recheck vma/f_mapping in write_opcode() Oleg Nesterov
@ 2012-06-24 15:00 ` Oleg Nesterov
2012-07-06 16:18 ` Oleg Nesterov
2012-06-24 15:00 ` [PATCH v2 3/5] uprobes: kill write_opcode()->lock_page(new_page) Oleg Nesterov
` (3 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2012-06-24 15:00 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
Cc: Ananth N Mavinakayanahalli, Anton Arapov, Hugh Dickins,
linux-kernel
page_address_in_vma(old_page) in __replace_page() is ugly and wrong.
The caller already knows the correct virtual address, this page was
found by get_user_pages(vaddr).
However, page_address_in_vma() can actually fail if page->mapping was
cleared by __delete_from_page_cache() after get_user_pages() returns.
But this means the race with page reclaim, write_opcode() should not
fail, it should retry and read this page again. Not sure this race is
really possible though, page_freeze_refs() logic should prevent it.
We could change __replace_page() to return -EAGAIN in this case, but
it would be better to simply use the caller's vaddr and rely on
page_check_address().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/events/uprobes.c | 11 ++++-------
1 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index a2b32a5..6fda799 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -127,22 +127,19 @@ static loff_t vma_address(struct vm_area_struct *vma, loff_t offset)
* based on replace_page in mm/ksm.c
*
* @vma: vma that holds the pte pointing to page
+ * @addr: address the old @page is mapped at
* @page: the cowed page we are replacing by kpage
* @kpage: the modified page we replace page by
*
* Returns 0 on success, -EFAULT on failure.
*/
-static int __replace_page(struct vm_area_struct *vma, struct page *page, struct page *kpage)
+static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
+ struct page *page, struct page *kpage)
{
struct mm_struct *mm = vma->vm_mm;
- unsigned long addr;
spinlock_t *ptl;
pte_t *ptep;
- addr = page_address_in_vma(page, vma);
- if (addr == -EFAULT)
- return -EFAULT;
-
ptep = page_check_address(page, mm, addr, &ptl, 0);
if (!ptep)
return -EAGAIN;
@@ -243,7 +240,7 @@ retry:
goto unlock_out;
lock_page(new_page);
- ret = __replace_page(vma, old_page, new_page);
+ ret = __replace_page(vma, vaddr, old_page, new_page);
unlock_page(new_page);
unlock_out:
--
1.5.5.1
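Editor's note on 2/5: the retry behaviour the changelog argues for can be modelled in user space. This is only a sketch, not kernel code. struct fake_pte, fake_replace_page() and write_opcode_model() are hypothetical names. The only thing taken from the patch is that __replace_page() reports a lost race as -EAGAIN and that write_opcode() has a retry: label to go back to.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a mapping that is briefly in flux. */
struct fake_pte {
	int races_left;   /* how many more lookups will lose the race */
	int lookups;      /* how many times the page was looked up */
};

/* Models __replace_page(): -EAGAIN while the race is lost, 0 afterwards. */
static int fake_replace_page(struct fake_pte *pte)
{
	pte->lookups++;
	if (pte->races_left > 0) {
		pte->races_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Models the retry: loop; any result other than -EAGAIN is returned as-is. */
static int write_opcode_model(struct fake_pte *pte)
{
	int ret;

	assert(pte != NULL);
	do {
		ret = fake_replace_page(pte);
	} while (ret == -EAGAIN);

	return ret;
}
```

The point of the model is that a transient failure never reaches the caller: it is absorbed by re-reading the page, which is what the changelog asks of write_opcode().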
* [PATCH v2 3/5] uprobes: kill write_opcode()->lock_page(new_page)
2012-06-24 14:59 [PATCH v2 0/5] uprobes: write_opcode() cleanups Oleg Nesterov
2012-06-24 15:00 ` [PATCH v2 1/5] uprobes: don't recheck vma/f_mapping in write_opcode() Oleg Nesterov
2012-06-24 15:00 ` [PATCH v2 2/5] uprobes: __replace_page() should not use page_address_in_vma() Oleg Nesterov
@ 2012-06-24 15:00 ` Oleg Nesterov
2012-07-09 10:29 ` Srikar Dronamraju
2012-06-24 15:00 ` [PATCH v2 4/5] uprobes: cleanup and document write_opcode()->lock_page(old_page) Oleg Nesterov
` (2 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2012-06-24 15:00 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
Cc: Ananth N Mavinakayanahalli, Anton Arapov, Hugh Dickins,
linux-kernel
write_opcode() does lock_page(new_page) for no reason. Nobody can
see this page until __replace_page() exposes it under ptl lock, and
we do nothing with this page after pte_unmap_unlock().
If nothing else, the similar code in do_wp_page() doesn't lock the
new page for page_add_new_anon_rmap/set_pte_at_notify.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/events/uprobes.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 6fda799..23c562b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -239,9 +239,7 @@ retry:
if (ret)
goto unlock_out;
- lock_page(new_page);
ret = __replace_page(vma, vaddr, old_page, new_page);
- unlock_page(new_page);
unlock_out:
unlock_page(old_page);
--
1.5.5.1
* [PATCH v2 4/5] uprobes: cleanup and document write_opcode()->lock_page(old_page)
2012-06-24 14:59 [PATCH v2 0/5] uprobes: write_opcode() cleanups Oleg Nesterov
` (2 preceding siblings ...)
2012-06-24 15:00 ` [PATCH v2 3/5] uprobes: kill write_opcode()->lock_page(new_page) Oleg Nesterov
@ 2012-06-24 15:00 ` Oleg Nesterov
2012-07-09 10:29 ` Srikar Dronamraju
2012-06-24 15:01 ` [PATCH v2 5/5] uprobes: __replace_page() needs munlock_vma_page() Oleg Nesterov
2012-07-06 10:54 ` [PATCH v2 0/5] uprobes: write_opcode() cleanups Ingo Molnar
5 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2012-06-24 15:00 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
Cc: Ananth N Mavinakayanahalli, Anton Arapov, Hugh Dickins,
linux-kernel
The comment above write_opcode()->lock_page(old_page) talks about
the race with do_wp_page(). I don't really understand exactly which
race it means, but afaics this lock_page() was not enough to close
all races with do_wp_page().
Anyway, since 77fc4af1 this code is always called with ->mmap_sem
held for writing, so we can forget about do_wp_page().
However, we can't simply remove this lock_page(), and the only
(afaics) reason is __replace_page()->try_to_free_swap().
Nothing in write_opcode() needs it, move it into __replace_page()
and fix the comment.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/events/uprobes.c | 27 ++++++++++++++-------------
1 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 23c562b..5db150b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -139,10 +139,15 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
struct mm_struct *mm = vma->vm_mm;
spinlock_t *ptl;
pte_t *ptep;
+ int err;
+ /* freeze PageSwapCache() for try_to_free_swap() below */
+ lock_page(page);
+
+ err = -EAGAIN;
ptep = page_check_address(page, mm, addr, &ptl, 0);
if (!ptep)
- return -EAGAIN;
+ goto unlock;
get_page(kpage);
page_add_new_anon_rmap(kpage, vma, addr);
@@ -162,7 +167,10 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
put_page(page);
pte_unmap_unlock(ptep, ptl);
- return 0;
+ err = 0;
+ unlock:
+ unlock_page(page);
+ return err;
}
/**
@@ -216,15 +224,10 @@ retry:
ret = -ENOMEM;
new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
if (!new_page)
- goto put_out;
+ goto put_old;
__SetPageUptodate(new_page);
- /*
- * lock page will serialize against do_wp_page()'s
- * PageAnon() handling
- */
- lock_page(old_page);
/* copy the page now that we've got it stable */
vaddr_old = kmap_atomic(old_page);
vaddr_new = kmap_atomic(new_page);
@@ -237,15 +240,13 @@ retry:
ret = anon_vma_prepare(vma);
if (ret)
- goto unlock_out;
+ goto put_new;
ret = __replace_page(vma, vaddr, old_page, new_page);
-unlock_out:
- unlock_page(old_page);
+put_new:
page_cache_release(new_page);
-
-put_out:
+put_old:
put_page(old_page);
if (unlikely(ret == -EAGAIN))
--
1.5.5.1
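Editor's note on 4/5: the control flow this patch gives __replace_page() (take the page lock first, preset the error code, leave through a single unlock label) can be sketched in user space as below. This is a model under stated assumptions, not the kernel implementation. struct fake_page and the fake_*() helpers are hypothetical, and the "mapped" flag merely stands in for the page_check_address() result.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical page model: a lock flag plus two state flags. */
struct fake_page {
	int locked;
	int mapped;     /* nonzero if the pte lookup would succeed */
	int replaced;   /* set once the "replacement" has happened */
};

static void fake_lock_page(struct fake_page *p)
{
	assert(!p->locked);   /* a real page lock would block here */
	p->locked = 1;
}

static void fake_unlock_page(struct fake_page *p)
{
	assert(p->locked);
	p->locked = 0;
}

/* Mirrors the lock / preset err / goto unlock shape from the diff. */
static int replace_page_model(struct fake_page *page)
{
	int err;

	fake_lock_page(page);

	err = -EAGAIN;
	if (!page->mapped)
		goto unlock;

	page->replaced = 1;   /* stands in for the rmap and pte updates */

	err = 0;
unlock:
	fake_unlock_page(page);
	return err;
}
```

Both the failure path and the success path leave the page unlocked, which is why the caller no longer needs its own unlock_out label.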
* [PATCH v2 5/5] uprobes: __replace_page() needs munlock_vma_page()
2012-06-24 14:59 [PATCH v2 0/5] uprobes: write_opcode() cleanups Oleg Nesterov
` (3 preceding siblings ...)
2012-06-24 15:00 ` [PATCH v2 4/5] uprobes: cleanup and document write_opcode()->lock_page(old_page) Oleg Nesterov
@ 2012-06-24 15:01 ` Oleg Nesterov
2012-06-26 11:55 ` Anton Arapov
2012-07-06 10:54 ` [PATCH v2 0/5] uprobes: write_opcode() cleanups Ingo Molnar
5 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2012-06-24 15:01 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
Cc: Ananth N Mavinakayanahalli, Anton Arapov, Hugh Dickins,
linux-kernel
Like do_wp_page(), __replace_page() should do munlock_vma_page()
for the case when the old page still has other !VM_LOCKED mappings.
Unfortunately this needs mm/internal.h.
Also, move put_page() outside of ptl lock. This doesn't really
matter but looks better.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/events/uprobes.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 5db150b..889c62b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -32,6 +32,7 @@
#include <linux/swap.h> /* try_to_free_swap */
#include <linux/ptrace.h> /* user_enable_single_step */
#include <linux/kdebug.h> /* notifier mechanism */
+#include "../../mm/internal.h" /* munlock_vma_page */
#include <linux/uprobes.h>
@@ -141,7 +142,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
pte_t *ptep;
int err;
- /* freeze PageSwapCache() for try_to_free_swap() below */
+ /* For try_to_free_swap() and munlock_vma_page() below */
lock_page(page);
err = -EAGAIN;
@@ -164,9 +165,12 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
page_remove_rmap(page);
if (!page_mapped(page))
try_to_free_swap(page);
- put_page(page);
pte_unmap_unlock(ptep, ptl);
+ if (vma->vm_flags & VM_LOCKED)
+ munlock_vma_page(page);
+ put_page(page);
+
err = 0;
unlock:
unlock_page(page);
--
1.5.5.1
* Re: [PATCH v2 5/5] uprobes: __replace_page() needs munlock_vma_page()
2012-06-24 15:01 ` [PATCH v2 5/5] uprobes: __replace_page() needs munlock_vma_page() Oleg Nesterov
@ 2012-06-26 11:55 ` Anton Arapov
2012-06-26 15:47 ` Oleg Nesterov
0 siblings, 1 reply; 19+ messages in thread
From: Anton Arapov @ 2012-06-26 11:55 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju,
Ananth N Mavinakayanahalli, Hugh Dickins, linux-kernel
On Sun, Jun 24, 2012 at 05:01:11PM +0200, Oleg Nesterov wrote:
> Like do_wp_page(), __replace_page() should do munlock_vma_page()
> for the case when the old page still has other !VM_LOCKED mappings.
> Unfortunately this needs mm/internal.h.
>
> Also, move put_page() outside of ptl lock. This doesn't really
> matter but looks better.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
> kernel/events/uprobes.c | 8 ++++++--
> 1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 5db150b..889c62b 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -32,6 +32,7 @@
> #include <linux/swap.h> /* try_to_free_swap */
> #include <linux/ptrace.h> /* user_enable_single_step */
> #include <linux/kdebug.h> /* notifier mechanism */
> +#include "../../mm/internal.h" /* munlock_vma_page */
We have vma_address() defined in internal.h, under #ifdef CONFIG_TRANSPARENT_HUGEPAGE.
NAK. :-)
Anton.
>
> #include <linux/uprobes.h>
>
> @@ -141,7 +142,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
> pte_t *ptep;
> int err;
>
> - /* freeze PageSwapCache() for try_to_free_swap() below */
> + /* For try_to_free_swap() and munlock_vma_page() below */
> lock_page(page);
>
> err = -EAGAIN;
> @@ -164,9 +165,12 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
> page_remove_rmap(page);
> if (!page_mapped(page))
> try_to_free_swap(page);
> - put_page(page);
> pte_unmap_unlock(ptep, ptl);
>
> + if (vma->vm_flags & VM_LOCKED)
> + munlock_vma_page(page);
> + put_page(page);
> +
> err = 0;
> unlock:
> unlock_page(page);
> --
> 1.5.5.1
>
>
* Re: [PATCH v2 5/5] uprobes: __replace_page() needs munlock_vma_page()
2012-06-26 11:55 ` Anton Arapov
@ 2012-06-26 15:47 ` Oleg Nesterov
0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2012-06-26 15:47 UTC (permalink / raw)
To: Anton Arapov
Cc: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju,
Ananth N Mavinakayanahalli, Hugh Dickins, linux-kernel
On 06/26, Anton Arapov wrote:
>
> On Sun, Jun 24, 2012 at 05:01:11PM +0200, Oleg Nesterov wrote:
> > Like do_wp_page(), __replace_page() should do munlock_vma_page()
> > for the case when the old page still has other !VM_LOCKED mappings.
> > Unfortunately this needs mm/internal.h.
> >
> > --- a/kernel/events/uprobes.c
> > +++ b/kernel/events/uprobes.c
> > @@ -32,6 +32,7 @@
> > #include <linux/swap.h> /* try_to_free_swap */
> > #include <linux/ptrace.h> /* user_enable_single_step */
> > #include <linux/kdebug.h> /* notifier mechanism */
> > +#include "../../mm/internal.h" /* munlock_vma_page */
>
> We have vma_address() defined in internal.h, under #ifdef CONFIG_TRANSPARENT_HUGEPAGE.
>
> NAK. :-)
Damn you Anton ;) You seem to dislike number 5...
OK, please ignore this patch.
Can't resist. I swear, I specially checked mm/internal.h to ensure
it doesn't export mm/rmap.c:vma_address(), can't understand how I
didn't notice this declaration.
Oleg.
* Re: [PATCH v2 0/5] uprobes: write_opcode() cleanups
2012-06-24 14:59 [PATCH v2 0/5] uprobes: write_opcode() cleanups Oleg Nesterov
` (4 preceding siblings ...)
2012-06-24 15:01 ` [PATCH v2 5/5] uprobes: __replace_page() needs munlock_vma_page() Oleg Nesterov
@ 2012-07-06 10:54 ` Ingo Molnar
2012-07-06 16:07 ` Oleg Nesterov
5 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2012-07-06 10:54 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Peter Zijlstra, Srikar Dronamraju, Ananth N Mavinakayanahalli,
Anton Arapov, Hugh Dickins, linux-kernel
* Oleg Nesterov <oleg@redhat.com> wrote:
> Hello,
>
> write_opcode() cleanups resend + new minor fix.
>
> Changes:
>
> - document the new argument in 2/5.
>
> - drop the buggy 5/5, thanks Anton for your quick nack.
> Probably I'll return to this later, I have another reason
> for this change.
>
> - so this 5/5 is new.
>
> Srikar, please add your acks unless you have some objections.
Just wondering what's the review status of patches #1-#4?
I'll skip #5 based on Oleg's request, but the others are looking
good.
Thanks,
Ingo
* Re: [PATCH v2 0/5] uprobes: write_opcode() cleanups
2012-07-06 10:54 ` [PATCH v2 0/5] uprobes: write_opcode() cleanups Ingo Molnar
@ 2012-07-06 16:07 ` Oleg Nesterov
2012-07-06 16:13 ` Oleg Nesterov
2012-07-09 10:27 ` Srikar Dronamraju
0 siblings, 2 replies; 19+ messages in thread
From: Oleg Nesterov @ 2012-07-06 16:07 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Srikar Dronamraju, Ananth N Mavinakayanahalli,
Anton Arapov, Hugh Dickins, linux-kernel
On 07/06, Ingo Molnar wrote:
>
> * Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Hello,
> >
> > write_opcode() cleanups resend + new minor fix.
> >
> > Changes:
> >
> > - document the new argument in 2/5.
> >
> > - drop the buggy 5/5, thanks Anton for your quick nack.
> > Probably I'll return to this later, I have another reason
> > for this change.
> >
> > - so this 5/5 is new.
> >
> > Srikar, please add your acks unless you have some objections.
>
> Just wondering what's the review status of patches #1-#4?
I hope Srikar will ack 1-4 soon.
He observed the testing failures, but it turns out this series
is innocent.
I'll send more fixes soon.
> I'll skip #5 based on Oleg's request,
Yes, thanks. I still think 5/5 makes sense, but we need to do
something with uprobes.c:vma_address() first.
Oleg.
* Re: [PATCH v2 0/5] uprobes: write_opcode() cleanups
2012-07-06 16:07 ` Oleg Nesterov
@ 2012-07-06 16:13 ` Oleg Nesterov
2012-07-09 9:12 ` Ingo Molnar
2012-07-09 10:27 ` Srikar Dronamraju
1 sibling, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2012-07-06 16:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Srikar Dronamraju, Ananth N Mavinakayanahalli,
Anton Arapov, Hugh Dickins, linux-kernel
On 07/06, Oleg Nesterov wrote:
>
> On 07/06, Ingo Molnar wrote:
> >
> > * Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > Hello,
> > >
> > > write_opcode() cleanups resend + new minor fix.
> > >
> > > Changes:
> > >
> > > - document the new argument in 2/5.
> > >
> > > - drop the buggy 5/5, thanks Anton for your quick nack.
> > > Probably I'll return to this later, I have another reason
> > > for this change.
> > >
> > > - so this 5/5 is new.
> > >
> > > Srikar, please add your acks unless you have some objections.
> >
> > Just wondering what's the review status of patches #1-#4?
>
> I hope Srikar will ack 1-4 soon.
>
> He observed the testing failures, but it turns out this series
> is innocent.
>
> I'll send more fixes soon.
>
> > I'll skip #5 based on Oleg's request,
>
> Yes, thanks. I still think 5/5 makes sense, but we need to do
> something with uprobes.c:vma_address() first.
Argh.
I wrote this email because I wanted to say that I updated the
changelog for 2/5 a little bit, but forgot to mention this.
I'll send the updated patch in reply to 2/5 (once again, only
the changelog was changed). But please let me know if you want
me to resend 1-4.
Oleg.
* Re: [PATCH v2 2/5] uprobes: __replace_page() should not use page_address_in_vma()
2012-06-24 15:00 ` [PATCH v2 2/5] uprobes: __replace_page() should not use page_address_in_vma() Oleg Nesterov
@ 2012-07-06 16:18 ` Oleg Nesterov
2012-07-09 10:28 ` Srikar Dronamraju
0 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2012-07-06 16:18 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Srikar Dronamraju
Cc: Ananth N Mavinakayanahalli, Anton Arapov, Hugh Dickins,
linux-kernel
On 06/24, Oleg Nesterov wrote:
>
> However, page_address_in_vma() can actually fail if page->mapping was
> cleared by __delete_from_page_cache() after get_user_pages() returns.
> But this means the race with page reclaim, write_opcode() should not
> fail, it should retry and read this page again. Not sure this race is
> really possible though, page_freeze_refs() logic should prevent it.
This looks a bit confusing; I'll try to make it clearer...
The patch itself was not changed.
------------------------------------------------------------------------------
Subject: [PATCH v2 2/5] uprobes: __replace_page() should not use page_address_in_vma()
page_address_in_vma(old_page) in __replace_page() is ugly and wrong.
The caller already knows the correct virtual address, this page was
found by get_user_pages(vaddr).
However, page_address_in_vma() can actually fail if page->mapping was
cleared by __delete_from_page_cache() after get_user_pages() returns.
But this means the race with page reclaim, write_opcode() should not
fail, it should retry and read this page again. Probably the race with
remove_mapping() is not possible due to page_freeze_refs() logic, but
afaics at least shmem_writepage()->shmem_delete_from_page_cache() can
clear ->mapping.
We could change __replace_page() to return -EAGAIN in this case, but
it would be better to simply use the caller's vaddr and rely on
page_check_address().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/events/uprobes.c | 11 ++++-------
1 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index a2b32a5..6fda799 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -127,22 +127,19 @@ static loff_t vma_address(struct vm_area_struct *vma, loff_t offset)
* based on replace_page in mm/ksm.c
*
* @vma: vma that holds the pte pointing to page
+ * @addr: address the old @page is mapped at
* @page: the cowed page we are replacing by kpage
* @kpage: the modified page we replace page by
*
* Returns 0 on success, -EFAULT on failure.
*/
-static int __replace_page(struct vm_area_struct *vma, struct page *page, struct page *kpage)
+static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
+ struct page *page, struct page *kpage)
{
struct mm_struct *mm = vma->vm_mm;
- unsigned long addr;
spinlock_t *ptl;
pte_t *ptep;
- addr = page_address_in_vma(page, vma);
- if (addr == -EFAULT)
- return -EFAULT;
-
ptep = page_check_address(page, mm, addr, &ptl, 0);
if (!ptep)
return -EAGAIN;
@@ -243,7 +240,7 @@ retry:
goto unlock_out;
lock_page(new_page);
- ret = __replace_page(vma, old_page, new_page);
+ ret = __replace_page(vma, vaddr, old_page, new_page);
unlock_page(new_page);
unlock_out:
--
1.5.5.1
* Re: [PATCH v2 0/5] uprobes: write_opcode() cleanups
2012-07-06 16:13 ` Oleg Nesterov
@ 2012-07-09 9:12 ` Ingo Molnar
2012-07-09 13:30 ` Oleg Nesterov
0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2012-07-09 9:12 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Peter Zijlstra, Srikar Dronamraju, Ananth N Mavinakayanahalli,
Anton Arapov, Hugh Dickins, linux-kernel
* Oleg Nesterov <oleg@redhat.com> wrote:
> On 07/06, Oleg Nesterov wrote:
> >
> > On 07/06, Ingo Molnar wrote:
> > >
> > > * Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > write_opcode() cleanups resend + new minor fix.
> > > >
> > > > Changes:
> > > >
> > > > - document the new argument in 2/5.
> > > >
> > > > - drop the buggy 5/5, thanks Anton for your quick nack.
> > > > Probably I'll return to this later, I have another reason
> > > > for this change.
> > > >
> > > > - so this 5/5 is new.
> > > >
> > > > Srikar, please add your acks unless you have some objections.
> > >
> > > Just wondering what's the review status of patches #1-#4?
> >
> > I hope Srikar will ack 1-4 soon.
> >
> > He observed the testing failures, but it turns out this series
> > is innocent.
> >
> > I'll send more fixes soon.
> >
> > > I'll skip #5 based on Oleg's request,
> >
> > Yes, thanks. I still think 5/5 makes sense, but we need to do
> > something with uprobes.c:vma_address() first.
>
> Argh.
>
> I wrote this email because I wanted to say that I updated the
> changelog for 2/5 a little bit, but forgot to mention this.
> I'll send the updated patch in reply to 2/5 (once again, only
> the changelog was changed). But please let me know if you want
> me to resend 1-4.
Once Srikar's ack is in then it would be nice to update the
patches with the ack and resend #1-#4, to make sure I have all
the intended patches and nothing more or less than that.
Thanks,
Ingo
* Re: [PATCH v2 0/5] uprobes: write_opcode() cleanups
2012-07-06 16:07 ` Oleg Nesterov
2012-07-06 16:13 ` Oleg Nesterov
@ 2012-07-09 10:27 ` Srikar Dronamraju
1 sibling, 0 replies; 19+ messages in thread
From: Srikar Dronamraju @ 2012-07-09 10:27 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Peter Zijlstra, Ananth N Mavinakayanahalli,
Anton Arapov, Hugh Dickins, linux-kernel
* Oleg Nesterov <oleg@redhat.com> [2012-07-06 18:07:04]:
> On 07/06, Ingo Molnar wrote:
> >
> > * Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > Hello,
> > >
> > > write_opcode() cleanups resend + new minor fix.
> > >
> > > Changes:
> > >
> > > - document the new argument in 2/5.
> > >
> > > - drop the buggy 5/5, thanks Anton for your quick nack.
> > > Probably I'll return to this later, I have another reason
> > > for this change.
> > >
> > > - so this 5/5 is new.
> > >
> > > Srikar, please add your acks unless you have some objections.
> >
> > Just wondering what's the review status of patches #1-#4?
>
> I hope Srikar will ack 1-4 soon.
>
> He observed the testing failures, but it turns out this series
> is innocent.
>
The 1st patch in your series seems to have solved the problem.
I am not able to reproduce the problem for now. I will keep it running
for some time.
I will send out the acks for each of the patches in this series now.
--
Thanks and Regards
Srikar
* Re: [PATCH v2 2/5] uprobes: __replace_page() should not use page_address_in_vma()
2012-07-06 16:18 ` Oleg Nesterov
@ 2012-07-09 10:28 ` Srikar Dronamraju
0 siblings, 0 replies; 19+ messages in thread
From: Srikar Dronamraju @ 2012-07-09 10:28 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Peter Zijlstra, Ananth N Mavinakayanahalli,
Anton Arapov, Hugh Dickins, linux-kernel
> ------------------------------------------------------------------------------
> Subject: [PATCH v2 2/5] uprobes: __replace_page() should not use page_address_in_vma()
>
> page_address_in_vma(old_page) in __replace_page() is ugly and wrong.
> The caller already knows the correct virtual address, this page was
> found by get_user_pages(vaddr).
>
> However, page_address_in_vma() can actually fail if page->mapping was
> cleared by __delete_from_page_cache() after get_user_pages() returns.
> But this means the race with page reclaim, write_opcode() should not
> fail, it should retry and read this page again. Probably the race with
> remove_mapping() is not possible due to page_freeze_refs() logic, but
> afaics at least shmem_writepage()->shmem_delete_from_page_cache() can
> clear ->mapping.
>
> We could change __replace_page() to return -EAGAIN in this case, but
> it would be better to simply use the caller's vaddr and rely on
> page_check_address().
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
* Re: [PATCH v2 4/5] uprobes: cleanup and document write_opcode()->lock_page(old_page)
2012-06-24 15:00 ` [PATCH v2 4/5] uprobes: cleanup and document write_opcode()->lock_page(old_page) Oleg Nesterov
@ 2012-07-09 10:29 ` Srikar Dronamraju
0 siblings, 0 replies; 19+ messages in thread
From: Srikar Dronamraju @ 2012-07-09 10:29 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Peter Zijlstra, Ananth N Mavinakayanahalli,
Anton Arapov, Hugh Dickins, linux-kernel
* Oleg Nesterov <oleg@redhat.com> [2012-06-24 17:00:53]:
> The comment above write_opcode()->lock_page(old_page) talks about
> the race with do_wp_page(). I don't really understand exactly which
> race it means, but afaics this lock_page() was not enough to close
> all races with do_wp_page().
>
> Anyway, since 77fc4af1 this code is always called with ->mmap_sem
> held for writing, so we can forget about do_wp_page().
>
> However, we can't simply remove this lock_page(), and the only
> (afaics) reason is __replace_page()->try_to_free_swap().
>
> Nothing in write_opcode() needs it, move it into __replace_page()
> and fix the comment.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
* Re: [PATCH v2 3/5] uprobes: kill write_opcode()->lock_page(new_page)
2012-06-24 15:00 ` [PATCH v2 3/5] uprobes: kill write_opcode()->lock_page(new_page) Oleg Nesterov
@ 2012-07-09 10:29 ` Srikar Dronamraju
0 siblings, 0 replies; 19+ messages in thread
From: Srikar Dronamraju @ 2012-07-09 10:29 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Peter Zijlstra, Ananth N Mavinakayanahalli,
Anton Arapov, Hugh Dickins, linux-kernel
* Oleg Nesterov <oleg@redhat.com> [2012-06-24 17:00:35]:
> write_opcode() does lock_page(new_page) for no reason. Nobody can
> see this page until __replace_page() exposes it under ptl lock, and
> we do nothing with this page after pte_unmap_unlock().
>
> If nothing else, the similar code in do_wp_page() doesn't lock the
> new page for page_add_new_anon_rmap/set_pte_at_notify.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
* Re: [PATCH v2 1/5] uprobes: don't recheck vma/f_mapping in write_opcode()
2012-06-24 15:00 ` [PATCH v2 1/5] uprobes: don't recheck vma/f_mapping in write_opcode() Oleg Nesterov
@ 2012-07-09 10:30 ` Srikar Dronamraju
0 siblings, 0 replies; 19+ messages in thread
From: Srikar Dronamraju @ 2012-07-09 10:30 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Peter Zijlstra, Ananth N Mavinakayanahalli,
Anton Arapov, Hugh Dickins, linux-kernel
* Oleg Nesterov <oleg@redhat.com> [2012-06-24 17:00:02]:
> write_opcode() rechecks valid_vma() and ->f_mapping, this is pointless.
> The caller, register_for_each_vma() or uprobe_mmap(), has already done
> these checks under mmap_sem.
>
> To clarify, uprobe_mmap() checks valid_vma() only, but we can rely on
> build_probe_list(vm_file->f_mapping->host).
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
* Re: [PATCH v2 0/5] uprobes: write_opcode() cleanups
2012-07-09 9:12 ` Ingo Molnar
@ 2012-07-09 13:30 ` Oleg Nesterov
0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2012-07-09 13:30 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Srikar Dronamraju, Ananth N Mavinakayanahalli,
Anton Arapov, Hugh Dickins, linux-kernel
On 07/09, Ingo Molnar wrote:
>
> Once Srikar's ack is in then it would be nice to update the
> patches with the ack and resend #1-#4, to make sure I have all
> the intended patches and nothing more or less than that.
OK, will do.
I'll wait until Srikar reviews the new series, and then I'll
send everything acked.
Oleg.