From: Cyrill Gorcunov <gorcunov@gmail.com>
To: Dave Jones <davej@redhat.com>
Cc: Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Hillf Danton <dhillf@gmail.com>, Linux-MM <linux-mm@kvack.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Pavel Emelyanov <xemul@parallels.com>
Subject: Re: unused swap offset / bad page map.
Date: Tue, 27 Aug 2013 12:37:18 +0400	[thread overview]
Message-ID: <20130827083718.GC7416@moon> (raw)
In-Reply-To: <20130826222833.GA24320@redhat.com>

On Mon, Aug 26, 2013 at 06:28:33PM -0400, Dave Jones wrote:
>  > 
>  > I've not tried matching up bits with Dave's reports, and just going
>  > into a meeting now, but this patch looks worth a try: probably Cyrill
>  > can improve it meanwhile to what he actually wants there (I'm
>  > surprised anything special is needed for just moving a pte).
>  > 
>  > Hugh
>  > 
>  > --- 3.11-rc7/mm/mremap.c	2013-07-14 17:10:16.640003652 -0700
>  > +++ linux/mm/mremap.c	2013-08-26 14:46:14.460027627 -0700
>  > @@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_str
>  >  			continue;
>  >  		pte = ptep_get_and_clear(mm, old_addr, old_pte);
>  >  		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
>  > -		set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
>  > +		set_pte_at(mm, new_addr, new_pte, pte);
>  >  	}
> 
> I'll give this a shot once I'm done with the bisect.

I managed to trigger the issue as well. The patch below fixes it.
Dave, could you please give it a shot when time permits?

Pavel, I kept the 'make it dirty on move' logic, but I somewhat doubt
it: wouldn't plain pte copying (as in Hugh's patch) work for us?
---
From: Cyrill Gorcunov <gorcunov@gmail.com>
Subject: [PATCH] mm: move_ptes -- Set soft dirty bit depending on pte type

Dave reported corrupted swap entries:

 | [ 4588.541886] swap_free: Unused swap offset entry 00002d15
 | [ 4588.541952] BUG: Bad page map in process trinity-kid12  pte:005a2a80 pmd:22c01f067

and Hugh pointed out that move_ptes sets the _PAGE_SOFT_DIRTY
bit regardless of the type of entry the pte holds. The catch
is that when soft dirty status is carried in swap entries,
_PAGE_SWP_SOFT_DIRTY must be used instead, because it is the
only bit in the pte that can be used for our own needs without
intersecting the bits owned by the swap entry type/offset.
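To illustrate why setting the wrong bit corrupts a swap entry, here is a
small user-space sketch on a hypothetical, simplified pte layout. The bit
positions and all toy_* names are assumptions chosen for illustration,
not the kernel's actual definitions. In this toy layout the present-pte
soft-dirty bit falls inside the swap offset field, so setting it silently
changes the offset, while a dedicated swap soft-dirty bit sits outside
both the type and offset fields.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified pte layout (illustration only; these bit
 * positions are assumptions, not copied from any kernel header):
 *   bit 0       present
 *   bits 1..5   swap type       (non-present entries)
 *   bit 7       swap soft-dirty (stand-in for _PAGE_SWP_SOFT_DIRTY)
 *   bits 9..63  swap offset     (non-present entries)
 *   bit 11      soft-dirty for present ptes (stand-in for
 *               _PAGE_SOFT_DIRTY), which overlaps the swap offset field
 */
typedef uint64_t toy_pte_t;

#define TOY_SWP_TYPE_SHIFT   1
#define TOY_SWP_TYPE_MASK    0x1fULL
#define TOY_SWP_OFFSET_SHIFT 9
#define TOY_SWP_SOFT_DIRTY   (1ULL << 7)
#define TOY_SOFT_DIRTY       (1ULL << 11)

/* Encode a swap entry (type, offset) as a non-present toy pte. */
static toy_pte_t toy_mk_swap_pte(uint64_t type, uint64_t offset)
{
	assert(type <= TOY_SWP_TYPE_MASK);
	return (type << TOY_SWP_TYPE_SHIFT) | (offset << TOY_SWP_OFFSET_SHIFT);
}

static uint64_t toy_swp_type(toy_pte_t pte)
{
	return (pte >> TOY_SWP_TYPE_SHIFT) & TOY_SWP_TYPE_MASK;
}

/* Bits below the offset shift (including bit 7) are discarded here. */
static uint64_t toy_swp_offset(toy_pte_t pte)
{
	return pte >> TOY_SWP_OFFSET_SHIFT;
}

/* Old behaviour, applied to every moved pte: set the present-pte bit. */
static toy_pte_t toy_mksoft_dirty(toy_pte_t pte)
{
	return pte | TOY_SOFT_DIRTY;
}

/* Behaviour after the fix for swap entries: set the swap-only bit. */
static toy_pte_t toy_swp_mksoft_dirty(toy_pte_t pte)
{
	return pte | TOY_SWP_SOFT_DIRTY;
}
```

For a swap entry with offset 0x1000, toy_mksoft_dirty() yields an entry
whose decoded offset is 0x1004, a different swap slot, while
toy_swp_mksoft_dirty() leaves both type and offset intact. Freeing a slot
that was never allocated is consistent with the "Unused swap offset entry"
message quoted above, though this toy model does not reproduce the exact
values in Dave's report.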

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 mm/mremap.c |   21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

Index: linux-2.6.git/mm/mremap.c
===================================================================
--- linux-2.6.git.orig/mm/mremap.c
+++ linux-2.6.git/mm/mremap.c
@@ -15,6 +15,7 @@
 #include <linux/swap.h>
 #include <linux/capability.h>
 #include <linux/fs.h>
+#include <linux/swapops.h>
 #include <linux/highmem.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
@@ -69,6 +70,23 @@ static pmd_t *alloc_new_pmd(struct mm_st
 	return pmd;
 }
 
+static pte_t move_soft_dirty_pte(pte_t pte)
+{
+	/*
+	 * Set the soft dirty bit so that userspace can
+	 * notice that the ptes were moved.
+	 */
+#ifdef CONFIG_MEM_SOFT_DIRTY
+	if (pte_present(pte))
+		pte = pte_mksoft_dirty(pte);
+	else if (is_swap_pte(pte))
+		pte = pte_swp_mksoft_dirty(pte);
+	else if (pte_file(pte))
+		pte = pte_file_mksoft_dirty(pte);
+#endif
+	return pte;
+}
+
 static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		unsigned long old_addr, unsigned long old_end,
 		struct vm_area_struct *new_vma, pmd_t *new_pmd,
@@ -126,7 +144,8 @@ static void move_ptes(struct vm_area_str
 			continue;
 		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
-		set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
+		pte = move_soft_dirty_pte(pte);
+		set_pte_at(mm, new_addr, new_pte, pte);
 	}
 
 	arch_leave_lazy_mmu_mode();

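The new move_soft_dirty_pte() helper chooses which soft-dirty bit to set
according to what kind of entry the pte holds. The following user-space
sketch models that if/else chain. The toy_* predicates, the bit positions,
and the separate file soft-dirty bit are all hypothetical stand-ins for
pte_present(), is_swap_pte(), pte_file() and their mksoft_dirty helpers,
and the CONFIG_MEM_SOFT_DIRTY guard is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical toy layout (illustration only):
 *   bit 0  present
 *   bit 6  file entry (only meaningful when not present)
 *   bit 7  swap soft-dirty
 *   bit 10 file soft-dirty
 *   bit 11 present soft-dirty
 * A value of 0 is an empty ("none") entry.
 */
typedef uint64_t toy_pte_t;

#define TOY_PRESENT         (1ULL << 0)
#define TOY_FILE            (1ULL << 6)
#define TOY_SWP_SOFT_DIRTY  (1ULL << 7)
#define TOY_FILE_SOFT_DIRTY (1ULL << 10)
#define TOY_SOFT_DIRTY      (1ULL << 11)

static bool toy_pte_none(toy_pte_t pte)    { return pte == 0; }
static bool toy_pte_present(toy_pte_t pte) { return (pte & TOY_PRESENT) != 0; }

static bool toy_pte_file(toy_pte_t pte)
{
	return !toy_pte_present(pte) && (pte & TOY_FILE) != 0;
}

/* Same shape as a swap check: not none, not present, not a file pte. */
static bool toy_is_swap_pte(toy_pte_t pte)
{
	return !toy_pte_none(pte) && !toy_pte_present(pte) && !toy_pte_file(pte);
}

/* Mirrors the patch: pick the soft-dirty bit that matches the entry type. */
static toy_pte_t toy_move_soft_dirty_pte(toy_pte_t pte)
{
	if (toy_pte_present(pte))
		pte |= TOY_SOFT_DIRTY;
	else if (toy_is_swap_pte(pte))
		pte |= TOY_SWP_SOFT_DIRTY;
	else if (toy_pte_file(pte))
		pte |= TOY_FILE_SOFT_DIRTY;
	/* An empty entry matches no branch and is returned unchanged. */
	return pte;
}
```

The order of the tests matters in this model: a present pte may have
bit 6 set for an unrelated reason, so the present check must come first,
and the swap check must exclude file entries so that each kind of entry
only ever receives the bit reserved for it.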