linux-mm.kvack.org archive mirror
* unused swap offset / bad page map.
@ 2013-08-07  5:51 Dave Jones
  2013-08-07 10:04 ` Hillf Danton
  0 siblings, 1 reply; 33+ messages in thread
From: Dave Jones @ 2013-08-07  5:51 UTC (permalink / raw)
  To: linux-mm; +Cc: Linux Kernel

Seen while fuzzing with lots of child processes.

swap_free: Unused swap offset entry 001263f5
BUG: Bad page map in process trinity-child29  pte:24c7ea00 pmd:09fec067
addr:00007f9db958d000 vm_flags:00100073 anon_vma:ffff88022c004ba0 mapping:          (null) index:f99
Modules linked in: fuse ipt_ULOG snd_seq_dummy tun sctp scsi_transport_iscsi can_raw can_bcm rfcomm bnep nfnetlink hidp appletalk bluetooth rose can af_802154 phonet x25 af_rxrpc llc2 nfc rfkill af_key pppoe rds pppox ppp_generic slhc caif_socket caif irda crc_ccitt atm netrom ax25 ipx p8023 psnap p8022 llc snd_hda_codec_realtek pcspkr usb_debug snd_seq snd_seq_device snd_hda_intel snd_hda_codec snd_hwdep e1000e snd_pcm ptp pps_core snd_page_alloc snd_timer snd soundcore xfs libcrc32c
CPU: 1 PID: 2624 Comm: trinity-child29 Not tainted 3.11.0-rc4+ #1
 0000000000000000 ffff8801fd7ddc90 ffffffff81700f2c 00007f9db958d000
 ffff8801fd7ddcd8 ffffffff8117cba7 0000000024c7ea00 0000000000000f99
 00007f9db9600000 ffff880009fecc68 0000000024c7ea00 ffff8801fd7dde00
Call Trace:
 [<ffffffff81700f2c>] dump_stack+0x4e/0x82
 [<ffffffff8117cba7>] print_bad_pte+0x187/0x220
 [<ffffffff8117e415>] unmap_single_vma+0x535/0x890
 [<ffffffff8117f719>] unmap_vmas+0x49/0x90
 [<ffffffff81187ef1>] exit_mmap+0xc1/0x170
 [<ffffffff810510ef>] mmput+0x6f/0x100
 [<ffffffff81055818>] do_exit+0x288/0xcd0
 [<ffffffff810c1da5>] ? trace_hardirqs_on_caller+0x115/0x1e0
 [<ffffffff810c1e7d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff810575dc>] do_group_exit+0x4c/0xc0
 [<ffffffff81057664>] SyS_exit_group+0x14/0x20
 [<ffffffff81713dd4>] tracesys+0xdd/0xe2

There were a slew of these: same trace, different addr/anon_vma/index,
and mapping always null.
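One way to cross-check the two lines of the report is to decode the pte value and compare it with the entry swap_free() printed. A minimal userspace sketch, assuming the 3.11-era x86-64 swap PTE layout (swap type in bits 1-5, offset starting at bit 9) and a 5-bit MAX_SWAPFILES_SHIFT; the helper names are hypothetical and the layout should be checked against the arch headers in your tree:

```c
#include <stdint.h>

/* Assumed 3.11-era x86-64 layout; verify against your tree. */
#define SWP_TYPE_BITS       5   /* swap type stored in PTE bits 1..5 */
#define SWP_OFFSET_SHIFT    9   /* swap offset stored from PTE bit 9 up */
#define MAX_SWAPFILES_SHIFT 5
/* swp_entry_t keeps the type in the top MAX_SWAPFILES_SHIFT bits. */
#define SWP_TYPE_SHIFT      (64 - MAX_SWAPFILES_SHIFT)

unsigned swp_type_from_pte(uint64_t pte)
{
	return (unsigned)((pte >> 1) & ((1u << SWP_TYPE_BITS) - 1));
}

uint64_t swp_offset_from_pte(uint64_t pte)
{
	return pte >> SWP_OFFSET_SHIFT;
}

/* Rebuild the value that swap_free() prints as "entry %08lx". */
uint64_t swp_entry_val_from_pte(uint64_t pte)
{
	uint64_t type = swp_type_from_pte(pte);

	return (type << SWP_TYPE_SHIFT) | swp_offset_from_pte(pte);
}
```

Under those assumptions pte:24c7ea00 decodes to type 0, offset 0x1263f5, the same value as "entry 001263f5", so the PTE and the warning refer to the same swap slot, whose swap_map count was already zero when it was freed.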

	Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: unused swap offset / bad page map.
  2013-08-07  5:51 unused swap offset / bad page map Dave Jones
@ 2013-08-07 10:04 ` Hillf Danton
  2013-08-07 15:30   ` Dave Jones
  2013-08-07 15:54   ` Dave Jones
  0 siblings, 2 replies; 33+ messages in thread
From: Hillf Danton @ 2013-08-07 10:04 UTC (permalink / raw)
  To: Dave Jones, linux-mm, Linux Kernel

Hello Dave

On Wed, Aug 7, 2013 at 1:51 PM, Dave Jones <davej@redhat.com> wrote:
> Seen while fuzzing with lots of child processes.
>
> swap_free: Unused swap offset entry 001263f5
> BUG: Bad page map in process trinity-child29  pte:24c7ea00 pmd:09fec067
> addr:00007f9db958d000 vm_flags:00100073 anon_vma:ffff88022c004ba0 mapping:          (null) index:f99
> Modules linked in: fuse ipt_ULOG snd_seq_dummy tun sctp scsi_transport_iscsi can_raw can_bcm rfcomm bnep nfnetlink hidp appletalk bluetooth rose can af_802154 phonet x25 af_rxrpc llc2 nfc rfkill af_key pppoe rds pppox ppp_generic slhc caif_socket caif irda crc_ccitt atm netrom ax25 ipx p8023 psnap p8022 llc snd_hda_codec_realtek pcspkr usb_debug snd_seq snd_seq_device snd_hda_intel snd_hda_codec snd_hwdep e1000e snd_pcm ptp pps_core snd_page_alloc snd_timer snd soundcore xfs libcrc32c
> CPU: 1 PID: 2624 Comm: trinity-child29 Not tainted 3.11.0-rc4+ #1
>  0000000000000000 ffff8801fd7ddc90 ffffffff81700f2c 00007f9db958d000
>  ffff8801fd7ddcd8 ffffffff8117cba7 0000000024c7ea00 0000000000000f99
>  00007f9db9600000 ffff880009fecc68 0000000024c7ea00 ffff8801fd7dde00
> Call Trace:
>  [<ffffffff81700f2c>] dump_stack+0x4e/0x82
>  [<ffffffff8117cba7>] print_bad_pte+0x187/0x220
>  [<ffffffff8117e415>] unmap_single_vma+0x535/0x890
>  [<ffffffff8117f719>] unmap_vmas+0x49/0x90
>  [<ffffffff81187ef1>] exit_mmap+0xc1/0x170
>  [<ffffffff810510ef>] mmput+0x6f/0x100
>  [<ffffffff81055818>] do_exit+0x288/0xcd0
>  [<ffffffff810c1da5>] ? trace_hardirqs_on_caller+0x115/0x1e0
>  [<ffffffff810c1e7d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff810575dc>] do_group_exit+0x4c/0xc0
>  [<ffffffff81057664>] SyS_exit_group+0x14/0x20
>  [<ffffffff81713dd4>] tracesys+0xdd/0xe2
>
> There were a slew of these. same trace, different addr/anon_vma/index.
> mapping always null.
>
Would you please run again with the debug info added?
---
--- a/mm/swapfile.c	Wed Aug  7 17:27:22 2013
+++ b/mm/swapfile.c	Wed Aug  7 17:57:20 2013
@@ -509,6 +509,7 @@ static struct swap_info_struct *swap_inf
 {
 	struct swap_info_struct *p;
 	unsigned long offset, type;
+	int race = 0;

 	if (!entry.val)
 		goto out;
@@ -524,10 +525,17 @@ static struct swap_info_struct *swap_inf
 	if (!p->swap_map[offset])
 		goto bad_free;
 	spin_lock(&p->lock);
+	if (!p->swap_map[offset]) {
+		race = 1;
+		spin_unlock(&p->lock);
+		goto bad_free;
+	}
 	return p;

 bad_free:
 	printk(KERN_ERR "swap_free: %s%08lx\n", Unused_offset, entry.val);
+	if (race)
+		printk(KERN_ERR "but due to race\n");
 	goto out;
 bad_offset:
 	printk(KERN_ERR "swap_free: %s%08lx\n", Bad_offset, entry.val);
--
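The patch above relies on a common pattern: an unlocked check of swap_map[offset] as a fast path, then the same check repeated after spin_lock(&p->lock), since the value may change between the two. A userspace sketch of that pattern, using a C11 atomic_flag as a stand-in spinlock; struct swap_dev and the helper names are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for swap_info_struct: a spinlock and a count map. */
struct swap_dev {
	atomic_flag lock;
	_Atomic unsigned char map[16];
	int races_seen;   /* bumped, under the lock, when the re-check fails */
};

static void dev_lock(struct swap_dev *d)
{
	while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
		;   /* spin */
}

static void dev_unlock(struct swap_dev *d)
{
	atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/*
 * Returns true with d->lock held if slot 'off' is in use.
 * Returns false, lock not held, if it was unused either at the unlocked
 * peek or at the re-check after taking the lock.
 */
bool get_locked_if_used(struct swap_dev *d, size_t off)
{
	/* Fast path: unlocked peek, may already be stale. */
	if (!atomic_load_explicit(&d->map[off], memory_order_relaxed))
		return false;

	dev_lock(d);
	/* Re-check: another thread may have freed the slot meanwhile. */
	if (!atomic_load_explicit(&d->map[off], memory_order_relaxed)) {
		d->races_seen++;
		dev_unlock(d);
		return false;
	}
	return true;   /* caller drops the lock with dev_unlock() */
}
```

If the second check fires, the first read saw a value that another thread changed before the lock was taken, which is the case the "but due to race" printk is meant to report.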


* Re: unused swap offset / bad page map.
  2013-08-07 10:04 ` Hillf Danton
@ 2013-08-07 15:30   ` Dave Jones
  2013-08-08 15:20     ` Hillf Danton
  2013-08-07 15:54   ` Dave Jones
  1 sibling, 1 reply; 33+ messages in thread
From: Dave Jones @ 2013-08-07 15:30 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-mm, Linux Kernel

On Wed, Aug 07, 2013 at 06:04:20PM +0800, Hillf Danton wrote:
 > > There were a slew of these. same trace, different addr/anon_vma/index.
 > > mapping always null.
 > >
 > Would you please run again with the debug info added?
 > ---
 > --- a/mm/swapfile.c	Wed Aug  7 17:27:22 2013
 > +++ b/mm/swapfile.c	Wed Aug  7 17:57:20 2013
 > @@ -509,6 +509,7 @@ static struct swap_info_struct *swap_inf
 >  {
 >  	struct swap_info_struct *p;
 >  	unsigned long offset, type;
 > +	int race = 0;
 > 
 >  	if (!entry.val)
 >  		goto out;
 > @@ -524,10 +525,17 @@ static struct swap_info_struct *swap_inf
 >  	if (!p->swap_map[offset])
 >  		goto bad_free;
 >  	spin_lock(&p->lock);
 > +	if (!p->swap_map[offset]) {
 > +		race = 1;
 > +		spin_unlock(&p->lock);
 > +		goto bad_free;
 > +	}
 >  	return p;
 > 
 >  bad_free:
 >  	printk(KERN_ERR "swap_free: %s%08lx\n", Unused_offset, entry.val);
 > +	if (race)
 > +		printk(KERN_ERR "but due to race\n");
 >  	goto out;
 >  bad_offset:
 >  	printk(KERN_ERR "swap_free: %s%08lx\n", Bad_offset, entry.val);
 > --

printk didn't trigger.
This time around the oom killer was going off at the same time.
I'm wondering if we have some allocations somewhere in the swap code that
don't handle failure correctly.

	Dave


* Re: unused swap offset / bad page map.
  2013-08-07 10:04 ` Hillf Danton
  2013-08-07 15:30   ` Dave Jones
@ 2013-08-07 15:54   ` Dave Jones
  1 sibling, 0 replies; 33+ messages in thread
From: Dave Jones @ 2013-08-07 15:54 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-mm, Linux Kernel


void __lru_cache_add(struct page *page)
{
        struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

        page_cache_get(page);
        if (!pagevec_space(pvec))
                __pagevec_lru_add(pvec);
        pagevec_add(pvec, page);
        put_cpu_var(lru_add_pvec);
}

I added a printk and found that pagevec_add() frequently returns 0. Is that OK?

What happens to 'page' in this case?
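In the 3.11-era headers, as far as I can tell, pagevec_add() returns the space remaining after the page is stored rather than a success flag. A simplified userspace model to illustrate that reading (the real struct also carries a `cold` field, and PAGEVEC_SIZE is assumed to be 14; check include/linux/pagevec.h in your tree):

```c
#include <stddef.h>

/* Assumed 3.11-era value; verify in your tree. */
#define PAGEVEC_SIZE 14

struct page;   /* opaque in this model */

struct pagevec {
	unsigned long nr;                   /* slots used */
	struct page *pages[PAGEVEC_SIZE];
};

/* Slots still free. */
static inline unsigned pagevec_space(const struct pagevec *pvec)
{
	return PAGEVEC_SIZE - pvec->nr;
}

/*
 * Store the page unconditionally, then report how much room is LEFT.
 * A return of 0 means "this add filled the vector", not "the add failed".
 */
static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
{
	pvec->pages[pvec->nr++] = page;
	return pagevec_space(pvec);
}
```

If the model matches your tree, a 0 from pagevec_add() in __lru_cache_add() just means the vector became full with 'page' in it; the !pagevec_space() check at the top of the next call drains it via __pagevec_lru_add().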

	Dave


* Re: unused swap offset / bad page map.
  2013-08-07 15:30   ` Dave Jones
@ 2013-08-08 15:20     ` Hillf Danton
  2013-08-08 15:36       ` Dave Jones
  2013-08-19 23:18       ` Dave Jones
  0 siblings, 2 replies; 33+ messages in thread
From: Hillf Danton @ 2013-08-08 15:20 UTC (permalink / raw)
  To: Dave Jones, Hillf Danton, linux-mm, Linux Kernel

On Wed, Aug 7, 2013 at 11:30 PM, Dave Jones <davej@redhat.com> wrote:
> printk didn't trigger.
>
Could this be a corrupted page table entry, as mentioned in the
comment on swap_duplicate()?


--- a/mm/swapfile.c	Wed Aug  7 17:27:22 2013
+++ b/mm/swapfile.c	Thu Aug  8 23:12:30 2013
@@ -770,6 +770,7 @@ int free_swap_and_cache(swp_entry_t entr
 		unlock_page(page);
 		page_cache_release(page);
 	}
+	return 1;
 	return p != NULL;
 }

--
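As far as I can tell from the 3.11 sources, zap_pte_range() calls print_bad_pte() only when free_swap_and_cache() returns 0, and that happens only after swap_info_get() has already printed one of its complaints (here "Unused swap offset"). If so, forcing "return 1" should silence the "Bad page map" report but not the swap_free message. A toy model of that call chain; every model_* name is hypothetical and only the unused-slot case is modelled:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: per-slot swap counts, 0 == unused. */
static unsigned char model_map[8];
static int bad_pte_reports;

/* Stands in for swap_info_get(): warns and fails on an unused slot. */
static bool model_swap_info_get(unsigned off)
{
	if (!model_map[off]) {
		printf("swap_free: Unused swap offset entry %08x\n", off);
		return false;
	}
	return true;
}

/* Stands in for free_swap_and_cache(); 'hack' mimics the added "return 1;". */
static int model_free_swap_and_cache(unsigned off, bool hack)
{
	bool found = model_swap_info_get(off);

	if (found)
		model_map[off]--;   /* drop the PTE's reference */
	if (hack)
		return 1;
	return found;
}

/* Stands in for the zap_pte_range() caller. */
void model_zap_swap_pte(unsigned off, bool hack)
{
	if (!model_free_swap_and_cache(off, hack))
		bad_pte_reports++;   /* print_bad_pte() in the kernel */
}
```

That would match the result reported later in the thread: no more "bad page" warnings, but still a stream of swap_free messages.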


* Re: unused swap offset / bad page map.
  2013-08-08 15:20     ` Hillf Danton
@ 2013-08-08 15:36       ` Dave Jones
  2013-08-19 23:18       ` Dave Jones
  1 sibling, 0 replies; 33+ messages in thread
From: Dave Jones @ 2013-08-08 15:36 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-mm, Linux Kernel

On Thu, Aug 08, 2013 at 11:20:28PM +0800, Hillf Danton wrote:
 > On Wed, Aug 7, 2013 at 11:30 PM, Dave Jones <davej@redhat.com> wrote:
 > > printk didn't trigger.
 > >
 > Is a corrupted page table entry encountered, according to the
 > comment of swap_duplicate()?
 > 
 > 
 > --- a/mm/swapfile.c	Wed Aug  7 17:27:22 2013
 > +++ b/mm/swapfile.c	Thu Aug  8 23:12:30 2013
 > @@ -770,6 +770,7 @@ int free_swap_and_cache(swp_entry_t entr
 >  		unlock_page(page);
 >  		page_cache_release(page);
 >  	}
 > +	return 1;
 >  	return p != NULL;
 >  }

I'm travelling for a week; I'll check it out when I get back.

	Dave


* Re: unused swap offset / bad page map.
  2013-08-08 15:20     ` Hillf Danton
  2013-08-08 15:36       ` Dave Jones
@ 2013-08-19 23:18       ` Dave Jones
  2013-08-20  4:39         ` Hillf Danton
  1 sibling, 1 reply; 33+ messages in thread
From: Dave Jones @ 2013-08-19 23:18 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-mm, Linux Kernel

On Thu, Aug 08, 2013 at 11:20:28PM +0800, Hillf Danton wrote:
 > On Wed, Aug 7, 2013 at 11:30 PM, Dave Jones <davej@redhat.com> wrote:
 > > printk didn't trigger.
 > >
 > Is a corrupted page table entry encountered, according to the
 > comment of swap_duplicate()?
 > 
 > 
 > --- a/mm/swapfile.c	Wed Aug  7 17:27:22 2013
 > +++ b/mm/swapfile.c	Thu Aug  8 23:12:30 2013
 > @@ -770,6 +770,7 @@ int free_swap_and_cache(swp_entry_t entr
 >  		unlock_page(page);
 >  		page_cache_release(page);
 >  	}
 > +	return 1;
 >  	return p != NULL;
 >  }
 > 
 > --

[sorry for delay, been travelling]

With this applied, I no longer see the 'bad page' warning, but
I do still get a bunch of messages like these:

[  340.342436] swap_free: Unused swap offset entry 00003bb4
[  340.952980] swap_free: Unused swap offset entry 0000298d
[  340.953016] swap_free: Unused swap offset entry 00002996
[  340.953048] swap_free: Unused swap offset entry 0000299d
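To compare runs (for example, whether a patch makes these messages rarer), it may help to count them rather than eyeball dmesg. A small sketch; the function names are hypothetical and the matched text is taken from the messages quoted above:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: pull the entry value out of one dmesg line such as
 *   "[  340.342436] swap_free: Unused swap offset entry 00003bb4"
 * Returns 1 and stores the value on a match, 0 otherwise.
 */
int parse_unused_swap_entry(const char *line, unsigned long *entry)
{
	static const char tag[] = "swap_free: Unused swap offset entry ";
	const char *p = strstr(line, tag);

	if (!p)
		return 0;
	return sscanf(p + sizeof(tag) - 1, "%lx", entry) == 1;
}

/* Count matching lines in a saved log (e.g. dmesg output). */
unsigned long count_unused_entries(FILE *f)
{
	char buf[512];
	unsigned long n = 0, e;

	while (fgets(buf, sizeof(buf), f))
		n += parse_unused_swap_entry(buf, &e);
	return n;
}
```

A tiny main() calling count_unused_entries(stdin) on saved dmesg output from each run would give comparable numbers.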


btw, does anyone have thoughts on a patch something like the one below?
It's really annoying, when debugging stuff like this, to have to walk
over to the machine and reboot it by hand after it wedges during swapoff.

	Dave

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6cf2e60..bbb1192 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1587,6 +1587,10 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	/* If we have hit memory corruption, we could hang during swapoff, so don't even try. */
+	if (test_taint(TAINT_BAD_PAGE))
+		return -EINVAL;
+
 	BUG_ON(!current->mm);
 
 	pathname = getname(specialfile);
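Until something like this lands, a test harness could apply the same guard from userspace before it calls swapoff. A sketch that reads /proc/sys/kernel/tainted and checks the bad-page bit; it assumes TAINT_BAD_PAGE is bit 5 (shown as 'B' in taint strings), and the helper names are hypothetical:

```c
#include <stdio.h>

/* Assumed: TAINT_BAD_PAGE is bit 5 of the kernel taint mask. */
#define TAINT_BAD_PAGE_BIT 5

int taint_has_bad_page(unsigned long mask)
{
	return (mask >> TAINT_BAD_PAGE_BIT) & 1;
}

/* Returns 1 if tainted with 'B', 0 if not, -1 if the mask can't be read. */
int kernel_has_bad_page_taint(void)
{
	unsigned long mask;
	FILE *f = fopen("/proc/sys/kernel/tainted", "r");

	if (!f)
		return -1;
	if (fscanf(f, "%lu", &mask) != 1) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return taint_has_bad_page(mask);
}
```

The in-kernel check in the patch has the advantage of also covering swapoff calls made by the fuzzer itself.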


* Re: unused swap offset / bad page map.
  2013-08-19 23:18       ` Dave Jones
@ 2013-08-20  4:39         ` Hillf Danton
  2013-08-21 20:49           ` Dave Jones
  0 siblings, 1 reply; 33+ messages in thread
From: Hillf Danton @ 2013-08-20  4:39 UTC (permalink / raw)
  To: Dave Jones, Hillf Danton, Linux-MM, Linux Kernel

On Tue, Aug 20, 2013 at 7:18 AM, Dave Jones <davej@redhat.com> wrote:
>
> btw, anyone have thoughts on a patch something like below ?

And another one (sorry if the message gets reformatted by my mail agent; I spent
an hour trying to get it back to the correct format and failed. Thanks a lot for
any how-to on sending plain-text messages).

Hillf

--- a/mm/memory.c Wed Aug  7 16:29:34 2013
+++ b/mm/memory.c Tue Aug 20 11:13:06 2013
@@ -933,8 +933,10 @@ again:
 		if (progress >= 32) {
 			progress = 0;
 			if (need_resched() ||
-			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
+			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl)) {
+				BUG_ON(entry.val);
 				break;
+			}
 		}
 		if (pte_none(*src_pte)) {
 			progress++;
--


> It's really annoying to debug stuff like this and have to walk
> over to the machine and reboot it by hand after it wedges during swapoff.
>
>         Dave
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 6cf2e60..bbb1192 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1587,6 +1587,10 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>         if (!capable(CAP_SYS_ADMIN))
>                 return -EPERM;
>
> +       /* If we have hit memory corruption, we could hang during swapoff, so don't even try. */
> +       if (test_taint(TAINT_BAD_PAGE))
> +               return -EINVAL;
> +
>         BUG_ON(!current->mm);
>
>         pathname = getname(specialfile);


* Re: unused swap offset / bad page map.
  2013-08-20  4:39         ` Hillf Danton
@ 2013-08-21 20:49           ` Dave Jones
  2013-08-22  0:35             ` Hillf Danton
  2013-08-22  3:21             ` Hillf Danton
  0 siblings, 2 replies; 33+ messages in thread
From: Dave Jones @ 2013-08-21 20:49 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Linux-MM, Linux Kernel

On Tue, Aug 20, 2013 at 12:39:05PM +0800, Hillf Danton wrote:
 > On Tue, Aug 20, 2013 at 7:18 AM, Dave Jones <davej@redhat.com> wrote:
 > 
 > --- a/mm/memory.c Wed Aug  7 16:29:34 2013
 > +++ b/mm/memory.c Tue Aug 20 11:13:06 2013
 > @@ -933,8 +933,10 @@ again:
 >  		if (progress >= 32) {
 >  			progress = 0;
 >  			if (need_resched() ||
 > -			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
 > +			    spin_needbreak(src_ptl) || spin_needbreak(dst_ptl)) {
 > +				BUG_ON(entry.val);
 >  				break;
 > +			}
 >  		}
 >  		if (pte_none(*src_pte)) {
 >  			progress++;

Didn't hit the BUG_ON, but got a bunch of these:

[  424.077993] swap_free: Unused swap offset entry 000187d5
[  439.377194] swap_free: Unused swap offset entry 000187e7
[  441.998411] swap_free: Unused swap offset entry 000187ee
[  446.956551] swap_free: Unused swap offset entry 0000245f

	Dave


* Re: unused swap offset / bad page map.
  2013-08-21 20:49           ` Dave Jones
@ 2013-08-22  0:35             ` Hillf Danton
  2013-08-22  3:21             ` Hillf Danton
  1 sibling, 0 replies; 33+ messages in thread
From: Hillf Danton @ 2013-08-22  0:35 UTC (permalink / raw)
  To: Dave Jones, Hillf Danton, Linux-MM, Linux Kernel

On Thu, Aug 22, 2013 at 4:49 AM, Dave Jones <davej@redhat.com> wrote:
>
> didn't hit the bug_on, but got a bunch of
>
> [  424.077993] swap_free: Unused swap offset entry 000187d5
> [  439.377194] swap_free: Unused swap offset entry 000187e7
> [  441.998411] swap_free: Unused swap offset entry 000187ee
> [  446.956551] swap_free: Unused swap offset entry 0000245f
>
Could this be related to the regression reported here?

Regression: x86/mm: new _PTE_SWP_SOFT_DIRTY bit conflicts with existing use
https://lkml.org/lkml/2013/8/21/294


* Re: unused swap offset / bad page map.
  2013-08-21 20:49           ` Dave Jones
  2013-08-22  0:35             ` Hillf Danton
@ 2013-08-22  3:21             ` Hillf Danton
  2013-08-23  3:21               ` Dave Jones
  1 sibling, 1 reply; 33+ messages in thread
From: Hillf Danton @ 2013-08-22  3:21 UTC (permalink / raw)
  To: Dave Jones, Hillf Danton, Linux-MM, Linux Kernel

On Thu, Aug 22, 2013 at 4:49 AM, Dave Jones <davej@redhat.com> wrote:
>
> didn't hit the bug_on, but got a bunch of
>
> [  424.077993] swap_free: Unused swap offset entry 000187d5
> [  439.377194] swap_free: Unused swap offset entry 000187e7
> [  441.998411] swap_free: Unused swap offset entry 000187ee
> [  446.956551] swap_free: Unused swap offset entry 0000245f
>
If the page is reused, its swap entry is freed:

reuse_swap_page()
  delete_from_swap_cache()
    swapcache_free()
      count = swap_entry_free(p, entry, SWAP_HAS_CACHE);

If the count drops to zero there, the swap_free() that follows gives the warning.
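The failure mode described here is a reference dropped twice: once on the reuse path and once by the later swap_free(entry). A toy counter model shows why only the second drop gets reported; the names are hypothetical, and it ignores SWAP_HAS_CACHE and count continuations:

```c
#include <stdio.h>

#define NSLOTS 8

/* Hypothetical per-slot reference counts, 0 == unused (cf. swap_map[]). */
static unsigned char slots[NSLOTS];

/* Take one more reference on a slot. */
void slot_dup(unsigned off)
{
	slots[off]++;
}

/* Drop one reference; complain like swap_free() if there is none left. */
int slot_free(unsigned off)
{
	if (!slots[off]) {
		printf("swap_free: Unused swap offset entry %08x\n", off);
		return -1;
	}
	return --slots[off];
}
```

With a single owner, the first slot_free() takes the count to 0 silently, and a second one for the same slot prints the warning instead of going negative.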


--- a/mm/memory.c Wed Aug  7 16:29:34 2013
+++ b/mm/memory.c Thu Aug 22 10:44:32 2013
@@ -3123,6 +3123,7 @@ static int do_swap_page(struct mm_struct
 	/* It's better to call commit-charge after rmap is established */
 	mem_cgroup_commit_charge_swapin(page, ptr);
 
+	if (!exclusive)
 	swap_free(entry);
 	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
 		try_to_free_swap(page);
--


* Re: unused swap offset / bad page map.
  2013-08-22  3:21             ` Hillf Danton
@ 2013-08-23  3:21               ` Dave Jones
  2013-08-23  3:27                 ` Hillf Danton
  0 siblings, 1 reply; 33+ messages in thread
From: Dave Jones @ 2013-08-23  3:21 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Linux-MM, Linux Kernel

On Thu, Aug 22, 2013 at 11:21:28AM +0800, Hillf Danton wrote:
 > On Thu, Aug 22, 2013 at 4:49 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > didn't hit the bug_on, but got a bunch of
 > >
 > > [  424.077993] swap_free: Unused swap offset entry 000187d5
 > > [  439.377194] swap_free: Unused swap offset entry 000187e7
 > > [  441.998411] swap_free: Unused swap offset entry 000187ee
 > > [  446.956551] swap_free: Unused swap offset entry 0000245f
 > >
 > If page is reused, its swap entry is freed.
 > 
 > reuse_swap_page()
 >   delete_from_swap_cache()
 >     swapcache_free()
 >       count = swap_entry_free(p, entry, SWAP_HAS_CACHE);
 > 
 > If count drops to zero, then swap_free() gives warning.
 > 
 > 
 > --- a/mm/memory.c Wed Aug  7 16:29:34 2013
 > +++ b/mm/memory.c Thu Aug 22 10:44:32 2013
 > @@ -3123,6 +3123,7 @@ static int do_swap_page(struct mm_struct
 >  	/* It's better to call commit-charge after rmap is established */
 >  	mem_cgroup_commit_charge_swapin(page, ptr);
 >  
 > +	if (!exclusive)
 >  	swap_free(entry);
 >  	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
 >  		try_to_free_swap(page);
 > --

I still see the swap_free messages with this applied.

	Dave


* Re: unused swap offset / bad page map.
  2013-08-23  3:21               ` Dave Jones
@ 2013-08-23  3:27                 ` Hillf Danton
  2013-08-23  3:53                   ` Dave Jones
  0 siblings, 1 reply; 33+ messages in thread
From: Hillf Danton @ 2013-08-23  3:27 UTC (permalink / raw)
  To: Dave Jones, Hillf Danton, Linux-MM, Linux Kernel

On Fri, Aug 23, 2013 at 11:21 AM, Dave Jones <davej@redhat.com> wrote:
>
> I still see the swap_free messages with this applied.
>
Decremented?


* Re: unused swap offset / bad page map.
  2013-08-23  3:27                 ` Hillf Danton
@ 2013-08-23  3:53                   ` Dave Jones
  2013-08-26  3:45                     ` Hillf Danton
  0 siblings, 1 reply; 33+ messages in thread
From: Dave Jones @ 2013-08-23  3:53 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Linux-MM, Linux Kernel

On Fri, Aug 23, 2013 at 11:27:29AM +0800, Hillf Danton wrote:
 > On Fri, Aug 23, 2013 at 11:21 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > I still see the swap_free messages with this applied.
 > >
 > Decremented?

It actually seems worse: I can trigger it even more easily now, as if
there's a leak.

	Dave


* Re: unused swap offset / bad page map.
@ 2013-08-23  9:08 Hillf Danton
  0 siblings, 0 replies; 33+ messages in thread
From: Hillf Danton @ 2013-08-23  9:08 UTC (permalink / raw)
  To: Dave Jones, Hillf Danton, LKML, Linux-MM

On Fri, Aug 23, 2013 at 11:53 AM, Dave Jones <davej@redhat.com> wrote:
>
> On Fri, Aug 23, 2013 at 11:27:29AM +0800, Hillf Danton wrote:
>  > On Fri, Aug 23, 2013 at 11:21 AM, Dave Jones <davej@redhat.com> wrote:
>  > >
>  > > I still see the swap_free messages with this applied.
>  > >
>  > Decremented?
>
> It actually seems worse, seems I can trigger it even easier now, as if
> there's a leak.
>
If it's a leak, here's a missing swap_free() for another case where the page is reused.


--- a/mm/memory.c Wed Aug  7 16:29:34 2013
+++ b/mm/memory.c Fri Aug 23 16:46:06 2013
@@ -2655,6 +2655,7 @@ static int do_wp_page(struct mm_struct *
 			 */
 			page_move_anon_rmap(old_page, vma, address);
 			unlock_page(old_page);
+			swap_free(pte_to_swp_entry(orig_pte));
 			goto reuse;
 		}
 		unlock_page(old_page);
--


* Re: unused swap offset / bad page map.
  2013-08-23  3:53                   ` Dave Jones
@ 2013-08-26  3:45                     ` Hillf Danton
  2013-08-26 19:08                       ` Dave Jones
  0 siblings, 1 reply; 33+ messages in thread
From: Hillf Danton @ 2013-08-26  3:45 UTC (permalink / raw)
  To: Dave Jones, Hillf Danton, Linux-MM, Linux Kernel

On Fri, Aug 23, 2013 at 11:53 AM, Dave Jones <davej@redhat.com> wrote:
>
> It actually seems worse, seems I can trigger it even easier now, as if
> there's a leak.
>
Can you please try the new fix for TLB flush?

commit  2b047252d087be7f2ba
Fix TLB gather virtual address range invalidation corner cases


* Re: unused swap offset / bad page map.
  2013-08-26  3:45                     ` Hillf Danton
@ 2013-08-26 19:08                       ` Dave Jones
  2013-08-26 20:15                         ` Linus Torvalds
  2013-08-26 20:18                         ` Cyrill Gorcunov
  0 siblings, 2 replies; 33+ messages in thread
From: Dave Jones @ 2013-08-26 19:08 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Linux-MM, Linux Kernel, Linus Torvalds

On Mon, Aug 26, 2013 at 11:45:53AM +0800, Hillf Danton wrote:
 > On Fri, Aug 23, 2013 at 11:53 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > It actually seems worse, seems I can trigger it even easier now, as if
 > > there's a leak.
 > >
 > Can you please try the new fix for TLB flush?
 > 
 > commit  2b047252d087be7f2ba
 > Fix TLB gather virtual address range invalidation corner cases

No luck.

[ 4588.541886] swap_free: Unused swap offset entry 00002d15
[ 4588.541952] BUG: Bad page map in process trinity-kid12  pte:005a2a80 pmd:22c01f067
[ 4588.541979] addr:00007f0e95fa8000 vm_flags:00100073 anon_vma:ffff880217665550 mapping:          (null) index:1a42
[ 4588.542011] Modules linked in: snd_seq_dummy fuse hidp bnep scsi_transport_iscsi rfcomm ipt_ULOG can_bcm can_raw nfnetlink nfc caif_socket caif af_802154 phonet af_rxrpc bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 xfs libcrc32c snd_hda_codec_realtek snd_hda_intel e1000e snd_hda_codec snd_hwdep ptp snd_seq snd_seq_device snd_pcm usb_debug pps_core pcspkr snd_page_alloc snd_timer snd soundcore
[ 4588.542245] CPU: 2 PID: 25390 Comm: trinity-kid12 Not tainted 3.11.0-rc7+ #13 
[ 4588.542321]  0000000000000000 ffff88021ba33c98 ffffffff816f9ddf 00007f0e95fa8000
[ 4588.542354]  ffff88021ba33ce0 ffffffff81177047 00000000005a2a80 0000000000001a42
[ 4588.542386]  00007f0e96000000 ffff88022c01fd40 00000000005a2a80 ffff88021ba33e00
[ 4588.542418] Call Trace:
[ 4588.542435]  [<ffffffff816f9ddf>] dump_stack+0x54/0x74
[ 4588.542457]  [<ffffffff81177047>] print_bad_pte+0x187/0x220
[ 4588.542478]  [<ffffffff81178874>] unmap_single_vma+0x524/0x850
[ 4588.542500]  [<ffffffff81179ac9>] unmap_vmas+0x49/0x90
[ 4588.542521]  [<ffffffff811822c5>] exit_mmap+0xc5/0x170
[ 4588.542542]  [<ffffffff8104ffb7>] mmput+0x77/0x100
[ 4588.542562]  [<ffffffff8105465d>] do_exit+0x28d/0xcd0
[ 4588.542583]  [<ffffffff810c0085>] ? trace_hardirqs_on_caller+0x115/0x1e0
[ 4588.542607]  [<ffffffff810c015d>] ? trace_hardirqs_on+0xd/0x10
[ 4588.542629]  [<ffffffff8105643c>] do_group_exit+0x4c/0xc0
[ 4588.543534]  [<ffffffff810564c4>] SyS_exit_group+0x14/0x20
[ 4588.544438]  [<ffffffff8170d554>] tracesys+0xdd/0xe2

I can reproduce this pretty quickly by driving the system into swapping using
a few instances of 'trinity -C64' (this creates 64 threads).

I'm not sure how far back this bug goes, so I'll try some older kernels
and see if I can bisect it, because we don't seem to be getting closer
to figuring out what's actually happening.

	Dave


* Re: unused swap offset / bad page map.
  2013-08-26 19:08                       ` Dave Jones
@ 2013-08-26 20:15                         ` Linus Torvalds
  2013-08-26 20:46                           ` Linus Torvalds
  2013-08-26 20:18                         ` Cyrill Gorcunov
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2013-08-26 20:15 UTC (permalink / raw)
  To: Dave Jones, Hillf Danton, Linux-MM, Linux Kernel, Hugh Dickins

On Mon, Aug 26, 2013 at 12:08 PM, Dave Jones <davej@redhat.com> wrote:
>
> [ 4588.541886] swap_free: Unused swap offset entry 00002d15
> [ 4588.541952] BUG: Bad page map in process trinity-kid12  pte:005a2a80 pmd:22c01f067
>
> I can reproduce this pretty quickly by driving the system into swapping using
> a few instances of 'trinity -C64' (this creates 64 threads)
>
> I'm not sure how far back this bug goes, so I'll try some older kernels
> and see if I can bisect it, because we don't seem to be getting closer
> to figuring out what's actually happening..

Bisecting would indeed be good. But I get the feeling that you'll need
to go back a *long* time, because the swap_map[] code hasn't changed
in ages.

I'm adding Hugh Dickins to the cc just in case he hasn't seen this on
linux-mm, because the swap_map[] code is complex as hell, and Hugh did
touch some of it last. The whole swap_map[] thing is complicated by:

 - it's a single byte per swap entry
 - it's not even a *structured* byte, but a single counter that has
several "fields" by hand
 - it has a count in the low 6 bits, with a magic "bad" value (which
is also a magic "continuation" value if one of the high bits is set)
 - it has two magic bits: HAS_CACHE and CONTINUED
 - it has a _third_ magic value (SWAP_MAP_SHMEM) which is "CONTINUED+BAD"
 - we increment this nasty pseudo-counter wildly hackily, and have
magic special case checks for the odd cases

and if we get any of the special cases wrong, we'll
increment/decrement it wrong, and we're screwed.
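To make the list above concrete, a decoder for a single swap_map byte. The constant values are assumed to match the 3.11-era include/linux/swap.h and should be checked against the tree; the struct and function names are hypothetical, and the sketch only classifies the first byte, it does not follow continuation pages:

```c
#include <stdbool.h>

/* Assumed 3.11-era values; verify against include/linux/swap.h. */
#define SWAP_HAS_CACHE  0x40  /* page is in the swap cache */
#define COUNT_CONTINUED 0x80  /* full count lives in continuation pages */
#define SWAP_MAP_BAD    0x3f  /* bad slot ("continuation" when CONTINUED set) */
#define SWAP_MAP_SHMEM  0xbf  /* owned by shmem/tmpfs: CONTINUED | BAD */

struct swap_map_fields {
	unsigned count;    /* low 6 bits; left 0 for bad or shmem slots */
	bool has_cache;
	bool continued;
	bool bad;
	bool shmem;
};

struct swap_map_fields decode_swap_map(unsigned char ent)
{
	struct swap_map_fields f = { 0 };
	/* Like the kernel's swap_count(): strip the cache bit first. */
	unsigned char c = (unsigned char)(ent & ~SWAP_HAS_CACHE);

	f.has_cache = ent & SWAP_HAS_CACHE;
	f.shmem = (c == SWAP_MAP_SHMEM);
	f.bad = (c == SWAP_MAP_BAD);
	f.continued = (c & COUNT_CONTINUED) && !f.shmem;
	if (!f.bad && !f.shmem)
		f.count = c & 0x3f;
	return f;
}
```

Every caller that increments or decrements the byte has to get all of these cases right, which is the fragility described above.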

The *locking* looks pretty simple, though. It's a simple spinlock. We
do some optimistic tests outside the spinlock, but the actual
allocation and modification seem to all be inside the lock and
re-check any optimistic values afaik.

So I'm inclined to think that we're more likely to have something
wrong in the messy magical special cases. I'm wondering if we should
get rid of the continuation crap, for example, and expand the "one
byte per swap page" to two bytes instead.

Hugh, I think you know this code best, because you added the last
special case (that SWAP_MAP_SHMEM value). Comments?

                  Linus


* Re: unused swap offset / bad page map.
  2013-08-26 19:08                       ` Dave Jones
  2013-08-26 20:15                         ` Linus Torvalds
@ 2013-08-26 20:18                         ` Cyrill Gorcunov
  2013-08-26 20:37                           ` Dave Jones
  1 sibling, 1 reply; 33+ messages in thread
From: Cyrill Gorcunov @ 2013-08-26 20:18 UTC (permalink / raw)
  To: Dave Jones, Hillf Danton, Linux-MM, Linux Kernel, Linus Torvalds

On Mon, Aug 26, 2013 at 03:08:22PM -0400, Dave Jones wrote:
> On Mon, Aug 26, 2013 at 11:45:53AM +0800, Hillf Danton wrote:
>  > On Fri, Aug 23, 2013 at 11:53 AM, Dave Jones <davej@redhat.com> wrote:
>  > >
>  > > It actually seems worse, seems I can trigger it even easier now, as if
>  > > there's a leak.
>  > >
>  > Can you please try the new fix for TLB flush?
>  > 
>  > commit  2b047252d087be7f2ba
>  > Fix TLB gather virtual address range invalidation corner cases
> 
> No luck.

Hi Dave, could you please put your .config somewhere so I can try
to reproduce this problem? (I've tried trinity with -C64 but it didn't
trigger the issue.)

* Re: unused swap offset / bad page map.
  2013-08-26 20:18                         ` Cyrill Gorcunov
@ 2013-08-26 20:37                           ` Dave Jones
  2013-08-26 20:42                             ` Cyrill Gorcunov
  0 siblings, 1 reply; 33+ messages in thread
From: Dave Jones @ 2013-08-26 20:37 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Hillf Danton, Linux-MM, Linux Kernel, Linus Torvalds

On Tue, Aug 27, 2013 at 12:18:46AM +0400, Cyrill Gorcunov wrote:
 > On Mon, Aug 26, 2013 at 03:08:22PM -0400, Dave Jones wrote:
 > > On Mon, Aug 26, 2013 at 11:45:53AM +0800, Hillf Danton wrote:
 > >  > On Fri, Aug 23, 2013 at 11:53 AM, Dave Jones <davej@redhat.com> wrote:
 > >  > >
 > >  > > It actually seems worse, seems I can trigger it even easier now, as if
 > >  > > there's a leak.
 > >  > >
 > >  > Can you please try the new fix for TLB flush?
 > >  > 
 > >  > commit  2b047252d087be7f2ba
 > >  > Fix TLB gather virtual address range invalidation corner cases
 > > 
 > > No luck.
 > 
 > Hi Dave, could you please put your .config somewhere so I can try
 > to reproduce this problem? (I've tried trinity with -C64 but it didn't
 > trigger the issue.)

http://paste.fedoraproject.org/34944/77549285
The machine I'm using has 8GB RAM, 8GB swap, and 4 cores.

Try adding -C64 to the invocation in scripts/test-multi.sh,
and perhaps upping the NR_PROCESSES variable there too.

	Dave

* Re: unused swap offset / bad page map.
  2013-08-26 20:37                           ` Dave Jones
@ 2013-08-26 20:42                             ` Cyrill Gorcunov
  2013-08-26 21:37                               ` Cyrill Gorcunov
  0 siblings, 1 reply; 33+ messages in thread
From: Cyrill Gorcunov @ 2013-08-26 20:42 UTC (permalink / raw)
  To: Dave Jones, Hillf Danton, Linux-MM, Linux Kernel, Linus Torvalds

On Mon, Aug 26, 2013 at 04:37:02PM -0400, Dave Jones wrote:
> 
> Try adding -C64 to the invocation in scripts/test-multi.sh,
> and perhaps upping the NR_PROCESSES variable there too.

Thanks! I'll ping you if I manage to crash my instance.

* Re: unused swap offset / bad page map.
  2013-08-26 20:15                         ` Linus Torvalds
@ 2013-08-26 20:46                           ` Linus Torvalds
  2013-08-26 22:08                             ` Hugh Dickins
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2013-08-26 20:46 UTC (permalink / raw)
  To: Dave Jones, Hillf Danton, Linux-MM, Linux Kernel, Hugh Dickins

On Mon, Aug 26, 2013 at 1:15 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So I'm almost likely to think that we are more likely to have
> something wrong in the messy magical special cases.

Of course, the good news would be if it actually ends up being the
soft-dirty stuff, and bisection hits something recent.

So maybe I'm overly pessimistic. That messy swap_map[] code really
_is_ messy, but at the same time it should also be pretty well-tested.
I don't think it's been touched in years.

That said, google does find "swap_free: Unused swap offset entry"
reports from over the years. Most of them seem to be single-bit
errors, though (ie when the entry is 00000100 or similar I'm more
inclined to blame a bit error - in contrast your values look like
"real" swap entries).
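The triage rule above can be written down directly: a value with exactly one bit set looks like an empty pte hit by a single bit flip, whereas something like 001263f5 has many bits set and looks like a genuine swap entry. A minimal sketch; the helper names are hypothetical, and this is only a heuristic for reading bug reports, not a check the kernel performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x has exactly one bit set: plausibly a zero entry with one
 * flipped bit. Clearing the lowest set bit (x & (x - 1)) leaves 0 only
 * when a single bit was set. */
static bool looks_like_single_bit_flip(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Same check relative to an expected value: do a and b differ in exactly one bit? */
static bool differs_by_one_bit(uint64_t a, uint64_t b)
{
	return looks_like_single_bit_flip(a ^ b);
}
```

So 0x00000100 passes the single-bit test, while 0x001263f5 does not.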

            Linus

* Re: unused swap offset / bad page map.
  2013-08-26 20:42                             ` Cyrill Gorcunov
@ 2013-08-26 21:37                               ` Cyrill Gorcunov
  2013-08-26 21:42                                 ` Dave Jones
  0 siblings, 1 reply; 33+ messages in thread
From: Cyrill Gorcunov @ 2013-08-26 21:37 UTC (permalink / raw)
  To: Dave Jones; +Cc: Hillf Danton, Linux-MM, Linux Kernel, Linus Torvalds

On Tue, Aug 27, 2013 at 12:42:03AM +0400, Cyrill Gorcunov wrote:
> On Mon, Aug 26, 2013 at 04:37:02PM -0400, Dave Jones wrote:
> > 
> > Try adding -C64 to the invocation in scripts/test-multi.sh,
> > and perhaps upping the NR_PROCESSES variable there too.
> 
> Thanks! I'll ping you if I manage to crash my instance.

So trinity tainted the kernel, but definitely not in the place I'm interested in.

[  320.904506] raw_sendmsg: trinity-child14 forgot to set AF_INET. Fix it!
[  329.570812] ------------[ cut here ]------------
[  329.571650] WARNING: CPU: 0 PID: 1982 at kernel/lockdep.c:3552 check_flags+0x18a/0x1c1()
[  329.571650] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
[  329.571650] Modules linked in:
[  329.571650] CPU: 0 PID: 1982 Comm: trinity-child4 Not tainted 3.11.0-rc6-dirty #386
[  329.571650] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  329.571650]  0000000000000009 ffff88001ee03b10 ffffffff8157ac8a 0000000000000006
[  329.571650]  ffff88001ee03b60 ffff88001ee03b50 ffffffff81045bb2 ffffffff81583840
[  329.571650]  ffffffff81092620 ffff880002b48000 0000000000000046 ffffffff81a2f750
[  329.571650] Call Trace:
[  329.571650]  <IRQ>  [<ffffffff8157ac8a>] dump_stack+0x4f/0x84
[  329.571650]  [<ffffffff81045bb2>] warn_slowpath_common+0x81/0x9b
[  329.571650]  [<ffffffff81583840>] ? ftrace_call+0x5/0x2f
[  329.571650]  [<ffffffff81092620>] ? check_flags+0x18a/0x1c1
[  329.571650]  [<ffffffff81045c6f>] warn_slowpath_fmt+0x46/0x48
[  329.571650]  [<ffffffff81045c2e>] ? warn_slowpath_fmt+0x5/0x48
[  329.571650]  [<ffffffff81092620>] check_flags+0x18a/0x1c1
[  329.571650]  [<ffffffff81093595>] lock_is_held+0x30/0x5f
[  329.571650]  [<ffffffff810eb19e>] rcu_read_lock_held+0x36/0x38
[  329.571650]  [<ffffffff810f1b92>] perf_tp_event+0x92/0x220
[  329.571650]  [<ffffffff810f1d0e>] ? perf_tp_event+0x20e/0x220
[  329.571650]  [<ffffffff81049f6c>] ? __local_bh_enable+0x9a/0x9e
[  329.571650]  [<ffffffff810712f3>] ? get_parent_ip+0x3f/0x3f
[  329.571650]  [<ffffffff81049f6c>] ? __local_bh_enable+0x9a/0x9e
[  329.571650]  [<ffffffff810e3af1>] perf_ftrace_function_call+0xce/0xdc

	...

(Since my config is pretty similar to yours, I tried running trinity without
recompiling the kernel. First I loaded the swap space with crap data:

[root@ovz trinity]# free 
             total       used       free     shared    buffers     cached
Mem:        493228     480188      13040          0       2912      12112
-/+ buffers/cache:     465164      28064
Swap:      2063356    1741304     322052

then ran it as

[root@ovz trinity]# ./trinity -C64 --dangerous)

I'll continue tomorrow with your config and test-multi.sh.

* Re: unused swap offset / bad page map.
  2013-08-26 21:37                               ` Cyrill Gorcunov
@ 2013-08-26 21:42                                 ` Dave Jones
  2013-08-26 21:49                                   ` Cyrill Gorcunov
  0 siblings, 1 reply; 33+ messages in thread
From: Dave Jones @ 2013-08-26 21:42 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Hillf Danton, Linux-MM, Linux Kernel, Linus Torvalds

On Tue, Aug 27, 2013 at 01:37:54AM +0400, Cyrill Gorcunov wrote:
 > On Tue, Aug 27, 2013 at 12:42:03AM +0400, Cyrill Gorcunov wrote:
 > > On Mon, Aug 26, 2013 at 04:37:02PM -0400, Dave Jones wrote:
 > > > 
 > > > Try adding -C64 to the invocation in scripts/test-multi.sh,
 > > > and perhaps upping the NR_PROCESSES variable there too.
 > > 
 > > Thanks! I'll ping you if I manage to crash my instance.
 > 
 > So trinity tainted the kernel, but definitely not in the place I'm interested in.
 > 
 > [  320.904506] raw_sendmsg: trinity-child14 forgot to set AF_INET. Fix it!
 > [  329.570812] ------------[ cut here ]------------
 > [  329.571650] WARNING: CPU: 0 PID: 1982 at kernel/lockdep.c:3552 check_flags+0x18a/0x1c1()
 > [  329.571650] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
 > [  329.571650] Modules linked in:
 > [  329.571650] CPU: 0 PID: 1982 Comm: trinity-child4 Not tainted 3.11.0-rc6-dirty #386
 > [  329.571650] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 > [  329.571650]  0000000000000009 ffff88001ee03b10 ffffffff8157ac8a 0000000000000006
 > [  329.571650]  ffff88001ee03b60 ffff88001ee03b50 ffffffff81045bb2 ffffffff81583840
 > [  329.571650]  ffffffff81092620 ffff880002b48000 0000000000000046 ffffffff81a2f750
 > [  329.571650] Call Trace:
 > [  329.571650]  <IRQ>  [<ffffffff8157ac8a>] dump_stack+0x4f/0x84
 > [  329.571650]  [<ffffffff81045bb2>] warn_slowpath_common+0x81/0x9b
 > [  329.571650]  [<ffffffff81583840>] ? ftrace_call+0x5/0x2f
 > [  329.571650]  [<ffffffff81092620>] ? check_flags+0x18a/0x1c1
 > [  329.571650]  [<ffffffff81045c6f>] warn_slowpath_fmt+0x46/0x48
 > [  329.571650]  [<ffffffff81045c2e>] ? warn_slowpath_fmt+0x5/0x48
 > [  329.571650]  [<ffffffff81092620>] check_flags+0x18a/0x1c1
 > [  329.571650]  [<ffffffff81093595>] lock_is_held+0x30/0x5f
 > [  329.571650]  [<ffffffff810eb19e>] rcu_read_lock_held+0x36/0x38
 > [  329.571650]  [<ffffffff810f1b92>] perf_tp_event+0x92/0x220
 > [  329.571650]  [<ffffffff810f1d0e>] ? perf_tp_event+0x20e/0x220
 > [  329.571650]  [<ffffffff81049f6c>] ? __local_bh_enable+0x9a/0x9e
 > [  329.571650]  [<ffffffff810712f3>] ? get_parent_ip+0x3f/0x3f
 > [  329.571650]  [<ffffffff81049f6c>] ? __local_bh_enable+0x9a/0x9e
 > [  329.571650]  [<ffffffff810e3af1>] perf_ftrace_function_call+0xce/0xdc

When it rains, it pours...
 
 > (Since my config is pretty similar to yours, I tried running trinity without
 > recompiling the kernel. First I loaded the swap space with crap data:
 > 
 > [root@ovz trinity]# free 
 >              total       used       free     shared    buffers     cached
 > Mem:        493228     480188      13040          0       2912      12112
 > -/+ buffers/cache:     465164      28064
 > Swap:      2063356    1741304     322052
 > 
 > then ran it as
 > 
 > [root@ovz trinity]# ./trinity -C64 --dangerous)

Yeah, for reproducing this bug, I'd stick to running it as a user, without --dangerous.
You might still hit a few fairly easy-to-trigger WARN_ONs/printks. I run with
this applied: http://paste.fedoraproject.org/34960/55323613/raw/ to make things
a little less noisy.

	Dave

* Re: unused swap offset / bad page map.
  2013-08-26 21:42                                 ` Dave Jones
@ 2013-08-26 21:49                                   ` Cyrill Gorcunov
  2013-08-26 21:59                                     ` Dave Jones
  0 siblings, 1 reply; 33+ messages in thread
From: Cyrill Gorcunov @ 2013-08-26 21:49 UTC (permalink / raw)
  To: Dave Jones; +Cc: Hillf Danton, Linux-MM, Linux Kernel, Linus Torvalds

On Mon, Aug 26, 2013 at 05:42:44PM -0400, Dave Jones wrote:
> 
> Yeah, for reproducing this bug, I'd stick to running it as a user, without --dangerous.
> You might still hit a few fairly easy-to-trigger WARN_ONs/printks. I run with
> this applied: http://paste.fedoraproject.org/34960/55323613/raw/ to make things
> a little less noisy.

Ah, thanks, pulling it in. Btw, have you seen this problem earlier than -rc4 at all?

* Re: unused swap offset / bad page map.
  2013-08-26 21:49                                   ` Cyrill Gorcunov
@ 2013-08-26 21:59                                     ` Dave Jones
  0 siblings, 0 replies; 33+ messages in thread
From: Dave Jones @ 2013-08-26 21:59 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Hillf Danton, Linux-MM, Linux Kernel, Linus Torvalds

On Tue, Aug 27, 2013 at 01:49:40AM +0400, Cyrill Gorcunov wrote:
 > On Mon, Aug 26, 2013 at 05:42:44PM -0400, Dave Jones wrote:
 > > 
 > > Yeah, for reproducing this bug, I'd stick to running it as a user, without --dangerous.
 > > You might still hit a few fairly easy-to-trigger WARN_ONs/printks. I run with
 > > this applied: http://paste.fedoraproject.org/34960/55323613/raw/ to make things
 > > a little less noisy.
 > 
 > Ah, thanks, pulling it in. Btw, have you seen this problem earlier than -rc4 at all?

I just hit it on 3.11-rc1. Couldn't reproduce it on 3.10.
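3.10 good and 3.11-rc1 bad gives the two endpoints a bisection needs. Conceptually, git bisect halves the range until it finds the first commit where the bug reproduces; the sketch below shows that search over a linear list of commits. first_bad(), the is_bad_fn callback and bad_from_threshold() are hypothetical, and real git bisect walks the commit graph and supports skipping untestable commits, both of which this ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test hook: returns true if the bug reproduces at commit i. */
typedef bool (*is_bad_fn)(size_t i, void *ctx);

/* Return the first bad commit in a linear history [0, n), given that
 * commit 0 is known good and commit n - 1 is known bad, and assuming
 * every commit after the culprit is also bad. */
static size_t first_bad(size_t n, is_bad_fn is_bad, void *ctx)
{
	size_t good = 0, bad;

	assert(n >= 2);
	bad = n - 1;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Example hook: the bug "appears" at the commit index stored in ctx. */
static bool bad_from_threshold(size_t i, void *ctx)
{
	return i >= *(const size_t *)ctx;
}
```

One caveat for a fuzzer-found bug like this one: a "good" verdict only means the bug did not reproduce during that run, so each step needs a long enough run, or the search can converge on the wrong commit.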

	Dave

* Re: unused swap offset / bad page map.
  2013-08-26 20:46                           ` Linus Torvalds
@ 2013-08-26 22:08                             ` Hugh Dickins
  2013-08-26 22:28                               ` Dave Jones
  2013-08-26 23:15                               ` Linus Torvalds
  0 siblings, 2 replies; 33+ messages in thread
From: Hugh Dickins @ 2013-08-26 22:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Cyrill Gorcunov, Hillf Danton, Linux-MM, Linux Kernel

On Mon, 26 Aug 2013, Linus Torvalds wrote:
> On Mon, Aug 26, 2013 at 1:15 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So I'm almost likely to think that we are more likely to have
> > something wrong in the messy magical special cases.
> 
> Of course, the good news would be if it actually ends up being the
> soft-dirty stuff, and bisection hits something recent.

I suspect so.

> 
> So maybe I'm overly pessimistic. That messy swap_map[] code really
> _is_ messy, but at the same time it should also be pretty well-tested.
> I don't think it's been touched in years.

Blame me for the byte-instead-of-short continuation stuff.
But it's never yet shown any problem (okay, perhaps that's
because it's so rare to need any continuation anyway).

> 
> That said, google does find "swap_free: Unused swap offset entry"
> reports from over the years. Most of them seem to be single-bit
> errors, though (ie when the entry is 00000100 or similar I'm more
> inclined to blame a bit error

Yes, historically they have usually represented either single-bit
errors, or corruption of page tables by other kernel data.  The
swap subsystem discovers it, but it's rarely an error of swap.

So I don't care for Dave's suggestion much earlier in this thread,
that swapoff should fail with -EINVAL if there has been a bad page
taint: that doesn't necessarily interfere with swapoff at all.

And besides, swapoff is killable: yes, if counts go wrong, it
can cycle around endlessly, but it checks for signal_pending()
each time around the loop.
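The point here is that even a wrong count cannot wedge swapoff unkillably, because the retry loop polls for a pending signal on every pass. The user-space model below illustrates that pattern only; unuse_all(), its 'stuck' slot and signal_after() are hypothetical stand-ins for try_to_unuse() and signal_pending(), not their actual code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for signal_pending(current). */
typedef bool (*pending_fn)(void *ctx);

/* Sweep the map, dropping one reference per nonzero slot per pass, until
 * every slot is zero. Slot 'stuck' (pass n for none) models a count that
 * went wrong and never drops, which would loop forever, so each pass
 * first polls for a pending signal and gives up with -EINTR. */
static int unuse_all(unsigned char *map, size_t n, size_t stuck,
		     pending_fn pending, void *ctx)
{
	for (;;) {
		bool busy = false;

		if (pending(ctx))
			return -EINTR;
		for (size_t i = 0; i < n; i++) {
			if (map[i] == 0)
				continue;
			busy = true;
			if (i != stuck)
				map[i]--;
		}
		if (!busy)
			return 0;
	}
}

/* Example hook: report a "signal" once *ctx passes have elapsed. */
static bool signal_after(void *ctx)
{
	int *passes_left = ctx;

	return (*passes_left)-- <= 0;
}
```

With no stuck slot the sweep drains the map and returns 0; with a stuck slot it keeps cycling until signal_after() reports a signal and the call returns -EINTR. Dave's reply below points out the catch: during shutdown there may be nobody left to send that signal.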

> - in contrast your values look like "real" swap entries).

Indeed they do.

I just did a quick diff of 3.11-rc7/mm against 3.10, and here's
a line in mremap which worries me.  That set_pte_at() is operating
on anything that isn't pte_none(), so the pte_mksoft_dirty() looks
prone to corrupt a swap entry.
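The reported numbers are consistent with this theory. Assuming the x86-64 swap-pte layout of this era as we recall it from arch/x86/include/asm/pgtable_64.h (present bit 0 clear, swap type in bits 1-5, offset from bit 9 upward), with _PAGE_SOFT_DIRTY at bit 11 as Linus notes below, the pte 24c7ea00 from the first report decodes to offset 0x1263f5, the exact value swap_free() complained about, and with bit 11 cleared it decodes to 0x1263f1. The constants and helper names below are illustrative assumptions, not kernel macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 swap-pte layout (3.11 era, from memory; check
 * arch/x86/include/asm/pgtable_64.h). Names are illustrative. */
#define PTE_SWP_TYPE_SHIFT   1  /* swap type in bits 1-5 */
#define PTE_SWP_TYPE_BITS    5
#define PTE_SWP_OFFSET_SHIFT 9  /* offset from bit 9 upward */
#define PTE_SOFT_DIRTY       (UINT64_C(1) << 11) /* _PAGE_SOFT_DIRTY */

static unsigned decode_swp_type(uint64_t pte)
{
	return (pte >> PTE_SWP_TYPE_SHIFT) & ((1u << PTE_SWP_TYPE_BITS) - 1);
}

static uint64_t decode_swp_offset(uint64_t pte)
{
	return pte >> PTE_SWP_OFFSET_SHIFT;
}

static uint64_t make_swp_pte(unsigned type, uint64_t offset)
{
	assert(type < (1u << PTE_SWP_TYPE_BITS));
	return ((uint64_t)type << PTE_SWP_TYPE_SHIFT) |
	       (offset << PTE_SWP_OFFSET_SHIFT);
}
```

Under these assumptions, OR-ing bit 11 into a swap pte sets bit 2 of the offset, adding 4. The second report (pte 005a2a80, entry 00002d15) has the same shape: bit 11 set, and clearing it gives offset 0x2d11.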

I've not tried matching up bits with Dave's reports, and just going
into a meeting now, but this patch looks worth a try: probably Cyrill
can improve it meanwhile to what he actually wants there (I'm
surprised anything special is needed for just moving a pte).

Hugh

--- 3.11-rc7/mm/mremap.c	2013-07-14 17:10:16.640003652 -0700
+++ linux/mm/mremap.c	2013-08-26 14:46:14.460027627 -0700
@@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_str
 			continue;
 		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
-		set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
+		set_pte_at(mm, new_addr, new_pte, pte);
 	}
 
 	arch_leave_lazy_mmu_mode();

* Re: unused swap offset / bad page map.
  2013-08-26 22:08                             ` Hugh Dickins
@ 2013-08-26 22:28                               ` Dave Jones
  2013-08-27  8:37                                 ` Cyrill Gorcunov
  2013-08-26 23:15                               ` Linus Torvalds
  1 sibling, 1 reply; 33+ messages in thread
From: Dave Jones @ 2013-08-26 22:28 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Cyrill Gorcunov, Hillf Danton, Linux-MM,
	Linux Kernel

On Mon, Aug 26, 2013 at 03:08:45PM -0700, Hugh Dickins wrote:
 
 > > That said, google does find "swap_free: Unused swap offset entry"
 > > reports from over the years. Most of them seem to be single-bit
 > > errors, though (ie when the entry is 00000100 or similar I'm more
 > > inclined to blame a bit error
 > 
 > Yes, historically they have usually represented either single-bit
 > errors, or corruption of page tables by other kernel data.  The
 > swap subsystem discovers it, but it's rarely an error of swap.
 
Just to rule out bad hardware, I've seen this on two systems
(admittedly the exact same spec, but still..)

 > So I don't care for Dave's suggestion much earlier in this thread,
 > that swapoff should fail with -EINVAL if there has been a bad page
 > taint: that doesn't necessarily interfere with swapoff at all.
 > 
 > And besides, swapoff is killable: yes, if counts go wrong, it
 > can cycle around endlessly, but it checks for signal_pending()
 > each time around the loop.

It might be killable, but if I've done /sbin/reboot, and the
kernel dies in sys_swapoff because of the corruption, I won't
get a chance to kill it, because at that point the shutdown process
has killed my shell, sshd, and just about everything else.
It means a grumpy walk to the other side of the house to prod a
reset button.  So yeah, it might not be a mergeable thing, but
at least while bisecting it's pretty much a must-have.

 > I just did a quick diff of 3.11-rc7/mm against 3.10, and here's
 > a line in mremap which worries me.  That set_pte_at() is operating
 > on anything that isn't pte_none(), so the pte_mksoft_dirty() looks
 > prone to corrupt a swap entry.
 > 
 > I've not tried matching up bits with Dave's reports, and just going
 > into a meeting now, but this patch looks worth a try: probably Cyrill
 > can improve it meanwhile to what he actually wants there (I'm
 > surprised anything special is needed for just moving a pte).
 > 
 > Hugh
 > 
 > --- 3.11-rc7/mm/mremap.c	2013-07-14 17:10:16.640003652 -0700
 > +++ linux/mm/mremap.c	2013-08-26 14:46:14.460027627 -0700
 > @@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_str
 >  			continue;
 >  		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 >  		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
 > -		set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
 > +		set_pte_at(mm, new_addr, new_pte, pte);
 >  	}

I'll give this a shot once I'm done with the bisect.

	Dave

* Re: unused swap offset / bad page map.
  2013-08-26 22:08                             ` Hugh Dickins
  2013-08-26 22:28                               ` Dave Jones
@ 2013-08-26 23:15                               ` Linus Torvalds
  2013-08-27  5:44                                 ` Cyrill Gorcunov
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2013-08-26 23:15 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Cyrill Gorcunov, Hillf Danton, Linux-MM, Linux Kernel

On Mon, Aug 26, 2013 at 3:08 PM, Hugh Dickins <hughd@google.com> wrote:
>
> I just did a quick diff of 3.11-rc7/mm against 3.10, and here's
> a line in mremap which worries me.  That set_pte_at() is operating
> on anything that isn't pte_none(), so the pte_mksoft_dirty() looks
> prone to corrupt a swap entry.

Uhhuh. I think you hit the nail on the head here.

I checked all the pte_swp_*soft_dirty() users (they should be used on
swp entries), because that came up in another thread. But you're
right, the non-swp ones only work on present pte entries (or on
file-offset entries, I guess), and at least that mremap() case seems
bogus.

I'm not seeing the point of marking the thing soft-dirty at all,
although I guess it's "dirty" in the sense that it changed the
contents at that virtual address. But for that code to work, it would
have to have the same bit for swap entries as for present pages (and
for file mapping entries), and that's not true. They are two different
bits (_PAGE_SOFT_DIRTY is bit #11 vs _PAGE_SWP_SOFT_DIRTY is bit #7).
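A sketch of the consequence: which bit means "soft-dirty" depends on what the entry encodes, so code that moves ptes has to branch on the entry kind. Under the assumed x86-64 layout (swap type in bits 1-5, offset from bit 9 upward, recalled from memory), bit 7 falls in a gap between the swap fields while bit 11 falls inside the offset. The enum and helpers are hypothetical. File ptes, which Cyrill's patch later in the thread handles with pte_file_mksoft_dirty(), are left out because we do not want to guess their layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT64(n)           (UINT64_C(1) << (n))
#define SOFT_DIRTY_BIT     11 /* _PAGE_SOFT_DIRTY, present ptes (per the thread) */
#define SWP_SOFT_DIRTY_BIT 7  /* _PAGE_SWP_SOFT_DIRTY, swap ptes (per the thread) */

/* Assumed x86-64 swap-pte fields, 3.11 era, from memory. */
#define SWP_TYPE_MASK      (UINT64_C(0x1f) << 1) /* bits 1-5 */
#define SWP_OFFSET_MASK    (~UINT64_C(0) << 9)   /* bit 9 upward */

enum pte_kind { PK_NONE, PK_PRESENT, PK_SWAP };

/* Mark an entry soft-dirty with the bit that is valid for its kind;
 * empty entries are returned unchanged. */
static uint64_t mark_soft_dirty(uint64_t pte, enum pte_kind kind)
{
	switch (kind) {
	case PK_PRESENT:
		return pte | BIT64(SOFT_DIRTY_BIT);
	case PK_SWAP:
		return pte | BIT64(SWP_SOFT_DIRTY_BIT);
	default:
		return pte;
	}
}

/* True if bit n lies outside the swap type and offset fields. Other
 * reserved bits (present, protnone) are not modelled here. */
static bool outside_swp_fields(unsigned n)
{
	return (BIT64(n) & (SWP_TYPE_MASK | SWP_OFFSET_MASK)) == 0;
}
```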

Ugh. Cyrill, this is a mess.

            Linus

* Re: unused swap offset / bad page map.
  2013-08-26 23:15                               ` Linus Torvalds
@ 2013-08-27  5:44                                 ` Cyrill Gorcunov
  0 siblings, 0 replies; 33+ messages in thread
From: Cyrill Gorcunov @ 2013-08-27  5:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Dave Jones, Hillf Danton, Linux-MM, Linux Kernel,
	Pavel Emelyanov

On Mon, Aug 26, 2013 at 04:15:00PM -0700, Linus Torvalds wrote:
> On Mon, Aug 26, 2013 at 3:08 PM, Hugh Dickins <hughd@google.com> wrote:
> >
> > I just did a quick diff of 3.11-rc7/mm against 3.10, and here's
> > a line in mremap which worries me.  That set_pte_at() is operating
> > on anything that isn't pte_none(), so the pte_mksoft_dirty() looks
> > prone to corrupt a swap entry.
> 
> Uhhuh. I think you hit the nail on the head here.
> 
> I checked all the pte_swp_*soft_dirty() users (they should be used on
> swp entries), because that came up in another thread. But you're
> right, the non-swp ones only work on present pte entries (or on
> file-offset entries, I guess), and at least that mremap() case seems
> bogus.

Oh my :( Indeed it sets _PAGE_SOFT_DIRTY unconditionally, sigh. This
bug comes from the earlier soft-dirty commit. Let me check all the other
places where we set the soft-dirty bit (Pavel CC'ed).

> I'm not seeing the point of marking the thing soft-dirty at all,
> although I guess it's "dirty" in the sense that it changed the
> contents at that virtual address. But for that code to work, it would
> have to have the same bit for swap entries as for present pages (and
> for file mapping entries), and that's not true. They are two different
> bits (_PAGE_SOFT_DIRTY is bit #11 vs _PAGE_SWP_SOFT_DIRTY is bit #7).
> 
> Ugh. Cyrill, this is a mess.

Linus, I simply had no room in the pte to carry the soft-dirty status
when the pte is encoded in swap format, so it was an unpleasant but
necessary decision. That's why the accessors for these bits are wrapped
in their own macros with a 'swp' prefix, so readers can easily grep for them.

* Re: unused swap offset / bad page map.
  2013-08-26 22:28                               ` Dave Jones
@ 2013-08-27  8:37                                 ` Cyrill Gorcunov
  2013-08-27 16:24                                   ` Dave Jones
  0 siblings, 1 reply; 33+ messages in thread
From: Cyrill Gorcunov @ 2013-08-27  8:37 UTC (permalink / raw)
  To: Dave Jones
  Cc: Hugh Dickins, Linus Torvalds, Hillf Danton, Linux-MM,
	Linux Kernel, Andrew Morton, Pavel Emelyanov

On Mon, Aug 26, 2013 at 06:28:33PM -0400, Dave Jones wrote:
>  > 
>  > I've not tried matching up bits with Dave's reports, and just going
>  > into a meeting now, but this patch looks worth a try: probably Cyrill
>  > can improve it meanwhile to what he actually wants there (I'm
>  > surprised anything special is needed for just moving a pte).
>  > 
>  > Hugh
>  > 
>  > --- 3.11-rc7/mm/mremap.c	2013-07-14 17:10:16.640003652 -0700
>  > +++ linux/mm/mremap.c	2013-08-26 14:46:14.460027627 -0700
>  > @@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_str
>  >  			continue;
>  >  		pte = ptep_get_and_clear(mm, old_addr, old_pte);
>  >  		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
>  > -		set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
>  > +		set_pte_at(mm, new_addr, new_pte, pte);
>  >  	}
> 
> I'll give this a shot once I'm done with the bisect.

I managed to trigger the issue as well. The patch below fixes it.
Dave, could you please give it a shot when time permits?

Pavel, I kept the 'make it dirty on move' logic, but I have some doubts
about it: wouldn't plain pte copying (as in Hugh's patch) work for us?
---
From: Cyrill Gorcunov <gorcunov@gmail.com>
Subject: [PATCH] mm: move_ptes -- Set soft dirty bit depending on pte type

Dave reported corrupted swap entries

 | [ 4588.541886] swap_free: Unused swap offset entry 00002d15
 | [ 4588.541952] BUG: Bad page map in process trinity-kid12  pte:005a2a80 pmd:22c01f067

and Hugh pointed out that move_ptes sets the _PAGE_SOFT_DIRTY
bit regardless of what type of entry the pte holds. The
trick is that when we carry soft-dirty status in swap
entries we must use _PAGE_SWP_SOFT_DIRTY instead, because
that is the only bit in the pte that can be used for our
own needs without overlapping the bits owned by the swap
entry type/offset.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 mm/mremap.c |   21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

Index: linux-2.6.git/mm/mremap.c
===================================================================
--- linux-2.6.git.orig/mm/mremap.c
+++ linux-2.6.git/mm/mremap.c
@@ -15,6 +15,7 @@
 #include <linux/swap.h>
 #include <linux/capability.h>
 #include <linux/fs.h>
+#include <linux/swapops.h>
 #include <linux/highmem.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
@@ -69,6 +70,23 @@ static pmd_t *alloc_new_pmd(struct mm_st
 	return pmd;
 }
 
+static pte_t move_soft_dirty_pte(pte_t pte)
+{
+	/*
+	 * Set soft dirty bit so we can notice
+	 * in userspace the ptes were moved.
+	 */
+#ifdef CONFIG_MEM_SOFT_DIRTY
+	if (pte_present(pte))
+		pte = pte_mksoft_dirty(pte);
+	else if (is_swap_pte(pte))
+		pte = pte_swp_mksoft_dirty(pte);
+	else if (pte_file(pte))
+		pte = pte_file_mksoft_dirty(pte);
+#endif
+	return pte;
+}
+
 static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		unsigned long old_addr, unsigned long old_end,
 		struct vm_area_struct *new_vma, pmd_t *new_pmd,
@@ -126,7 +144,8 @@ static void move_ptes(struct vm_area_str
 			continue;
 		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
-		set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
+		pte = move_soft_dirty_pte(pte);
+		set_pte_at(mm, new_addr, new_pte, pte);
 	}
 
 	arch_leave_lazy_mmu_mode();

* Re: unused swap offset / bad page map.
  2013-08-27  8:37                                 ` Cyrill Gorcunov
@ 2013-08-27 16:24                                   ` Dave Jones
  2013-08-27 16:32                                     ` Cyrill Gorcunov
  0 siblings, 1 reply; 33+ messages in thread
From: Dave Jones @ 2013-08-27 16:24 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Hugh Dickins, Linus Torvalds, Hillf Danton, Linux-MM,
	Linux Kernel, Andrew Morton, Pavel Emelyanov

On Tue, Aug 27, 2013 at 12:37:18PM +0400, Cyrill Gorcunov wrote:
 > On Mon, Aug 26, 2013 at 06:28:33PM -0400, Dave Jones wrote:
 > >  > 
 > >  > I've not tried matching up bits with Dave's reports, and just going
 > >  > into a meeting now, but this patch looks worth a try: probably Cyrill
 > >  > can improve it meanwhile to what he actually wants there (I'm
 > >  > surprised anything special is needed for just moving a pte).
 > >  > 
 > >  > Hugh
 > >  > 
 > >  > --- 3.11-rc7/mm/mremap.c	2013-07-14 17:10:16.640003652 -0700
 > >  > +++ linux/mm/mremap.c	2013-08-26 14:46:14.460027627 -0700
 > >  > @@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_str
 > >  >  			continue;
 > >  >  		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 > >  >  		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
 > >  > -		set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
 > >  > +		set_pte_at(mm, new_addr, new_pte, pte);
 > >  >  	}
 > > 
 > > I'll give this a shot once I'm done with the bisect.
 > 
 > I managed to trigger the issue as well. The patch below fixes it.
 > Dave, could you please give it a shot when time permits?

Seems to do the trick.

Tested-by: Dave Jones <davej@fedoraproject.org>

	Dave

* Re: unused swap offset / bad page map.
  2013-08-27 16:24                                   ` Dave Jones
@ 2013-08-27 16:32                                     ` Cyrill Gorcunov
  0 siblings, 0 replies; 33+ messages in thread
From: Cyrill Gorcunov @ 2013-08-27 16:32 UTC (permalink / raw)
  To: Dave Jones
  Cc: Hugh Dickins, Linus Torvalds, Hillf Danton, Linux-MM,
	Linux Kernel, Andrew Morton, Pavel Emelyanov

On Tue, Aug 27, 2013 at 12:24:27PM -0400, Dave Jones wrote:
>  > 
>  > I managed to trigger the issue as well. The patch below fixes it.
>  > Dave, could you please give it a shot when time permits?
> 
> Seems to do the trick.
> 
> Tested-by: Dave Jones <davej@fedoraproject.org>

Thanks a lot, Dave!

end of thread (newest message: 2013-08-27 16:32 UTC)

Thread overview: 33+ messages
2013-08-07  5:51 unused swap offset / bad page map Dave Jones
2013-08-07 10:04 ` Hillf Danton
2013-08-07 15:30   ` Dave Jones
2013-08-08 15:20     ` Hillf Danton
2013-08-08 15:36       ` Dave Jones
2013-08-19 23:18       ` Dave Jones
2013-08-20  4:39         ` Hillf Danton
2013-08-21 20:49           ` Dave Jones
2013-08-22  0:35             ` Hillf Danton
2013-08-22  3:21             ` Hillf Danton
2013-08-23  3:21               ` Dave Jones
2013-08-23  3:27                 ` Hillf Danton
2013-08-23  3:53                   ` Dave Jones
2013-08-26  3:45                     ` Hillf Danton
2013-08-26 19:08                       ` Dave Jones
2013-08-26 20:15                         ` Linus Torvalds
2013-08-26 20:46                           ` Linus Torvalds
2013-08-26 22:08                             ` Hugh Dickins
2013-08-26 22:28                               ` Dave Jones
2013-08-27  8:37                                 ` Cyrill Gorcunov
2013-08-27 16:24                                   ` Dave Jones
2013-08-27 16:32                                     ` Cyrill Gorcunov
2013-08-26 23:15                               ` Linus Torvalds
2013-08-27  5:44                                 ` Cyrill Gorcunov
2013-08-26 20:18                         ` Cyrill Gorcunov
2013-08-26 20:37                           ` Dave Jones
2013-08-26 20:42                             ` Cyrill Gorcunov
2013-08-26 21:37                               ` Cyrill Gorcunov
2013-08-26 21:42                                 ` Dave Jones
2013-08-26 21:49                                   ` Cyrill Gorcunov
2013-08-26 21:59                                     ` Dave Jones
2013-08-07 15:54   ` Dave Jones
  -- strict thread matches above, loose matches on Subject: below --
2013-08-23  9:08 Hillf Danton