linux-mm.kvack.org archive mirror
* Re: 4.10-rc2 list_lru_isolate list corruption
       [not found] <20170106052056.jihy5denyxsnfuo5@codemonkey.org.uk>
@ 2017-01-06 16:59 ` Johannes Weiner
  2017-01-06 19:58   ` Dave Jones
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Weiner @ 2017-01-06 16:59 UTC (permalink / raw)
  To: Dave Jones; +Cc: Jan Kara, linux-mm

Dave, can you reproduce this by any chance with this patch applied?

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 6f382e07de77..0783af1c0ebb 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -640,6 +640,8 @@ static inline void radix_tree_shrink(struct radix_tree_root *root,
 				update_node(node, private);
 		}
 
+		WARN_ON_ONCE(!list_empty(&node->private_list));
+
 		radix_tree_node_free(node);
 	}
 }
@@ -666,6 +668,8 @@ static void delete_node(struct radix_tree_root *root,
 			root->rnode = NULL;
 		}
 
+		WARN_ON_ONCE(!list_empty(&node->private_list));
+
 		radix_tree_node_free(node);
 
 		node = parent;
@@ -767,6 +771,7 @@ static void radix_tree_free_nodes(struct radix_tree_node *node)
 			struct radix_tree_node *old = child;
 			offset = child->offset + 1;
 			child = child->parent;
+			WARN_ON_ONCE(!list_empty(&old->private_list));
 			radix_tree_node_free(old);
 			if (old == entry_to_node(node))
 				return;
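
The hunks above place the same check in front of each radix_tree_node_free() call: warn if the node is still linked on its private_list at the moment it is about to be freed. As a minimal userspace sketch of that invariant (struct list_head is re-declared here, and every demo_* name is hypothetical and used only for illustration, not kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialised, unlinked entry points at itself in both directions. */
void demo_list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* True when the entry (or list head) is linked to nothing but itself. */
bool demo_list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Link entry right after head. The entry must not already be linked. */
void demo_list_add(struct list_head *entry, struct list_head *head)
{
	assert(demo_list_empty(entry));
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry and re-initialise it so demo_list_empty() is true again. */
void demo_list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	demo_list_init(entry);
}

/* Hypothetical node type: only the list linkage matters for this sketch. */
struct demo_node {
	struct list_head private_list;
};

/*
 * The condition the debugging patch warns about, inverted: a node may be
 * freed only once it is off every list that could still reach it.
 */
bool demo_safe_to_free(const struct demo_node *node)
{
	return demo_list_empty(&node->private_list);
}
```

In this sketch, linking a node with demo_list_add() makes demo_safe_to_free() return false until demo_list_del_init() unlinks it again.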

On Fri, Jan 06, 2017 at 12:20:56AM -0500, Dave Jones wrote:
> While fuzzing today, I triggered list corruption in the mm code twice.
> 
> Exhibit a:
> 
> WARNING: CPU: 1 PID: 53 at lib/list_debug.c:55 __list_del_entry_valid+0x5c/0xc0
> list_del corruption. next->prev should be ffff8804c31b8e60, but was ffffffff813d2dc0
> CPU: 1 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #2 
> Call Trace:
>  dump_stack+0x4f/0x73
>  __warn+0xcb/0xf0
>  warn_slowpath_fmt+0x5f/0x80
>  ? warn_slowpath_fmt+0x5/0x80
>  ? radix_tree_free_nodes+0xa0/0xa0
>  __list_del_entry_valid+0x5c/0xc0
>  list_lru_isolate+0x1a/0x40
>  shadow_lru_isolate+0x3e/0x220
>  __list_lru_walk_one.isra.4+0x9b/0x190
>  ? memcg_drain_all_list_lrus+0x1d0/0x1d0
>  list_lru_walk_one+0x23/0x30
>  scan_shadow_nodes+0x2e/0x40
>  shrink_slab.part.44+0x23d/0x5d0
>  ? 0xffffffffa0285077
>  shrink_node+0x22c/0x330
>  kswapd+0x392/0x8f0
>  kthread+0x10f/0x150
>  ? mem_cgroup_shrink_node+0x2e0/0x2e0
>  ? kthread_create_on_node+0x60/0x60
>  ret_from_fork+0x22/0x30
> 
> 
> Exhibit b:
> 
> 
> WARNING: CPU: 0 PID: 17728 at lib/list_debug.c:55 __list_del_entry_valid+0x5c/0xc0
> list_del corruption. next->prev should be ffff8804f8972030, but was ffffffff813d2dc0
> CPU: 0 PID: 17728 Comm: trinity-c28 Not tainted 4.10.0-rc2-think+ #2 
> Call Trace:
>  dump_stack+0x4f/0x73
>  __warn+0xcb/0xf0
>  warn_slowpath_fmt+0x5f/0x80
>  ? warn_slowpath_fmt+0x5/0x80
>  ? radix_tree_free_nodes+0xa0/0xa0
>  __list_del_entry_valid+0x5c/0xc0
>  list_lru_isolate+0x1a/0x40
>  shadow_lru_isolate+0x3e/0x220
>  __list_lru_walk_one.isra.4+0x9b/0x190
>  ? memcg_drain_all_list_lrus+0x1d0/0x1d0
>  list_lru_walk_one+0x23/0x30
>  scan_shadow_nodes+0x2e/0x40
>  shrink_slab.part.44+0x23d/0x5d0
>  ? 0xffffffffa0333077
>  shrink_node+0x22c/0x330
>  do_try_to_free_pages+0xf5/0x330
>  try_to_free_pages+0x132/0x310
>  __alloc_pages_slowpath+0x357/0xaa0
>  __alloc_pages_nodemask+0x3cc/0x460
>  __do_page_cache_readahead+0x165/0x370
>  ? __do_page_cache_readahead+0xed/0x370
>  ? __do_page_cache_readahead+0x5/0x370
>  ondemand_readahead+0x112/0x350
>  ? page_cache_sync_readahead+0x5/0x50
>  page_cache_sync_readahead+0x31/0x50
>  generic_file_read_iter+0x724/0x960
>  ? rw_copy_check_uvector+0x8e/0x190
>  ? generic_file_read_iter+0x5/0x960
>  do_iter_readv_writev+0xb8/0x120
>  do_readv_writev+0x1a4/0x250
>  ? do_readv_writev+0x5/0x250
>  ? vfs_readv+0x5/0x50
>  vfs_readv+0x3c/0x50
>  do_preadv+0xb5/0xd0
>  SyS_preadv+0x11/0x20
>  do_syscall_64+0x61/0x170
>  entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x7f5cb7c1e119
> RSP: 002b:00007ffc7e7d2758 EFLAGS: 00000246
> [CONT START]  ORIG_RAX: 0000000000000127
> RAX: ffffffffffffffda RBX: 0000000000000127 RCX: 00007f5cb7c1e119
> RDX: 0000000000000037 RSI: 00005561d7798a70 RDI: 000000000000000c
> RBP: 00007f5cb8228000 R08: 00000000a0000033 R09: 0000000000000030
> R10: 0000000000400000 R11: 0000000000000246 R12: 0000000000000002
> R13: 00007f5cb8228048 R14: 00007f5cb82f3ad8 R15: 00007f5cb8228000
> 
> 
> Interesting that the 'but was' value is the same on two separate boots.
> 
> 
> It looks like mm/list_lru.c didn't change recently, but mm/workingset.c did,
> which calls into this..  Johannes ?
> 
> 	Dave
> 

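The "list_del corruption" warnings quoted above come from a consistency check that runs before an entry is unlinked: both neighbours must still point back at the entry. The sketch below is a simplified userspace version of that idea, not the kernel's lib/list_debug.c code; struct list_head is re-declared and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialised, unlinked entry points at itself in both directions. */
void demo_list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Link entry right after head. The entry must not already be linked. */
void demo_list_add(struct list_head *entry, struct list_head *head)
{
	assert(entry->next == entry && entry->prev == entry);
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Check done before unlinking: if either neighbour no longer points
 * back at the entry, the list has been corrupted and the unlink must
 * not go ahead.
 */
bool demo_list_del_entry_valid(const struct list_head *entry)
{
	if (entry->prev->next != entry)
		return false;	/* the "prev->next should be ..." case */
	if (entry->next->prev != entry)
		return false;	/* the "next->prev should be ..." case */
	return true;
}
```

If the memory of a neighbouring entry is freed and reused while it is still linked, its prev field holds whatever was written there afterwards, and the second comparison fails for the entry in front of it.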
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: 4.10-rc2 list_lru_isolate list corruption
  2017-01-06 16:59 ` 4.10-rc2 list_lru_isolate list corruption Johannes Weiner
@ 2017-01-06 19:58   ` Dave Jones
  2017-01-07  1:19     ` Johannes Weiner
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Jones @ 2017-01-06 19:58 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Jan Kara, linux-mm

On Fri, Jan 06, 2017 at 11:59:41AM -0500, Johannes Weiner wrote:
 > Dave, can you reproduce this by any chance with this patch applied?

yep.

 > diff --git a/lib/radix-tree.c b/lib/radix-tree.c
 > index 6f382e07de77..0783af1c0ebb 100644
 > --- a/lib/radix-tree.c
 > +++ b/lib/radix-tree.c
 > @@ -640,6 +640,8 @@ static inline void radix_tree_shrink(struct radix_tree_root *root,
 >  				update_node(node, private);
 >  		}
 >  
 > +		WARN_ON_ONCE(!list_empty(&node->private_list));
 > +
 >  		radix_tree_node_free(node);
 >  	}
 >  }

[ 8467.462878] WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
[ 8467.468770] CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3 
[ 8467.480436] Call Trace:
[ 8467.486213]  dump_stack+0x4f/0x73
[ 8467.491999]  __warn+0xcb/0xf0
[ 8467.497769]  warn_slowpath_null+0x1d/0x20
[ 8467.503566]  delete_node+0x1e4/0x200
[ 8467.509468]  __radix_tree_delete_node+0xd/0x10
[ 8467.515425]  shadow_lru_isolate+0xe6/0x220
[ 8467.521337]  __list_lru_walk_one.isra.4+0x9b/0x190
[ 8467.527176]  ? memcg_drain_all_list_lrus+0x1d0/0x1d0
[ 8467.533066]  list_lru_walk_one+0x23/0x30
[ 8467.538953]  scan_shadow_nodes+0x2e/0x40
[ 8467.544840]  shrink_slab.part.44+0x23d/0x5d0
[ 8467.550751]  ? 0xffffffffa023a077
[ 8467.556639]  shrink_node+0x22c/0x330
[ 8467.562542]  kswapd+0x392/0x8f0
[ 8467.568422]  kthread+0x10f/0x150
[ 8467.574313]  ? mem_cgroup_shrink_node+0x2e0/0x2e0
[ 8467.580266]  ? kthread_create_on_node+0x60/0x60
[ 8467.586203]  ret_from_fork+0x29/0x40
[ 8467.592109] ---[ end trace f790bafb683609d5 ]---


* Re: 4.10-rc2 list_lru_isolate list corruption
  2017-01-06 19:58   ` Dave Jones
@ 2017-01-07  1:19     ` Johannes Weiner
  2017-01-08  0:07       ` Dave Jones
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Weiner @ 2017-01-07  1:19 UTC (permalink / raw)
  To: Dave Jones; +Cc: Jan Kara, linux-mm

On Fri, Jan 06, 2017 at 02:58:51PM -0500, Dave Jones wrote:
> On Fri, Jan 06, 2017 at 11:59:41AM -0500, Johannes Weiner wrote:
>  > diff --git a/lib/radix-tree.c b/lib/radix-tree.c
>  > index 6f382e07de77..0783af1c0ebb 100644
>  > --- a/lib/radix-tree.c
>  > +++ b/lib/radix-tree.c
>  > @@ -640,6 +640,8 @@ static inline void radix_tree_shrink(struct radix_tree_root *root,
>  >  				update_node(node, private);
>  >  		}
>  >  
>  > +		WARN_ON_ONCE(!list_empty(&node->private_list));
>  > +
>  >  		radix_tree_node_free(node);
>  >  	}
>  >  }
> 
> [ 8467.462878] WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
> [ 8467.468770] CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3 
> [ 8467.480436] Call Trace:
> [ 8467.486213]  dump_stack+0x4f/0x73
> [ 8467.491999]  __warn+0xcb/0xf0
> [ 8467.497769]  warn_slowpath_null+0x1d/0x20
> [ 8467.503566]  delete_node+0x1e4/0x200
> [ 8467.509468]  __radix_tree_delete_node+0xd/0x10
> [ 8467.515425]  shadow_lru_isolate+0xe6/0x220
> [ 8467.521337]  __list_lru_walk_one.isra.4+0x9b/0x190
> [ 8467.527176]  ? memcg_drain_all_list_lrus+0x1d0/0x1d0
> [ 8467.533066]  list_lru_walk_one+0x23/0x30
> [ 8467.538953]  scan_shadow_nodes+0x2e/0x40
> [ 8467.544840]  shrink_slab.part.44+0x23d/0x5d0
> [ 8467.550751]  ? 0xffffffffa023a077
> [ 8467.556639]  shrink_node+0x22c/0x330
> [ 8467.562542]  kswapd+0x392/0x8f0
> [ 8467.568422]  kthread+0x10f/0x150
> [ 8467.574313]  ? mem_cgroup_shrink_node+0x2e0/0x2e0
> [ 8467.580266]  ? kthread_create_on_node+0x60/0x60
> [ 8467.586203]  ret_from_fork+0x29/0x40
> [ 8467.592109] ---[ end trace f790bafb683609d5 ]---

Argh, __radix_tree_delete_node() makes the flawed assumption that only
the immediate branch it's mucking with can collapse. But this warning
points out that a sibling branch can collapse too, including its leaf.

Can you try if this patch fixes the problem?

---
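
The following is not the patch referred to above. It is only a minimal userspace sketch of the failure mode described in this message: a collapse frees more nodes than the one the caller unlinked, so another node is freed while still on an external list. Passing a per-node callback into the collapse is shown here as one possible way to avoid that; it is an assumption made for illustration, and struct list_head is re-declared and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

void demo_list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

bool demo_list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Link entry right after head. The entry must not already be linked. */
void demo_list_add(struct list_head *entry, struct list_head *head)
{
	assert(demo_list_empty(entry));
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

void demo_list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	demo_list_init(entry);
}

/* Hypothetical tree node in a chain where each node has a single child. */
struct demo_node {
	struct demo_node *only_child;	/* NULL at the leaf */
	struct list_head lru;		/* linked while tracked externally */
	bool freed;			/* stand-in for releasing the memory */
};

typedef void (*demo_update_fn)(struct demo_node *node);

/* Callback that takes a node off the external list before it is freed. */
void demo_unlink(struct demo_node *node)
{
	demo_list_del_init(&node->lru);
}

/*
 * Collapse the whole chain starting at top. Every node on the chain is
 * freed, not only the one the caller had in hand. Returns the number of
 * nodes that were freed while still linked on the external list.
 */
int demo_collapse(struct demo_node *top, demo_update_fn update)
{
	int freed_while_linked = 0;
	struct demo_node *node = top;

	while (node) {
		struct demo_node *next = node->only_child;

		if (update)
			update(node);
		if (!demo_list_empty(&node->lru))
			freed_while_linked++;	/* a free-time check would warn here */
		node->freed = true;
		node = next;
	}
	return freed_while_linked;
}
```

In this sketch, calling demo_collapse() without a callback frees a linked leaf and leaves the external list pointing at it, while calling it with demo_unlink unlinks each node first.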


* Re: 4.10-rc2 list_lru_isolate list corruption
  2017-01-07  1:19     ` Johannes Weiner
@ 2017-01-08  0:07       ` Dave Jones
  2017-01-08  0:37         ` Hugh Dickins
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Jones @ 2017-01-08  0:07 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Jan Kara, linux-mm

On Fri, Jan 06, 2017 at 08:19:31PM -0500, Johannes Weiner wrote:

 > Argh, __radix_tree_delete_node() makes the flawed assumption that only
 > the immediate branch it's mucking with can collapse. But this warning
 > points out that a sibling branch can collapse too, including its leaf.
 > 
 > Can you try if this patch fixes the problem?

18 hours and still running.. I think we can call it good.

	Dave


* Re: 4.10-rc2 list_lru_isolate list corruption
  2017-01-08  0:07       ` Dave Jones
@ 2017-01-08  0:37         ` Hugh Dickins
  2017-01-08  2:02           ` Johannes Weiner
  0 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2017-01-08  0:37 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Dave Jones, Jan Kara, linux-mm

On Sat, 7 Jan 2017, Dave Jones wrote:
> On Fri, Jan 06, 2017 at 08:19:31PM -0500, Johannes Weiner wrote:
> 
>  > Argh, __radix_tree_delete_node() makes the flawed assumption that only
>  > the immediate branch it's mucking with can collapse. But this warning
>  > points out that a sibling branch can collapse too, including its leaf.
>  > 
>  > Can you try if this patch fixes the problem?
> 
> 18 hours and still running.. I think we can call it good.

I'm inclined to agree, though I haven't had it running long enough
(on a load like when it hit me a few times before) to be sure yet myself.
I'd rather see the proposed fix go in than wait longer for me:
I've certainly seen nothing bad from it yet.

Hugh


* Re: 4.10-rc2 list_lru_isolate list corruption
  2017-01-08  0:37         ` Hugh Dickins
@ 2017-01-08  2:02           ` Johannes Weiner
  2017-01-08 20:30             ` Hugh Dickins
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Weiner @ 2017-01-08  2:02 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, Jan Kara, linux-mm

On Sat, Jan 07, 2017 at 04:37:43PM -0800, Hugh Dickins wrote:
> On Sat, 7 Jan 2017, Dave Jones wrote:
> > On Fri, Jan 06, 2017 at 08:19:31PM -0500, Johannes Weiner wrote:
> > 
> >  > Argh, __radix_tree_delete_node() makes the flawed assumption that only
> >  > the immediate branch it's mucking with can collapse. But this warning
> >  > points out that a sibling branch can collapse too, including its leaf.
> >  > 
> >  > Can you try if this patch fixes the problem?
> > 
> > 18 hours and still running.. I think we can call it good.
> 
> I'm inclined to agree, though I haven't had it running long enough
> (on a load like when it hit me a few times before) to be sure yet myself.
> I'd rather see the proposed fix go in than wait longer for me:
> I've certainly seen nothing bad from it yet.

Thank you both!


* Re: 4.10-rc2 list_lru_isolate list corruption
  2017-01-08  2:02           ` Johannes Weiner
@ 2017-01-08 20:30             ` Hugh Dickins
  0 siblings, 0 replies; 7+ messages in thread
From: Hugh Dickins @ 2017-01-08 20:30 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Hugh Dickins, Dave Jones, Jan Kara, linux-mm

On Sat, 7 Jan 2017, Johannes Weiner wrote:
> On Sat, Jan 07, 2017 at 04:37:43PM -0800, Hugh Dickins wrote:
> > On Sat, 7 Jan 2017, Dave Jones wrote:
> > > On Fri, Jan 06, 2017 at 08:19:31PM -0500, Johannes Weiner wrote:
> > > 
> > >  > Argh, __radix_tree_delete_node() makes the flawed assumption that only
> > >  > the immediate branch it's mucking with can collapse. But this warning
> > >  > points out that a sibling branch can collapse too, including its leaf.
> > >  > 
> > >  > Can you try if this patch fixes the problem?
> > > 
> > > 18 hours and still running.. I think we can call it good.
> > 
> > I'm inclined to agree, though I haven't had it running long enough
> > (on a load like when it hit me a few times before) to be sure yet myself.
> > I'd rather see the proposed fix go in than wait longer for me:
> > I've certainly seen nothing bad from it yet.
> 
> Thank you both!

Been running successfully for 36 and 24 hours on two machines, each with
a different load that showed it much sooner before: I too call it good,
and thanks to Dave and you and Linus for getting the fix in for -rc3.

Hugh

