* Re: 4.10-rc2 list_lru_isolate list corruption
From: Johannes Weiner @ 2017-01-06 16:59 UTC
To: Dave Jones; +Cc: Jan Kara, linux-mm
Dave, can you reproduce this by any chance with this patch applied?
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 6f382e07de77..0783af1c0ebb 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -640,6 +640,8 @@ static inline void radix_tree_shrink(struct radix_tree_root *root,
update_node(node, private);
}

+ WARN_ON_ONCE(!list_empty(&node->private_list));
+
radix_tree_node_free(node);
}
}
@@ -666,6 +668,8 @@ static void delete_node(struct radix_tree_root *root,
root->rnode = NULL;
}

+ WARN_ON_ONCE(!list_empty(&node->private_list));
+
radix_tree_node_free(node);

node = parent;
@@ -767,6 +771,7 @@ static void radix_tree_free_nodes(struct radix_tree_node *node)
struct radix_tree_node *old = child;
offset = child->offset + 1;
child = child->parent;
+ WARN_ON_ONCE(!list_empty(&node->private_list));
radix_tree_node_free(old);
if (old == entry_to_node(node))
return;
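The debugging patch above checks one invariant: a radix tree node must already be off its `private_list` (the link through which the shadow-node LRU tracks it) by the time it is freed. A minimal userspace sketch of that check follows, assuming a hand-rolled copy of the kernel's intrusive `list_head` helpers. `struct tree_node` and `node_free_checked()` are hypothetical names, and the once-only `fprintf` stands in for `WARN_ON_ONCE`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's intrusive struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* An entry that points at itself is not linked on any list. */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Link @entry immediately after @head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink @entry and reinitialise it, so list_empty(entry) becomes true. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

/* Hypothetical node type carrying a private_list link, as radix tree nodes do. */
struct tree_node {
	struct list_head private_list;
};

/*
 * Hypothetical free helper mirroring the debug patch: complain (once)
 * if @node is still linked, then free it anyway.  Returns true if the
 * check fired.
 */
static bool node_free_checked(struct tree_node *node)
{
	static bool warned;
	bool still_linked = !list_empty(&node->private_list);

	if (still_linked && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: freeing node %p still on a list\n",
			(void *)node);
	}
	free(node);
	return still_linked;
}
```

Like the patch, the sketch still frees the node after warning, so in the buggy case the list head is left pointing at freed memory; any later walk of that list is exactly the kind of access that produces the corruption reports quoted below.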
On Fri, Jan 06, 2017 at 12:20:56AM -0500, Dave Jones wrote:
> While fuzzing today, I triggered list corruption in the mm code twice.
>
> Exhibit a:
>
> WARNING: CPU: 1 PID: 53 at lib/list_debug.c:55 __list_del_entry_valid+0x5c/0xc0
> list_del corruption. next->prev should be ffff8804c31b8e60, but was ffffffff813d2dc0
> CPU: 1 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #2
> Call Trace:
> dump_stack+0x4f/0x73
> __warn+0xcb/0xf0
> warn_slowpath_fmt+0x5f/0x80
> ? warn_slowpath_fmt+0x5/0x80
> ? radix_tree_free_nodes+0xa0/0xa0
> __list_del_entry_valid+0x5c/0xc0
> list_lru_isolate+0x1a/0x40
> shadow_lru_isolate+0x3e/0x220
> __list_lru_walk_one.isra.4+0x9b/0x190
> ? memcg_drain_all_list_lrus+0x1d0/0x1d0
> list_lru_walk_one+0x23/0x30
> scan_shadow_nodes+0x2e/0x40
> shrink_slab.part.44+0x23d/0x5d0
> ? 0xffffffffa0285077
> shrink_node+0x22c/0x330
> kswapd+0x392/0x8f0
> kthread+0x10f/0x150
> ? mem_cgroup_shrink_node+0x2e0/0x2e0
> ? kthread_create_on_node+0x60/0x60
> ret_from_fork+0x22/0x30
>
>
> Exhibit b:
>
>
> WARNING: CPU: 0 PID: 17728 at lib/list_debug.c:55 __list_del_entry_valid+0x5c/0xc0
> list_del corruption. next->prev should be ffff8804f8972030, but was ffffffff813d2dc0
> CPU: 0 PID: 17728 Comm: trinity-c28 Not tainted 4.10.0-rc2-think+ #2
> Call Trace:
> dump_stack+0x4f/0x73
> __warn+0xcb/0xf0
> warn_slowpath_fmt+0x5f/0x80
> ? warn_slowpath_fmt+0x5/0x80
> ? radix_tree_free_nodes+0xa0/0xa0
> __list_del_entry_valid+0x5c/0xc0
> list_lru_isolate+0x1a/0x40
> shadow_lru_isolate+0x3e/0x220
> __list_lru_walk_one.isra.4+0x9b/0x190
> ? memcg_drain_all_list_lrus+0x1d0/0x1d0
> list_lru_walk_one+0x23/0x30
> scan_shadow_nodes+0x2e/0x40
> shrink_slab.part.44+0x23d/0x5d0
> ? 0xffffffffa0333077
> shrink_node+0x22c/0x330
> do_try_to_free_pages+0xf5/0x330
> try_to_free_pages+0x132/0x310
> __alloc_pages_slowpath+0x357/0xaa0
> __alloc_pages_nodemask+0x3cc/0x460
> __do_page_cache_readahead+0x165/0x370
> ? __do_page_cache_readahead+0xed/0x370
> ? __do_page_cache_readahead+0x5/0x370
> ondemand_readahead+0x112/0x350
> ? page_cache_sync_readahead+0x5/0x50
> page_cache_sync_readahead+0x31/0x50
> generic_file_read_iter+0x724/0x960
> ? rw_copy_check_uvector+0x8e/0x190
> ? generic_file_read_iter+0x5/0x960
> do_iter_readv_writev+0xb8/0x120
> do_readv_writev+0x1a4/0x250
> ? do_readv_writev+0x5/0x250
> ? vfs_readv+0x5/0x50
> vfs_readv+0x3c/0x50
> do_preadv+0xb5/0xd0
> SyS_preadv+0x11/0x20
> do_syscall_64+0x61/0x170
> entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x7f5cb7c1e119
> RSP: 002b:00007ffc7e7d2758 EFLAGS: 00000246
> [CONT START] ORIG_RAX: 0000000000000127
> RAX: ffffffffffffffda RBX: 0000000000000127 RCX: 00007f5cb7c1e119
> RDX: 0000000000000037 RSI: 00005561d7798a70 RDI: 000000000000000c
> RBP: 00007f5cb8228000 R08: 00000000a0000033 R09: 0000000000000030
> R10: 0000000000400000 R11: 0000000000000246 R12: 0000000000000002
> R13: 00007f5cb8228048 R14: 00007f5cb82f3ad8 R15: 00007f5cb8228000
>
>
> Interesting that the 'but was' value is the same on two separate boots.
>
>
> It looks like mm/list_lru.c didn't change recently, but mm/workingset.c did,
> which calls into this. Johannes?
>
> Dave
>
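Both reports come from the list debugging check in `lib/list_debug.c` that runs before an entry is unlinked: each neighbour must still point back at the entry being removed. In both exhibits the entry's `next` neighbour no longer did, which is what you would expect if that neighbour had been freed and its memory reused while it was still on the list. The sketch below is a simplified userspace version of that check, assuming a hand-rolled `list_head`; the name `list_del_entry_valid()` is hypothetical (modelled on `__list_del_entry_valid()` in the traces), the kernel's `LIST_POISON` checks are omitted, and reuse of freed memory is simulated by overwriting a `prev` field rather than by an actual use-after-free.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's intrusive struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Link @entry just before @head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head->prev;

	entry->next = head;
	entry->prev = prev;
	prev->next = entry;
	head->prev = entry;
}

/*
 * Simplified pre-deletion check: both neighbours must still point back
 * at @entry.  Prints a message in the same shape as the reports above.
 */
static bool list_del_entry_valid(struct list_head *entry)
{
	struct list_head *prev = entry->prev;
	struct list_head *next = entry->next;

	if (prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)prev->next);
		return false;
	}
	if (next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)next->prev);
		return false;
	}
	return true;
}
```

With a list `lru <-> a <-> b`, the check on `a` passes until `b.prev` is overwritten, after which it fails on the `next->prev` test, matching the message in both exhibits.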
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: 4.10-rc2 list_lru_isolate list corruption
From: Dave Jones @ 2017-01-06 19:58 UTC
To: Johannes Weiner; +Cc: Jan Kara, linux-mm
On Fri, Jan 06, 2017 at 11:59:41AM -0500, Johannes Weiner wrote:
> Dave, can you reproduce this by any chance with this patch applied?
yep.
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 6f382e07de77..0783af1c0ebb 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -640,6 +640,8 @@ static inline void radix_tree_shrink(struct radix_tree_root *root,
> update_node(node, private);
> }
>
> + WARN_ON_ONCE(!list_empty(&node->private_list));
> +
> radix_tree_node_free(node);
> }
> }
[ 8467.462878] WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
[ 8467.468770] CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
[ 8467.480436] Call Trace:
[ 8467.486213] dump_stack+0x4f/0x73
[ 8467.491999] __warn+0xcb/0xf0
[ 8467.497769] warn_slowpath_null+0x1d/0x20
[ 8467.503566] delete_node+0x1e4/0x200
[ 8467.509468] __radix_tree_delete_node+0xd/0x10
[ 8467.515425] shadow_lru_isolate+0xe6/0x220
[ 8467.521337] __list_lru_walk_one.isra.4+0x9b/0x190
[ 8467.527176] ? memcg_drain_all_list_lrus+0x1d0/0x1d0
[ 8467.533066] list_lru_walk_one+0x23/0x30
[ 8467.538953] scan_shadow_nodes+0x2e/0x40
[ 8467.544840] shrink_slab.part.44+0x23d/0x5d0
[ 8467.550751] ? 0xffffffffa023a077
[ 8467.556639] shrink_node+0x22c/0x330
[ 8467.562542] kswapd+0x392/0x8f0
[ 8467.568422] kthread+0x10f/0x150
[ 8467.574313] ? mem_cgroup_shrink_node+0x2e0/0x2e0
[ 8467.580266] ? kthread_create_on_node+0x60/0x60
[ 8467.586203] ret_from_fork+0x29/0x40
[ 8467.592109] ---[ end trace f790bafb683609d5 ]---
* Re: 4.10-rc2 list_lru_isolate list corruption
From: Johannes Weiner @ 2017-01-07 1:19 UTC
To: Dave Jones; +Cc: Jan Kara, linux-mm
On Fri, Jan 06, 2017 at 02:58:51PM -0500, Dave Jones wrote:
> On Fri, Jan 06, 2017 at 11:59:41AM -0500, Johannes Weiner wrote:
> > diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> > index 6f382e07de77..0783af1c0ebb 100644
> > --- a/lib/radix-tree.c
> > +++ b/lib/radix-tree.c
> > @@ -640,6 +640,8 @@ static inline void radix_tree_shrink(struct radix_tree_root *root,
> > update_node(node, private);
> > }
> >
> > + WARN_ON_ONCE(!list_empty(&node->private_list));
> > +
> > radix_tree_node_free(node);
> > }
> > }
>
> [ 8467.462878] WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
> [ 8467.468770] CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
> [ 8467.480436] Call Trace:
> [ 8467.486213] dump_stack+0x4f/0x73
> [ 8467.491999] __warn+0xcb/0xf0
> [ 8467.497769] warn_slowpath_null+0x1d/0x20
> [ 8467.503566] delete_node+0x1e4/0x200
> [ 8467.509468] __radix_tree_delete_node+0xd/0x10
> [ 8467.515425] shadow_lru_isolate+0xe6/0x220
> [ 8467.521337] __list_lru_walk_one.isra.4+0x9b/0x190
> [ 8467.527176] ? memcg_drain_all_list_lrus+0x1d0/0x1d0
> [ 8467.533066] list_lru_walk_one+0x23/0x30
> [ 8467.538953] scan_shadow_nodes+0x2e/0x40
> [ 8467.544840] shrink_slab.part.44+0x23d/0x5d0
> [ 8467.550751] ? 0xffffffffa023a077
> [ 8467.556639] shrink_node+0x22c/0x330
> [ 8467.562542] kswapd+0x392/0x8f0
> [ 8467.568422] kthread+0x10f/0x150
> [ 8467.574313] ? mem_cgroup_shrink_node+0x2e0/0x2e0
> [ 8467.580266] ? kthread_create_on_node+0x60/0x60
> [ 8467.586203] ret_from_fork+0x29/0x40
> [ 8467.592109] ---[ end trace f790bafb683609d5 ]---
Argh, __radix_tree_delete_node() makes the flawed assumption that only
the immediate branch it's mucking with can collapse. But this warning
points out that a sibling branch can collapse too, including its leaf.

Can you try if this patch fixes the problem?
---
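The patch itself is not reproduced in this copy of the thread. As an illustration of the failure mode described above, here is a toy model (hypothetical names throughout; not the kernel's radix tree and not that patch): deleting one branch leaves the root with a single child, a shrink step then collapses the remaining sibling chain down to and including its leaf, and the leaf is still on the shadow-node LRU. The callback parameter is only loosely inspired by the `update_node(node, private)` call visible in the first hunk of the debug patch; whether and how the actual fix uses such a callback is not shown here, and the real shrink logic has additional conditions that the toy ignores.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy node: its only child (NULL for a leaf), number of populated
 * slots, whether it has been "freed", and whether it is on the LRU.
 */
struct toy_node {
	struct toy_node *child;
	int count;
	bool freed;
	bool on_lru;
};

/* Hypothetical per-node notification called before a node goes away. */
typedef void (*toy_update_t)(struct toy_node *node);

static int lru_len;   /* number of nodes currently on the toy LRU */

/*
 * Collapse single-entry nodes starting at the root.  Every collapsed
 * node is passed to @update (if non-NULL) before it is marked freed.
 */
static void toy_shrink(struct toy_node **rootp, toy_update_t update)
{
	struct toy_node *node = *rootp;

	while (node && node->count == 1) {
		struct toy_node *next = node->child;

		if (update)
			update(node);
		node->freed = true;   /* stand-in for freeing the node */
		*rootp = next;        /* a leaf's lone entry would move up; ignored */
		node = next;
	}
}

/* Hypothetical callback: take a node off the LRU before it is freed. */
static void toy_unlink_from_lru(struct toy_node *node)
{
	if (node->on_lru) {
		node->on_lru = false;
		lru_len--;
	}
}

/* The invariant the debug WARN_ON_ONCE()s check: no freed node on the LRU. */
static bool lru_has_freed_node(const struct toy_node *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (nodes[i].on_lru && nodes[i].freed)
			return true;
	return false;
}

/*
 * Build root -> mid -> leaf, where the root starts with two branches and
 * the leaf is on the LRU, then delete the other branch so the root is
 * left with a single child.
 */
static struct toy_node *toy_setup(struct toy_node n[3])
{
	n[0] = (struct toy_node){ .child = &n[1], .count = 2 };
	n[1] = (struct toy_node){ .child = &n[2], .count = 1 };
	n[2] = (struct toy_node){ .child = NULL, .count = 1, .on_lru = true };
	lru_len = 1;
	n[0].count--;   /* the caller deletes the branch it was working on */
	return &n[0];
}
```

Shrinking without a callback leaves the freed leaf on the LRU, which is the state the `WARN_ON_ONCE()` in `radix_tree_shrink()` caught; notifying on every collapsed node, not just the one the caller started from, keeps the LRU consistent in the toy.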
* Re: 4.10-rc2 list_lru_isolate list corruption
From: Dave Jones @ 2017-01-08 0:07 UTC
To: Johannes Weiner; +Cc: Jan Kara, linux-mm
On Fri, Jan 06, 2017 at 08:19:31PM -0500, Johannes Weiner wrote:
> Argh, __radix_tree_delete_node() makes the flawed assumption that only
> the immediate branch it's mucking with can collapse. But this warning
> points out that a sibling branch can collapse too, including its leaf.
>
> Can you try if this patch fixes the problem?
18 hours and still running.. I think we can call it good.
Dave
* Re: 4.10-rc2 list_lru_isolate list corruption
From: Hugh Dickins @ 2017-01-08 0:37 UTC
To: Johannes Weiner; +Cc: Dave Jones, Jan Kara, linux-mm
On Sat, 7 Jan 2017, Dave Jones wrote:
> On Fri, Jan 06, 2017 at 08:19:31PM -0500, Johannes Weiner wrote:
>
> > Argh, __radix_tree_delete_node() makes the flawed assumption that only
> > the immediate branch it's mucking with can collapse. But this warning
> > points out that a sibling branch can collapse too, including its leaf.
> >
> > Can you try if this patch fixes the problem?
>
> 18 hours and still running.. I think we can call it good.
I'm inclined to agree, though I haven't had it running long enough
(on a load like when it hit me a few times before) to be sure yet myself.
I'd rather see the proposed fix go in than wait longer for me:
I've certainly seen nothing bad from it yet.
Hugh
* Re: 4.10-rc2 list_lru_isolate list corruption
From: Johannes Weiner @ 2017-01-08 2:02 UTC
To: Hugh Dickins; +Cc: Dave Jones, Jan Kara, linux-mm
On Sat, Jan 07, 2017 at 04:37:43PM -0800, Hugh Dickins wrote:
> On Sat, 7 Jan 2017, Dave Jones wrote:
> > On Fri, Jan 06, 2017 at 08:19:31PM -0500, Johannes Weiner wrote:
> >
> > > Argh, __radix_tree_delete_node() makes the flawed assumption that only
> > > the immediate branch it's mucking with can collapse. But this warning
> > > points out that a sibling branch can collapse too, including its leaf.
> > >
> > > Can you try if this patch fixes the problem?
> >
> > 18 hours and still running.. I think we can call it good.
>
> I'm inclined to agree, though I haven't had it running long enough
> (on a load like when it hit me a few times before) to be sure yet myself.
> I'd rather see the proposed fix go in than wait longer for me:
> I've certainly seen nothing bad from it yet.
Thank you both!
* Re: 4.10-rc2 list_lru_isolate list corruption
From: Hugh Dickins @ 2017-01-08 20:30 UTC
To: Johannes Weiner; +Cc: Hugh Dickins, Dave Jones, Jan Kara, linux-mm
On Sat, 7 Jan 2017, Johannes Weiner wrote:
> On Sat, Jan 07, 2017 at 04:37:43PM -0800, Hugh Dickins wrote:
> > On Sat, 7 Jan 2017, Dave Jones wrote:
> > > On Fri, Jan 06, 2017 at 08:19:31PM -0500, Johannes Weiner wrote:
> > >
> > > > Argh, __radix_tree_delete_node() makes the flawed assumption that only
> > > > the immediate branch it's mucking with can collapse. But this warning
> > > > points out that a sibling branch can collapse too, including its leaf.
> > > >
> > > > Can you try if this patch fixes the problem?
> > >
> > > 18 hours and still running.. I think we can call it good.
> >
> > I'm inclined to agree, though I haven't had it running long enough
> > (on a load like when it hit me a few times before) to be sure yet myself.
> > I'd rather see the proposed fix go in than wait longer for me:
> > I've certainly seen nothing bad from it yet.
>
> Thank you both!
Been running successfully for 36 and 24 hours on two machines, each with
a different load that showed it much sooner before: I too call it good,
and thanks to Dave and you and Linus for getting the fix in for -rc3.
Hugh