linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: linux-next: slab shrinkers: BUG at mm/list_lru.c:92
Date: Wed, 3 Jul 2013 21:24:03 +1000	[thread overview]
Message-ID: <20130703112403.GP14996@dastard> (raw)
In-Reply-To: <20130702124427.GG16815@dhcp22.suse.cz>

On Tue, Jul 02, 2013 at 02:44:27PM +0200, Michal Hocko wrote:
> On Tue 02-07-13 22:19:47, Dave Chinner wrote:
> [...]
> > Ok, so it's been leaked from a dispose list somehow. Thanks for the
> > info, Michal, it's time to go look at the code....
> 
> OK, just in case we will need it, I am keeping the machine in this state
> for now. So we still can play with crash and check all the juicy
> internals.

My current suspect is the LRU_RETRY code. I don't think what it is
doing is at all valid - list_for_each_safe() is not safe if you drop
the lock that protects the list. i.e. there is nothing that protects
the stored next pointer from being removed from the list by someone
else. Hence what I think is occurring is this:


thread 1			thread 2
lock(lru)
list_for_each_safe(lru)		lock(lru)
  isolate			......
    lock(i_lock)
    has buffers
      __iget
      unlock(i_lock)
      unlock(lru)
      .....			(gets lru lock)
      				list_for_each_safe(lru)
				  walks all the inodes
				  finds inode being isolated by other thread
				  isolate
				    i_count > 0
				      list_del_init(i_lru)
				      return LRU_REMOVED;
				   moves to next inode, inode that
				   other thread has stored as next
				   isolate
				     i_state |= I_FREEING
				     list_move(dispose_list)
				     return LRU_REMOVED
				 ....
				 unlock(lru)
      lock(lru)
      return LRU_RETRY;
  if (!first_pass)
    ....
  --nr_to_scan
  (loop again using next, which has already been removed from the
  LRU by the other thread!)
  isolate
    lock(i_lock)
    if (i_state & ~I_REFERENCED)
      list_del_init(i_lru)	<<<<< inode is on dispose list!
				<<<<< inode is now isolated, with I_FREEING set
      return LRU_REMOVED;

That fits the corpse left on your machine, Michal. One thread has
moved the inode to a dispose list, the other thread thinks it is
still on the LRU and should be removed, and removes it.

This also explains the lru item count going negative - the same item
is being removed from the lru twice. So it seems like all the
problems you've been seeing are caused by this one problem....

Patch below that should fix this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

list_lru: fix broken LRU_RETRY behaviour

From: Dave Chinner <dchinner@redhat.com>

The LRU_RETRY code assumes that the list traversal status after we
have dropped and regained the list lock. Unfortunately, this is not
a valid assumption, and that can lead to racing traversals isolating
objects that the other traversal expects to be the next item on the
list.

This is causing problems with the inode cache shrinker isolation,
with races resulting in an inode on a dispose list being "isolated"
because a racing traversal still thinks it is on the LRU. The inode
is then never reclaimed and that causes hangs if a subsequent lookup
on that inode occurs.

Fix it by always restarting the list walk on a LRU_RETRY return from
the isolate callback. Avoid the possibility of livelocks the current
code was trying to aavoid by always decrementing the nr_to_walk
counter on retries so that even if we keep hitting the same item on
the list we'll eventually stop trying to walk and exit out of the
situation causing the problem.

Reported-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 mm/list_lru.c |   29 ++++++++++++-----------------
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index dc71659..7246791 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -71,19 +71,19 @@ list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate,
 	struct list_lru_node	*nlru = &lru->node[nid];
 	struct list_head *item, *n;
 	unsigned long isolated = 0;
-	/*
-	 * If we don't keep state of at which pass we are, we can loop at
-	 * LRU_RETRY, since we have no guarantees that the caller will be able
-	 * to do something other than retry on the next pass. We handle this by
-	 * allowing at most one retry per object. This should not be altered
-	 * by any condition other than LRU_RETRY.
-	 */
-	bool first_pass = true;
 
 	spin_lock(&nlru->lock);
 restart:
 	list_for_each_safe(item, n, &nlru->list) {
 		enum lru_status ret;
+
+		/*
+		 * decrement nr_to_walk first so that we don't livelock if we
+		 * get stuck on large numbesr of LRU_RETRY items
+		 */
+		if (--(*nr_to_walk) == 0)
+			break;
+
 		ret = isolate(item, &nlru->lock, cb_arg);
 		switch (ret) {
 		case LRU_REMOVED:
@@ -98,19 +98,14 @@ restart:
 		case LRU_SKIP:
 			break;
 		case LRU_RETRY:
-			if (!first_pass) {
-				first_pass = true;
-				break;
-			}
-			first_pass = false;
+			/*
+			 * The lru lock has been dropped, our list traversal is
+			 * now invalid and so we have to restart from scratch.
+			 */
 			goto restart;
 		default:
 			BUG();
 		}
-
-		if ((*nr_to_walk)-- == 0)
-			break;
-
 	}
 
 	spin_unlock(&nlru->lock);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-07-03 11:24 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-17 14:18 linux-next: slab shrinkers: BUG at mm/list_lru.c:92 Michal Hocko
2013-06-17 15:14 ` Glauber Costa
2013-06-17 15:33   ` Michal Hocko
2013-06-17 16:54     ` Glauber Costa
2013-06-18  7:42       ` Michal Hocko
2013-06-17 21:35   ` Andrew Morton
2013-06-17 22:30     ` Glauber Costa
2013-06-18  2:46       ` Dave Chinner
2013-06-18  6:31         ` Glauber Costa
2013-06-18  8:24           ` Michal Hocko
2013-06-18 10:44             ` Michal Hocko
2013-06-18 13:50               ` Michal Hocko
2013-06-25  2:27                 ` Dave Chinner
2013-06-26  8:15                   ` Michal Hocko
2013-06-26 23:24                     ` Dave Chinner
2013-06-27 14:54                       ` Michal Hocko
2013-06-28  8:39                         ` Michal Hocko
2013-06-28 14:31                           ` Glauber Costa
2013-06-28 15:12                             ` Michal Hocko
2013-06-29  2:55                         ` Dave Chinner
2013-06-30 18:33                           ` Michal Hocko
2013-07-01  1:25                             ` Dave Chinner
2013-07-01  7:50                               ` Michal Hocko
2013-07-01  8:10                                 ` Dave Chinner
2013-07-02  9:22                                   ` Michal Hocko
2013-07-02 12:19                                     ` Dave Chinner
2013-07-02 12:44                                       ` Michal Hocko
2013-07-03 11:24                                         ` Dave Chinner [this message]
2013-07-03 14:08                                           ` Glauber Costa
2013-07-04 16:36                                           ` Michal Hocko
2013-07-08 12:53                                             ` Michal Hocko
2013-07-08 21:04                                               ` Andrew Morton
2013-07-09 17:34                                                 ` Glauber Costa
2013-07-09 17:51                                                   ` Andrew Morton
2013-07-09 17:32                                               ` Glauber Costa
2013-07-09 17:50                                                 ` Andrew Morton
2013-07-09 17:57                                                   ` Glauber Costa
2013-07-09 17:57                                                 ` Michal Hocko
2013-07-09 21:39                                                   ` Andrew Morton
2013-07-10  2:31                                               ` Dave Chinner
2013-07-10  7:34                                                 ` Michal Hocko
2013-07-10  8:06                                                 ` Michal Hocko
2013-07-11  2:26                                                   ` Dave Chinner
2013-07-11  3:03                                                     ` Andrew Morton
2013-07-11 13:23                                                     ` Michal Hocko
2013-07-12  1:42                                                       ` Hugh Dickins
2013-07-13  3:29                                                         ` Dave Chinner
2013-07-15  9:14                                             ` Michal Hocko
2013-06-18  6:26       ` Glauber Costa
2013-06-18  8:25         ` Michal Hocko
2013-06-19  7:13         ` Michal Hocko
2013-06-19  7:35           ` Glauber Costa
2013-06-19  8:52             ` Glauber Costa
2013-06-19 13:57             ` Michal Hocko
2013-06-19 14:02               ` Glauber Costa
2013-06-19 14:28           ` Michal Hocko
2013-06-20 14:11             ` Glauber Costa
2013-06-20 15:12               ` Michal Hocko
2013-06-20 15:16                 ` Michal Hocko
2013-06-21  9:00                 ` Michal Hocko
2013-06-23 11:51                   ` Glauber Costa
2013-06-23 11:55                     ` Glauber Costa
2013-06-25  2:29                     ` Dave Chinner
2013-06-26  8:22                     ` Michal Hocko
2013-06-18  8:19       ` Michal Hocko
2013-06-18  8:21         ` Glauber Costa
2013-06-18  8:26           ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130703112403.GP14996@dastard \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=glommer@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).