[PATCH v2] fs/mbcache: make sure mb_cache_count() not return negative value.

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2] fs/mbcache: make sure mb_cache_count() not return negative value.
@ 2018-01-08 23:38 Jiang Biao
  2018-01-09  0:13 ` Andrew Morton
  2018-01-10  4:58 ` Theodore Ts'o
  0 siblings, 2 replies; 7+ messages in thread
From: Jiang Biao @ 2018-01-08 23:38 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, tytso, ebiggers, akpm, jack, jiang.biao2,
	zhong.weidong

When running ltp stress test for 7*24 hours, vmscan occasionally emits the
following warning continuously:

mb_cache_scan+0x0/0x3f0 negative objects to delete
nr=-9232265467809300450
....

Trace info shows the freeable(mb_cache_count returns) is -1, which causes
the continuous accumulation and overflow of total_scan.

This patch makes sure that mb_cache_count() not return a negative value,
which makes the mbcache shrinker more robust.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
CC: "Theodore Ts'o" <tytso@mit.edu>
CC: Eric Biggers <ebiggers@google.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Jan Kara <jack@suse.cz>
---
 fs/mbcache.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/mbcache.c b/fs/mbcache.c
index b8b8b9c..2a800e3 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -238,7 +238,11 @@ void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value)
 			spin_lock(&cache->c_list_lock);
 			if (!list_empty(&entry->e_list)) {
 				list_del_init(&entry->e_list);
-				cache->c_entry_count--;
+				if (cache->c_entry_count > 0)
+					cache->c_entry_count--;
+				else
+					WARN_ONCE(1, "mbcache: Entry count "
+                          "going negative!\n");
 				atomic_dec(&entry->e_refcnt);
 			}
 			spin_unlock(&cache->c_list_lock);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] fs/mbcache: make sure mb_cache_count() not return negative value.
  2018-01-08 23:38 [PATCH v2] fs/mbcache: make sure mb_cache_count() not return negative value Jiang Biao
@ 2018-01-09  0:13 ` Andrew Morton
  2018-01-10  4:26   ` Theodore Ts'o
  2018-01-10  4:58 ` Theodore Ts'o
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2018-01-09  0:13 UTC (permalink / raw)
  To: Jiang Biao
  Cc: linux-fsdevel, linux-ext4, tytso, ebiggers, jack, zhong.weidong

On Tue, 9 Jan 2018 07:38:11 +0800 Jiang Biao <jiang.biao2@zte.com.cn> wrote:

> When running ltp stress test for 7*24 hours, vmscan occasionally emits the
> following warning continuously:
> 
> mb_cache_scan+0x0/0x3f0 negative objects to delete
> nr=-9232265467809300450
> ....
> 
> Trace info shows the freeable(mb_cache_count returns) is -1, which causes
> the continuous accumulation and overflow of total_scan.
> 
> This patch makes sure that mb_cache_count() not return a negative value,
> which makes the mbcache shrinker more robust.
> 
> ...
>
> --- a/fs/mbcache.c
> +++ b/fs/mbcache.c
> @@ -238,7 +238,11 @@ void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value)
>  			spin_lock(&cache->c_list_lock);
>  			if (!list_empty(&entry->e_list)) {
>  				list_del_init(&entry->e_list);
> -				cache->c_entry_count--;
> +				if (cache->c_entry_count > 0)
> +					cache->c_entry_count--;
> +				else
> +					WARN_ONCE(1, "mbcache: Entry count "
> +                          "going negative!\n");
>  				atomic_dec(&entry->e_refcnt);
>  			}
>  			spin_unlock(&cache->c_list_lock);

I agree with Jan's comment.  We need to figure out how ->c_entry_count
went negative.  mb_cache_count() says this state is "Unlikely, but not
impossible", but from a quick read I can't see how this happens - it
appears that coherency between ->c_list and ->c_entry_count is always
maintained under ->c_list_lock?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] fs/mbcache: make sure mb_cache_count() not return negative value.
  2018-01-09  0:13 ` Andrew Morton
@ 2018-01-10  4:26   ` Theodore Ts'o
  2018-01-10 15:02     ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: Theodore Ts'o @ 2018-01-10  4:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jiang Biao, linux-fsdevel, linux-ext4, ebiggers, jack,
	zhong.weidong

On Mon, Jan 08, 2018 at 04:13:04PM -0800, Andrew Morton wrote:
> I agree with Jan's comment.  We need to figure out how ->c_entry_count
> went negative.  mb_cache_count() says this state is "Unlikely, but not
> impossible", but from a quick read I can't see how this happens - it
> appears that coherency between ->c_list and ->c_entry_count is always
> maintained under ->c_list_lock?

I think I see the problem; and I think this should fix it.  Andrew,
Jan, can you review and double check my analysis?

Thanks,

     	     	    	       	     	- Ted

commit 18fb3649c7cd9e70f05045656c1888459d96dfe4
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Tue Jan 9 23:24:53 2018 -0500

    mbcache: fix potential double counting when removing entry
    
    Entries are removed from the mb_cache entry in two places:
    mb_cache_shrink() and mb_cache_entry_delete().  The mb_cache_shrink()
    function finds the entry to delete via the cache->c_list pointer,
    while mb_cache_entry_delete() finds the entry via the hash lists.
    
    If the two functions race with each other, trying to delete an entry
    at the same time, it's possible for cache->c_entry_count to get
    decremented twice for that one entry.  Fix this by checking to see if
    entry is still on the cache list before removing it and dropping
    c_entry_count.
    
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

diff --git a/fs/mbcache.c b/fs/mbcache.c
index 49c5b25bfa8c..0851af5c1c3d 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -290,8 +290,10 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
 			list_move_tail(&entry->e_list, &cache->c_list);
 			continue;
 		}
-		list_del_init(&entry->e_list);
-		cache->c_entry_count--;
+		if (!list_empty(&entry->e_list)) {
+			list_del_init(&entry->e_list);
+			cache->c_entry_count--;
+		}
 		/*
 		 * We keep LRU list reference so that entry doesn't go away
 		 * from under us.

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] fs/mbcache: make sure mb_cache_count() not return negative value.
  2018-01-10  4:26   ` Theodore Ts'o
@ 2018-01-10 15:02     ` Jan Kara
  2018-01-10 20:11       ` Theodore Ts'o
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2018-01-10 15:02 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Andrew Morton, Jiang Biao, linux-fsdevel, linux-ext4, ebiggers,
	jack, zhong.weidong

On Tue 09-01-18 23:26:01, Theodore Ts'o wrote:
> On Mon, Jan 08, 2018 at 04:13:04PM -0800, Andrew Morton wrote:
> > I agree with Jan's comment.  We need to figure out how ->c_entry_count
> > went negative.  mb_cache_count() says this state is "Unlikely, but not
> > impossible", but from a quick read I can't see how this happens - it
> > appears that coherency between ->c_list and ->c_entry_count is always
> > maintained under ->c_list_lock?
> 
> I think I see the problem; and I think this should fix it.  Andrew,
> Jan, can you review and double check my analysis?
> 
> Thanks,
> 
>      	     	    	       	     	- Ted
> 
> commit 18fb3649c7cd9e70f05045656c1888459d96dfe4
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Tue Jan 9 23:24:53 2018 -0500
> 
>     mbcache: fix potential double counting when removing entry
>     
>     Entries are removed from the mb_cache entry in two places:
>     mb_cache_shrink() and mb_cache_entry_delete().  The mb_cache_shrink()
>     function finds the entry to delete via the cache->c_list pointer,
>     while mb_cache_entry_delete() finds the entry via the hash lists.
>     
>     If the two functions race with each other, trying to delete an entry
>     at the same time, it's possible for cache->c_entry_count to get
>     decremented twice for that one entry.  Fix this by checking to see if
>     entry is still on the cache list before removing it and dropping
>     c_entry_count.

So I don't think this can be a problem. Look, mb_cache_shrink() holds
c_list_lock. It will take first entry from cache->c_list - this list is
using list_head entry->e_list and so we are guaranteed entry->e_list is
non-empty.

The other place deleting entry - mb_cache_entry_delete() - which is using
different list to grab the entry is properly checking for
!list_empty(entry->e_list) after acquiring c_list_lock.

									Honza

>     
>     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> 
> diff --git a/fs/mbcache.c b/fs/mbcache.c
> index 49c5b25bfa8c..0851af5c1c3d 100644
> --- a/fs/mbcache.c
> +++ b/fs/mbcache.c
> @@ -290,8 +290,10 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
>  			list_move_tail(&entry->e_list, &cache->c_list);
>  			continue;
>  		}
> -		list_del_init(&entry->e_list);
> -		cache->c_entry_count--;
> +		if (!list_empty(&entry->e_list)) {
> +			list_del_init(&entry->e_list);
> +			cache->c_entry_count--;
> +		}
>  		/*
>  		 * We keep LRU list reference so that entry doesn't go away
>  		 * from under us.
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] fs/mbcache: make sure mb_cache_count() not return negative value.
  2018-01-10 15:02     ` Jan Kara
@ 2018-01-10 20:11       ` Theodore Ts'o
  2018-01-11  9:04         ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: Theodore Ts'o @ 2018-01-10 20:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Jiang Biao, linux-fsdevel, linux-ext4, ebiggers,
	zhong.weidong

On Wed, Jan 10, 2018 at 04:02:23PM +0100, Jan Kara wrote:
> So I don't think this can be a problem. Look, mb_cache_shrink() holds
> c_list_lock. It will take first entry from cache->c_list - this list is
> using list_head entry->e_list and so we are guaranteed entry->e_list is
> non-empty.
> 
> The other place deleting entry - mb_cache_entry_delete() - which is using
> different list to grab the entry is properly checking for
> !list_empty(entry->e_list) after acquiring c_list_lock.

Hmm... you're right.  How we handle the hlist_bl_lock and c_list_lock
still creeps me out a bit, but it's not going to cause the potential
problem.  I think there is a problem if mb_cache_entry_create() races
with mb_cache_delete(), but that will result in an entry being on the
c_list while not being on the hash list, and it doesn't cause the
c_entry_count to get out of sync with reality.

Drat....

						- Ted

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] fs/mbcache: make sure mb_cache_count() not return negative value.
  2018-01-10 20:11       ` Theodore Ts'o
@ 2018-01-11  9:04         ` Jan Kara
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2018-01-11  9:04 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Jan Kara, Andrew Morton, Jiang Biao, linux-fsdevel, linux-ext4,
	ebiggers, zhong.weidong

[-- Attachment #1: Type: text/plain, Size: 2772 bytes --]

On Wed 10-01-18 15:11:37, Theodore Ts'o wrote:
> On Wed, Jan 10, 2018 at 04:02:23PM +0100, Jan Kara wrote:
> > So I don't think this can be a problem. Look, mb_cache_shrink() holds
> > c_list_lock. It will take first entry from cache->c_list - this list is
> > using list_head entry->e_list and so we are guaranteed entry->e_list is
> > non-empty.
> > 
> > The other place deleting entry - mb_cache_entry_delete() - which is using
> > different list to grab the entry is properly checking for
> > !list_empty(entry->e_list) after acquiring c_list_lock.
> 
> Hmm... you're right.  How we handle the hlist_bl_lock and c_list_lock
> still creeps me out a bit, but it's not going to cause the potential
> problem.  I think there is a problem if mb_cache_entry_create() races
> with mb_cache_delete(), but that will result in an entry being on the
> c_list while not being on the hash list, and it doesn't cause the
> c_entry_count to get out of sync with reality.

So that is actually an interesting scenario. If mb_cache_entry_delete() for
(key, value) could happen at the same moment as mb_cache_entry_create() for
the same (key, value) pair, the whole thing could race like:

mb_cache_entry_create()			mb_cache_entry_delete()
  alloc and init 'entry'
  hlist_bl_lock(head);
  search hash, found nothing
  hlist_bl_add_head(&entry->e_hash_list, head);
  hlist_bl_unlock(head);
					hlist_bl_lock(head);
					  search hash, found 'entry'
					  hlist_bl_del_init(&entry->e_hash_list);
					  hlist_bl_unlock(head);
					  spin_lock(&cache->c_list_lock);
					  if (!list_empty(&entry->e_list))
					    false
					  spin_unlock(&cache->c_list_lock);
					  mb_cache_entry_put(cache, entry);
					    - drops last reference, entry
					      gets freed
  spin_lock(&cache->c_list_lock);
  list_add_tail(&entry->e_list, &cache->c_list);
  atomic_inc(&entry->e_refcnt);
  ...
  - and we have added freed entry to LRU, from this point on anything could
    happen.

Now I don't see how this could really happen with the way how ext4 uses
mbache as we call mb_cache_entry_delete() only when xattr block refcount
would drop to 0 (i.e., the last inode referencing the block deletes its
xattr) and then there's nobody to try to insert the same block into the
cache at the same time (xattr_sem protects this) but then ext4 locking in
this area is hairy enough that I could be missing something.

This is relatively easy for mbcache to tolerate but first I'd like to know
whether ext4 indeed can trigger the behavior as that might indicate ext4
bug which would be just hidden by the mbcache fix.

Jiang, can you please run with the attached patch and see whether the
WARN_ON triggers before the entry count goes wrong? Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-mbcache-WARN-if-entry-was-already-freed-when-adding-.patch --]
[-- Type: text/x-patch, Size: 866 bytes --]

>From 920a6679143d0a00673c7452a4352fbf74859474 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 11 Jan 2018 09:59:34 +0100
Subject: [PATCH] mbcache: WARN if entry was already freed when adding to LRU
 list

DO NOT MERGE THIS! This is just a debug patch.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/mbcache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/mbcache.c b/fs/mbcache.c
index b8b8b9ced9f8..63b039c99ec8 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -109,7 +109,7 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
 	spin_lock(&cache->c_list_lock);
 	list_add_tail(&entry->e_list, &cache->c_list);
 	/* Grab ref for LRU list */
-	atomic_inc(&entry->e_refcnt);
+	WARN_ON(!atomic_inc_not_zero(&entry->e_refcnt));
 	cache->c_entry_count++;
 	spin_unlock(&cache->c_list_lock);
 
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] fs/mbcache: make sure mb_cache_count() not return negative value.
  2018-01-08 23:38 [PATCH v2] fs/mbcache: make sure mb_cache_count() not return negative value Jiang Biao
  2018-01-09  0:13 ` Andrew Morton
@ 2018-01-10  4:58 ` Theodore Ts'o
  1 sibling, 0 replies; 7+ messages in thread
From: Theodore Ts'o @ 2018-01-10  4:58 UTC (permalink / raw)
  To: Jiang Biao; +Cc: linux-fsdevel, linux-ext4, ebiggers, akpm, jack, zhong.weidong

I think I've found the cause of it, but having a sanity check is a
good idea.  I've simplified the patch and its description, though.
This is what I have in my tree.

     	    	 	    	      - Ted

commit 252194e48f00d146de303822bba8c3568ca127cd
Author: Jiang Biao <jiang.biao2@zte.com.cn>
Date:   Tue Jan 9 23:57:52 2018 -0500

    mbcache: make sure c_entry_count is not decremented past zero
    
    Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    CC: Eric Biggers <ebiggers@google.com>
    CC: Andrew Morton <akpm@linux-foundation.org>
    CC: Jan Kara <jack@suse.cz>

diff --git a/fs/mbcache.c b/fs/mbcache.c
index 0851af5c1c3d..f2f15b747bed 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -239,7 +239,9 @@ void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value)
 			spin_lock(&cache->c_list_lock);
 			if (!list_empty(&entry->e_list)) {
 				list_del_init(&entry->e_list);
-				cache->c_entry_count--;
+				if (!WARN_ONCE(cache->c_entry_count == 0,
+		"mbcache: attempt to decrement c_entry_count past zero"))
+					cache->c_entry_count--;
 				atomic_dec(&entry->e_refcnt);
 			}
 			spin_unlock(&cache->c_list_lock);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-01-11  9:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-08 23:38 [PATCH v2] fs/mbcache: make sure mb_cache_count() not return negative value Jiang Biao
2018-01-09  0:13 ` Andrew Morton
2018-01-10  4:26   ` Theodore Ts'o
2018-01-10 15:02     ` Jan Kara
2018-01-10 20:11       ` Theodore Ts'o
2018-01-11  9:04         ` Jan Kara
2018-01-10  4:58 ` Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).