linux-mtd.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] UBIFS: Remove incorrect assertion in shrink_tnc()
@ 2014-05-31  3:39 hujianyang
  2014-05-31  7:47 ` hujianyang
  2014-06-02  8:30 ` Artem Bityutskiy
  0 siblings, 2 replies; 5+ messages in thread
From: hujianyang @ 2014-05-31  3:39 UTC (permalink / raw)
  To: Artem Bityutskiy, Dolev Raviv; +Cc: linux-mtd

I hit the same assert failed as Dolev Raviv reported in Kernel v3.10
shows like this:

[ 9641.164028] UBIFS assert failed in shrink_tnc at 131 (pid 13297)
[ 9641.234078] CPU: 1 PID: 13297 Comm: mmap.test Tainted: G           O 3.10.40 #1
[ 9641.234116] [<c0011a6c>] (unwind_backtrace+0x0/0x12c) from [<c000d0b0>] (show_stack+0x20/0x24)
[ 9641.234137] [<c000d0b0>] (show_stack+0x20/0x24) from [<c0311134>] (dump_stack+0x20/0x28)
[ 9641.234188] [<c0311134>] (dump_stack+0x20/0x28) from [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs])
[ 9641.234265] [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) from [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs])
[ 9641.234307] [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) from [<c00cdad8>] (shrink_slab+0x1d4/0x2f8)
[ 9641.234327] [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) from [<c00d03d0>] (do_try_to_free_pages+0x300/0x544)
[ 9641.234344] [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) from [<c00d0a44>] (try_to_free_pages+0x2d0/0x398)
[ 9641.234363] [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) from [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8)
[ 9641.234382] [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) from [<c00f62d8>] (new_slab+0x78/0x238)
[ 9641.234400] [<c00f62d8>] (new_slab+0x78/0x238) from [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c)
[ 9641.234419] [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) from [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188)
[ 9641.234459] [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) from [<bf227908>] (do_readpage+0x168/0x468 [ubifs])
[ 9641.234553] [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) from [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs])
[ 9641.234606] [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) from [<c00c17c0>] (filemap_fault+0x304/0x418)
[ 9641.234638] [<c00c17c0>] (filemap_fault+0x304/0x418) from [<c00de694>] (__do_fault+0xd4/0x530)
[ 9641.234665] [<c00de694>] (__do_fault+0xd4/0x530) from [<c00e10c0>] (handle_pte_fault+0x480/0xf54)
[ 9641.234690] [<c00e10c0>] (handle_pte_fault+0x480/0xf54) from [<c00e2bf8>] (handle_mm_fault+0x140/0x184)
[ 9641.234716] [<c00e2bf8>] (handle_mm_fault+0x140/0x184) from [<c0316688>] (do_page_fault+0x150/0x3ac)
[ 9641.234737] [<c0316688>] (do_page_fault+0x150/0x3ac) from [<c000842c>] (do_DataAbort+0x3c/0xa0)
[ 9641.234759] [<c000842c>] (do_DataAbort+0x3c/0xa0) from [<c0314e38>] (__dabt_usr+0x38/0x40)

After analyzing the code, I found a condition that may cause this failed
in correct operations. Thus, I think this assertion is wrong and should be
removed.

Suppose there are two clean znodes and one dirty znode in TNC. So the
per-filesystem atomic_t @clean_zn_cnt is (2). If commit start, dirty_znode
is set to COW_ZNODE in get_znodes_to_commit() in case of potentially ops
on this znode. We clear COW bit and DIRTY bit in write_index() without
@tnc_mutex locked. We don't increase @clean_zn_cnt in this place. As the
comments in write_index() shows, if another process hold @tnc_mutex and
dirty this znode after we clean it, @clean_zn_cnt would be decreased to (1).
We will increase @clean_zn_cnt to (2) with @tnc_mutex locked in
free_obsolete_znodes() to keep it right.

If shrink_tnc() performs between decrease and increase, it will release
other 2 clean znodes it holds and found @clean_zn_cnt is less than zero
(1 - 2 = -1), then hit the assertion. Because free_obsolete_znodes() will
soon correct @clean_zn_cnt and no harm to fs in this case, I think this
assertion could be removed.


2 clean zondes and 1 dirty znode, @clean_zn_cnt == 2

Thread A (commit)         Thread B (write or others)       Thread C (shrinker)
->write_index
   ->clear_bit(DIRTY_NODE)
   ->clear_bit(COW_ZNODE)

            @clean_zn_cnt == 2
                          ->mutex_locked(&tnc_mutex)
                          ->dirty_cow_znode
                              ->!ubifs_zn_cow(znode)
                              ->!test_and_set_bit(DIRTY_NODE)
                              ->atomic_dec(&clean_zn_cnt)
                          ->mutex_unlocked(&tnc_mutex)

            @clean_zn_cnt == 1
                                                           ->mutex_locked(&tnc_mutex)
                                                           ->shrink_tnc
                                                             ->destroy_tnc_subtree
                                                             ->atomic_sub(&clean_zn_cnt, 2)
                                                             ->ubifs_assert  <- hit
                                                           ->mutex_unlocked(&tnc_mutex)

            @clean_zn_cnt == -1
->mutex_lock(&tnc_mutex)
->free_obsolete_znodes
   ->atomic_inc(&clean_zn_cnt)
->mutux_unlock(&tnc_mutex)

            @clean_zn_cnt == 0 (correct after shrink)


Signed-off-by: hujianyang <hujianyang@huawei.com>
---
 fs/ubifs/shrinker.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/ubifs/shrinker.c b/fs/ubifs/shrinker.c
index f35135e..9a9fb94 100644
--- a/fs/ubifs/shrinker.c
+++ b/fs/ubifs/shrinker.c
@@ -128,7 +128,6 @@ static int shrink_tnc(struct ubifs_info *c, int nr, int age, int *contention)
 			freed = ubifs_destroy_tnc_subtree(znode);
 			atomic_long_sub(freed, &ubifs_clean_zn_cnt);
 			atomic_long_sub(freed, &c->clean_zn_cnt);
-			ubifs_assert(atomic_long_read(&c->clean_zn_cnt) >= 0);
 			total_freed += freed;
 			znode = zprev;
 		}
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] UBIFS: Remove incorrect assertion in shrink_tnc()
  2014-05-31  3:39 [PATCH] UBIFS: Remove incorrect assertion in shrink_tnc() hujianyang
@ 2014-05-31  7:47 ` hujianyang
  2014-06-02  8:30   ` Artem Bityutskiy
  2014-06-02  8:30 ` Artem Bityutskiy
  1 sibling, 1 reply; 5+ messages in thread
From: hujianyang @ 2014-05-31  7:47 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: Dolev Raviv, linux-mtd

Hi Artem,

In previous mail, I introduce a condition I found that may hit assert
failed in shrink_tnc(). So I think this assertion is useless. But I
don't know if there are any leaks in @c->clean_zn_cnt. Although I've
done some tests and found everything is OK, I'm still not sure of that.

As you said, there might be some problems in TNC sub-sys. I would like
to add a new ubifs_assert in ubifs_tnc_close().

Patch 83707237 solves a problem if one per-fs @c->clean_zn_cnt is wrong,
it will not harm the global @ubifs_clean_zn_cnt to ensure other ubifs
partitions work well. Suppose there is only one ubifs partition, because
we have decreased global @ubifs_clean_zn_cnt with @c->clean_zn_cnt in
ubifs_tnc_close(), assertion in ubifs_exit() will not be hitted whether
@c->clean_zn_cnt is wrong or not.

diff --git a/fs/ubifs/tnc.c b/fs/ubifs/tnc.c
index 9083bc7..61914f0 100644
--- a/fs/ubifs/tnc.c
+++ b/fs/ubifs/tnc.c
@@ -2859,10 +2859,11 @@ void ubifs_tnc_close(struct ubifs_info *c)
 {
 	tnc_destroy_cnext(c);
 	if (c->zroot.znode) {
-		long n;
+		long n, freed;

-		ubifs_destroy_tnc_subtree(c->zroot.znode);
+		freed = ubifs_destroy_tnc_subtree(c->zroot.znode);
 		n = atomic_long_read(&c->clean_zn_cnt);
+		ubifs_assert(freed == n);
 		atomic_long_sub(n, &ubifs_clean_zn_cnt);
 	}
 	kfree(c->gap_lebs);


How about adding a new ubifs_assert like this? We can know if there are
any leaks in @c->clean_cn_cnt by it but still not know where it happens. I
think if we are confident of TNC sub-sys, this seems not bad.

What's your opinions? If you are not conflict with it. I would like to add
this to the previous patch and resend it.

Thanks!

Hu

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] UBIFS: Remove incorrect assertion in shrink_tnc()
  2014-05-31  3:39 [PATCH] UBIFS: Remove incorrect assertion in shrink_tnc() hujianyang
  2014-05-31  7:47 ` hujianyang
@ 2014-06-02  8:30 ` Artem Bityutskiy
  2014-06-02 14:14   ` Dolev Raviv
  1 sibling, 1 reply; 5+ messages in thread
From: Artem Bityutskiy @ 2014-06-02  8:30 UTC (permalink / raw)
  To: hujianyang; +Cc: Dolev Raviv, linux-mtd

On Sat, 2014-05-31 at 11:39 +0800, hujianyang wrote:
> I hit the same assert failed as Dolev Raviv reported in Kernel v3.10
> shows like this:

...

> Signed-off-by: hujianyang <hujianyang@huawei.com>

hujianyang,

looks like you got it right, thanks for your excellent analysis.

I think we want to include this patch to the stable tree, so I add

Cc: stable@vger.kernel.org

I've pushed it to the master branch of the UBIFS tree, thanks!

Also, having a

Tested-by: Dolev Raviv <draviv@codeaurora.org>

if possible, would increase our confidence in sending this patch to the
stable tree.

Thank you!

-- 
Best Regards,
Artem Bityutskiy

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] UBIFS: Remove incorrect assertion in shrink_tnc()
  2014-05-31  7:47 ` hujianyang
@ 2014-06-02  8:30   ` Artem Bityutskiy
  0 siblings, 0 replies; 5+ messages in thread
From: Artem Bityutskiy @ 2014-06-02  8:30 UTC (permalink / raw)
  To: hujianyang; +Cc: Dolev Raviv, linux-mtd

On Sat, 2014-05-31 at 15:47 +0800, hujianyang wrote:
> Hi Artem,
> 
> In previous mail, I introduce a condition I found that may hit assert
> failed in shrink_tnc(). So I think this assertion is useless. But I
> don't know if there are any leaks in @c->clean_zn_cnt. Although I've
> done some tests and found everything is OK, I'm still not sure of that.

Yes, this sounds like a great idea. Could you please send a proper patch
containing this change (S-o-b, etc). Many thanks!

-- 
Best Regards,
Artem Bityutskiy

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] UBIFS: Remove incorrect assertion in shrink_tnc()
  2014-06-02  8:30 ` Artem Bityutskiy
@ 2014-06-02 14:14   ` Dolev Raviv
  0 siblings, 0 replies; 5+ messages in thread
From: Dolev Raviv @ 2014-06-02 14:14 UTC (permalink / raw)
  To: dedekind1; +Cc: Dolev Raviv, linux-mtd, hujianyang


> On Sat, 2014-05-31 at 11:39 +0800, hujianyang wrote:
>> I hit the same assert failed as Dolev Raviv reported in Kernel v3.10
>> shows like this:
>
> ...
>
>> Signed-off-by: hujianyang <hujianyang@huawei.com>
>
> hujianyang,
>
> looks like you got it right, thanks for your excellent analysis.
>
> I think we want to include this patch to the stable tree, so I add
>
> Cc: stable@vger.kernel.org
>
> I've pushed it to the master branch of the UBIFS tree, thanks!
>
> Also, having a
>
> Tested-by: Dolev Raviv <draviv@codeaurora.org>
>
I'm actually testing it as we speak :)

> if possible, would increase our confidence in sending this patch to the
> stable tree.
>
> Thank you!
>
> --
> Best Regards,
> Artem Bityutskiy
>
>


-- 
QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-06-02 14:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-31  3:39 [PATCH] UBIFS: Remove incorrect assertion in shrink_tnc() hujianyang
2014-05-31  7:47 ` hujianyang
2014-06-02  8:30   ` Artem Bityutskiy
2014-06-02  8:30 ` Artem Bityutskiy
2014-06-02 14:14   ` Dolev Raviv

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).