[PATCH 1/3] jffs2: fix unbalanced locking

linux-mtd.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/3] jffs2: fix unbalanced locking
@ 2014-02-08  2:14 Li Zefan
  2014-02-08  2:15 ` [PATCH 2/3] jffs2: avoid soft-lockup in jffs2_reserve_space_gc() Li Zefan
  2014-02-08  2:16 ` [PATCH 3/3] jffs2: remove wait queue after schedule() Li Zefan
  0 siblings, 2 replies; 6+ messages in thread
From: Li Zefan @ 2014-02-08  2:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mtd, dwmw2, LKML

This was found by our internal debugging feature on runtime, but this
bug won't lead to deadlock, as the structure that this lock is embedded
in is freed on error.

Cc: <stable@vger.kernel.org>
Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 fs/jffs2/readinode.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/jffs2/readinode.c b/fs/jffs2/readinode.c
index 386303d..8261021 100644
--- a/fs/jffs2/readinode.c
+++ b/fs/jffs2/readinode.c
@@ -1143,6 +1143,7 @@ static int jffs2_do_read_inode_internal(struct jffs2_sb_info *c,
 		JFFS2_ERROR("cannot read nodes for ino %u, returned error is %d\n", f->inocache->ino, ret);
 		if (f->inocache->state == INO_STATE_READING)
 			jffs2_set_inocache_state(c, f->inocache, INO_STATE_CHECKEDABSENT);
+		mutex_unlock(&f->sem);
 		return ret;
 	}
 
@@ -1159,6 +1160,7 @@ static int jffs2_do_read_inode_internal(struct jffs2_sb_info *c,
 			jffs2_free_tmp_dnode_info(rii.mdata_tn);
 			rii.mdata_tn = NULL;
 		}
+		mutex_unlock(&f->sem);
 		return ret;
 	}
 
@@ -1183,6 +1185,7 @@ static int jffs2_do_read_inode_internal(struct jffs2_sb_info *c,
 			if (!rii.fds) {
 				if (f->inocache->state == INO_STATE_READING)
 					jffs2_set_inocache_state(c, f->inocache, INO_STATE_CHECKEDABSENT);
+				mutex_unlock(&f->sem);
 				return -EIO;
 			}
 			JFFS2_NOTICE("but it has children so we fake some modes for it\n");
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] jffs2: avoid soft-lockup in jffs2_reserve_space_gc()
  2014-02-08  2:14 [PATCH 1/3] jffs2: fix unbalanced locking Li Zefan
@ 2014-02-08  2:15 ` Li Zefan
  2014-02-11 23:54   ` Andrew Morton
  2014-02-08  2:16 ` [PATCH 3/3] jffs2: remove wait queue after schedule() Li Zefan
  1 sibling, 1 reply; 6+ messages in thread
From: Li Zefan @ 2014-02-08  2:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mtd, dwmw2, LKML

We triggered soft-lockup under stress test on 2.6.34 kernel.

BUG: soft lockup - CPU#1 stuck for 60009ms! [lockf2.test:14488]
...
[<bf09a4d4>] (jffs2_do_reserve_space+0x420/0x440 [jffs2])
[<bf09a528>] (jffs2_reserve_space_gc+0x34/0x78 [jffs2])
[<bf0a1350>] (jffs2_garbage_collect_dnode.isra.3+0x264/0x478 [jffs2])
[<bf0a2078>] (jffs2_garbage_collect_pass+0x9c0/0xe4c [jffs2])
[<bf09a670>] (jffs2_reserve_space+0x104/0x2a8 [jffs2])
[<bf09dc48>] (jffs2_write_inode_range+0x5c/0x4d4 [jffs2])
[<bf097d8c>] (jffs2_write_end+0x198/0x2c0 [jffs2])
[<c00e00a4>] (generic_file_buffered_write+0x158/0x200)
[<c00e14f4>] (__generic_file_aio_write+0x3a4/0x414)
[<c00e15c0>] (generic_file_aio_write+0x5c/0xbc)
[<c012334c>] (do_sync_write+0x98/0xd4)
[<c0123a84>] (vfs_write+0xa8/0x150)
[<c0123d74>] (sys_write+0x3c/0xc0)]

Fix this by adding a cond_resched() in the while loop.

Cc: <stable@vger.kernel.org>
Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 fs/jffs2/nodemgmt.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/jffs2/nodemgmt.c b/fs/jffs2/nodemgmt.c
index 0331072..fb30161 100644
--- a/fs/jffs2/nodemgmt.c
+++ b/fs/jffs2/nodemgmt.c
@@ -216,15 +216,20 @@ int jffs2_reserve_space_gc(struct jffs2_sb_info *c, uint32_t minsize,
 
 	jffs2_dbg(1, "%s(): Requested 0x%x bytes\n", __func__, minsize);
 
-	spin_lock(&c->erase_completion_lock);
-	while(ret == -EAGAIN) {
+	while (true) {
+		spin_lock(&c->erase_completion_lock);
 		ret = jffs2_do_reserve_space(c, minsize, len, sumsize);
 		if (ret) {
 			jffs2_dbg(1, "%s(): looping, ret is %d\n",
 				  __func__, ret);
 		}
+		spin_unlock(&c->erase_completion_lock);
+
+		if (ret == -EAGAIN)
+			cond_resched();
+		else
+			break;
 	}
-	spin_unlock(&c->erase_completion_lock);
 	if (!ret)
 		ret = jffs2_prealloc_raw_node_refs(c, c->nextblock, 1);
 
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] jffs2: remove wait queue after schedule()
  2014-02-08  2:14 [PATCH 1/3] jffs2: fix unbalanced locking Li Zefan
  2014-02-08  2:15 ` [PATCH 2/3] jffs2: avoid soft-lockup in jffs2_reserve_space_gc() Li Zefan
@ 2014-02-08  2:16 ` Li Zefan
  1 sibling, 0 replies; 6+ messages in thread
From: Li Zefan @ 2014-02-08  2:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mtd, dwmw2, LKML

@wait is a local variable, so if we don't remove it from the wait queue
list, later wake_up() may end up accessing invalid memory.

This was spotted by eyes.

Cc: <stable@vger.kernel.org>
Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 fs/jffs2/nodemgmt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/jffs2/nodemgmt.c b/fs/jffs2/nodemgmt.c
index fb30161..610a22c 100644
--- a/fs/jffs2/nodemgmt.c
+++ b/fs/jffs2/nodemgmt.c
@@ -179,6 +179,7 @@ int jffs2_reserve_space(struct jffs2_sb_info *c, uint32_t minsize,
 					spin_unlock(&c->erase_completion_lock);
 
 					schedule();
+					remove_wait_queue(&c->erase_wait, &wait);
 				} else
 					spin_unlock(&c->erase_completion_lock);
 			} else if (ret)
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] jffs2: avoid soft-lockup in jffs2_reserve_space_gc()
  2014-02-08  2:15 ` [PATCH 2/3] jffs2: avoid soft-lockup in jffs2_reserve_space_gc() Li Zefan
@ 2014-02-11 23:54   ` Andrew Morton
  2014-02-12  1:00     ` Brian Norris
  2014-02-12  1:42     ` Li Zefan
  0 siblings, 2 replies; 6+ messages in thread
From: Andrew Morton @ 2014-02-11 23:54 UTC (permalink / raw)
  To: Li Zefan; +Cc: linux-mtd, dwmw2, LKML

On Sat, 8 Feb 2014 10:15:39 +0800 Li Zefan <lizefan@huawei.com> wrote:

> We triggered soft-lockup under stress test on 2.6.34 kernel.
> 
> BUG: soft lockup - CPU#1 stuck for 60009ms! [lockf2.test:14488]
> ...
> [<bf09a4d4>] (jffs2_do_reserve_space+0x420/0x440 [jffs2])
> [<bf09a528>] (jffs2_reserve_space_gc+0x34/0x78 [jffs2])
> [<bf0a1350>] (jffs2_garbage_collect_dnode.isra.3+0x264/0x478 [jffs2])
> [<bf0a2078>] (jffs2_garbage_collect_pass+0x9c0/0xe4c [jffs2])
> [<bf09a670>] (jffs2_reserve_space+0x104/0x2a8 [jffs2])
> [<bf09dc48>] (jffs2_write_inode_range+0x5c/0x4d4 [jffs2])
> [<bf097d8c>] (jffs2_write_end+0x198/0x2c0 [jffs2])
> [<c00e00a4>] (generic_file_buffered_write+0x158/0x200)
> [<c00e14f4>] (__generic_file_aio_write+0x3a4/0x414)
> [<c00e15c0>] (generic_file_aio_write+0x5c/0xbc)
> [<c012334c>] (do_sync_write+0x98/0xd4)
> [<c0123a84>] (vfs_write+0xa8/0x150)
> [<c0123d74>] (sys_write+0x3c/0xc0)]
> 
> Fix this by adding a cond_resched() in the while loop.
> 
> ...
>
> --- a/fs/jffs2/nodemgmt.c
> +++ b/fs/jffs2/nodemgmt.c
> @@ -216,15 +216,20 @@ int jffs2_reserve_space_gc(struct jffs2_sb_info *c, uint32_t minsize,
>  
>  	jffs2_dbg(1, "%s(): Requested 0x%x bytes\n", __func__, minsize);
>  
> -	spin_lock(&c->erase_completion_lock);
> -	while(ret == -EAGAIN) {
> +	while (true) {
> +		spin_lock(&c->erase_completion_lock);
>  		ret = jffs2_do_reserve_space(c, minsize, len, sumsize);
>  		if (ret) {
>  			jffs2_dbg(1, "%s(): looping, ret is %d\n",
>  				  __func__, ret);
>  		}
> +		spin_unlock(&c->erase_completion_lock);
> +
> +		if (ret == -EAGAIN)
> +			cond_resched();
> +		else
> +			break;
>  	}
> -	spin_unlock(&c->erase_completion_lock);
>  	if (!ret)
>  		ret = jffs2_prealloc_raw_node_refs(c, c->nextblock, 1);

Looks OK.  We can do this:

--- a/fs/jffs2/nodemgmt.c~jffs2-avoid-soft-lockup-in-jffs2_reserve_space_gc-fix
+++ a/fs/jffs2/nodemgmt.c
@@ -211,7 +211,7 @@ out:
 int jffs2_reserve_space_gc(struct jffs2_sb_info *c, uint32_t minsize,
 			   uint32_t *len, uint32_t sumsize)
 {
-	int ret = -EAGAIN;
+	int ret;
 	minsize = PAD(minsize);
 
 	jffs2_dbg(1, "%s(): Requested 0x%x bytes\n", __func__, minsize);
_


I now have four jffs2 bugfixes but cannot unload them on anyone. 
Waddup?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] jffs2: avoid soft-lockup in jffs2_reserve_space_gc()
  2014-02-11 23:54   ` Andrew Morton
@ 2014-02-12  1:00     ` Brian Norris
  2014-02-12  1:42     ` Li Zefan
  1 sibling, 0 replies; 6+ messages in thread
From: Brian Norris @ 2014-02-12  1:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Woodhouse, Li Zefan, linux-mtd@lists.infradead.org, LKML,
	Artem Bityutskiy

Hi Andrew,

On Tue, Feb 11, 2014 at 3:54 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Sat, 8 Feb 2014 10:15:39 +0800 Li Zefan <lizefan@huawei.com> wrote:
> I now have four jffs2 bugfixes but cannot unload them on anyone.
> Waddup?

Well, at best we have 3 "maintainers" involved in MTD (David, Artem,
and me), but David is often quite unresponsive unless you yell, and
Artem has more or less left non-UBI/UBIFS stuff to me. I personally
have little knowledge of JFFS2, and I have seen a fair number of
dubious JFFS2 patches from people with automated tools and no testing.
So I'm understandably cautious to merge them. But if you have properly
tested (or at least reviewed) patches sitting around, I can take a
look at them and merge them. Are you referring to the top 4 here?

http://www.spinics.net/linux/lists/mm-commits/

Brian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] jffs2: avoid soft-lockup in jffs2_reserve_space_gc()
  2014-02-11 23:54   ` Andrew Morton
  2014-02-12  1:00     ` Brian Norris
@ 2014-02-12  1:42     ` Li Zefan
  1 sibling, 0 replies; 6+ messages in thread
From: Li Zefan @ 2014-02-12  1:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mtd, dwmw2, LKML

>> --- a/fs/jffs2/nodemgmt.c
>> +++ b/fs/jffs2/nodemgmt.c
>> @@ -216,15 +216,20 @@ int jffs2_reserve_space_gc(struct jffs2_sb_info *c, uint32_t minsize,
>>  
>>  	jffs2_dbg(1, "%s(): Requested 0x%x bytes\n", __func__, minsize);
>>  
>> -	spin_lock(&c->erase_completion_lock);
>> -	while(ret == -EAGAIN) {
>> +	while (true) {
>> +		spin_lock(&c->erase_completion_lock);
>>  		ret = jffs2_do_reserve_space(c, minsize, len, sumsize);
>>  		if (ret) {
>>  			jffs2_dbg(1, "%s(): looping, ret is %d\n",
>>  				  __func__, ret);
>>  		}
>> +		spin_unlock(&c->erase_completion_lock);
>> +
>> +		if (ret == -EAGAIN)
>> +			cond_resched();
>> +		else
>> +			break;
>>  	}
>> -	spin_unlock(&c->erase_completion_lock);
>>  	if (!ret)
>>  		ret = jffs2_prealloc_raw_node_refs(c, c->nextblock, 1);
> 
> Looks OK.  We can do this:
> 

Yeah, thanks for the cleanup.

> --- a/fs/jffs2/nodemgmt.c~jffs2-avoid-soft-lockup-in-jffs2_reserve_space_gc-fix
> +++ a/fs/jffs2/nodemgmt.c
> @@ -211,7 +211,7 @@ out:
>  int jffs2_reserve_space_gc(struct jffs2_sb_info *c, uint32_t minsize,
>  			   uint32_t *len, uint32_t sumsize)
>  {
> -	int ret = -EAGAIN;
> +	int ret;
>  	minsize = PAD(minsize);
>  
>  	jffs2_dbg(1, "%s(): Requested 0x%x bytes\n", __func__, minsize);
> _

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-02-12  1:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-08  2:14 [PATCH 1/3] jffs2: fix unbalanced locking Li Zefan
2014-02-08  2:15 ` [PATCH 2/3] jffs2: avoid soft-lockup in jffs2_reserve_space_gc() Li Zefan
2014-02-11 23:54   ` Andrew Morton
2014-02-12  1:00     ` Brian Norris
2014-02-12  1:42     ` Li Zefan
2014-02-08  2:16 ` [PATCH 3/3] jffs2: remove wait queue after schedule() Li Zefan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).