JFFS2 deadlock with alloc

linux-mtd.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* JFFS2 deadlock with alloc_sem
@ 2007-04-30 19:41 Roberts Nathan-mcg31137
  2007-05-05  8:23 ` David Woodhouse
  2007-06-02 17:42 ` David Woodhouse
  0 siblings, 2 replies; 23+ messages in thread
From: Roberts Nathan-mcg31137 @ 2007-04-30 19:41 UTC (permalink / raw)
  To: linux-mtd

Has anyone seen this deadlock before? It seems to be a classic deadlock 
situation so I'm not sure if maybe I'm misinterpreting things or 
the use case (several postmark tests running in parallel on a
preemptible
kernel) is especially vulnerable. 

If the lock at NOTE2 is aquired followed by the lock at NOTE4, it seems 
like deadlock is inevitable. 

Kernel is 2.6.10 based. (although I've looked at 2.6.18 sources and it 
appears that the scenario is possible there as well.)

Kernel preemption is enabled. 

Flash is large page NAND (we've also got it to happen on small page NAND

but it's more difficult). 

It's not always the GC thread which is the other party.  We've also 
gotten other callers of garbage_collect_pass() to end up in the same 
situation.

Thread1
--
[<c023fd60>] (__schedule+0x0/0x5b0) from [<c024047c>]
(schedule+0xec/0x124)
[<c0240390>] (schedule+0x0/0x124) from [<c023f930>]
(__compat_down+0xe0/0x178)
 r4 = C0EA4000 
[<c023f850>] (__compat_down+0x0/0x178) from [<c023f7ec>]
(__compat_down_failed+0xc/0x20)
 r8 = C001462C  r7 = C2B7C2E8  r6 = C2B7C2E8  r5 = C0014600
 r4 = C2B7C30C 
[<c00d5b60>] (jffs2_reserve_space+0x0/0x268) from [<c00d80b4>]
(jffs2_write_inode_range+0x5c/0x468)
NOTE1: jffs2_reserve_space() releases and reaquires alloc_sem

[<c00d8058>] (jffs2_write_inode_range+0x0/0x468) from [<c00d30ac>]
(jffs2_commit_write+0x1b0/0x31c)
[<c00d2efc>] (jffs2_commit_write+0x0/0x31c) from [<c00638d8>]
(generic_file_buffered_write+0x3e4/0x64c)
[<c00634f4>] (generic_file_buffered_write+0x0/0x64c) from [<c00641c8>]
(__generic_file_aio_write_nolock+0x488/0x4b4)
NOTE2: generic_file_buffered_write() gets page_lock

[<c0063d40>] (__generic_file_aio_write_nolock+0x0/0x4b4) from
[<c0064274>] (__generic_file_write_nolock+0x80/0xac)
[<c00641f4>] (__generic_file_write_nolock+0x0/0xac) from [<c00643c4>]
(generic_file_write+0x58/0xdc)
 r9 = C0EA4000  r8 = C1CC3A00  r6 = 40030008  r5 = C1CC3A9C
 r4 = C1CC3A6C 
[<c006436c>] (generic_file_write+0x0/0xdc) from [<c007eeb4>]
(vfs_write+0xec/0x170)
[<c007edc8>] (vfs_write+0x0/0x170) from [<c007eff4>]
(sys_write+0x48/0x74)
 r8 = C0025154  r7 = 00000004  r6 = C26E7960  r5 = 00000000
 r4 = 0002D000 
[<c007efac>] (sys_write+0x0/0x74) from [<c00249a0>]
(ret_fast_syscall+0x0/0x34)
 r6 = 00000004  r5 = 0002D000  r4 = 0008A0FF 


Thread2
--
[<c023fd60>] (__schedule+0x0/0x5b0) from [<c024047c>]
(schedule+0xec/0x124)
[<c0240390>] (schedule+0x0/0x124) from [<c0241040>]
(io_schedule+0x2c/0x48)
 r4 = C02E01A8 
[<c0241014>] (io_schedule+0x0/0x48) from [<c0060bbc>]
(sync_page+0x40/0x48)
 r5 = 00000000  r4 = C34F7CF8 
[<c0060b7c>] (sync_page+0x0/0x48) from [<c024140c>]
(__wait_on_bit_lock+0x54/0x88)
[<c02413b8>] (__wait_on_bit_lock+0x0/0x88) from [<c00614a4>]
(__lock_page+0x88/0x98)
[<c006141c>] (__lock_page+0x0/0x98) from [<c006301c>]
(read_cache_page+0x21c/0x324)
 r5 = 00000000  r4 = C0336FE0 
[<c0062e00>] (read_cache_page+0x0/0x324) from [<c00df8f8>]
(jffs2_gc_fetch_page+0x2c/0x64)
[<c00df8cc>] (jffs2_gc_fetch_page+0x0/0x64) from [<c00dc664>]
(jffs2_garbage_collect_pass+0x14a4/0x1c28)
NOTE3: jffs2_gc_fetch_page() attempts to get page_lock

[<c00db1c0>] (jffs2_garbage_collect_pass+0x0/0x1c28) from [<c00de684>]
(jffs2_garbage_collect_thread+0x148/0x19c)
NOTE4: garbage_collect_pass() aquires alloc_sem

[<c00de53c>] (jffs2_garbage_collect_thread+0x0/0x19c) from [<c00444e8>]
(do_exit+0x0/0xd88)
 r7 = 00000000  r6 = 00000000  r5 = 00000000  r4 = 00000000

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-04-30 19:41 JFFS2 deadlock with alloc_sem Roberts Nathan-mcg31137
@ 2007-05-05  8:23 ` David Woodhouse
  2007-06-02 17:42 ` David Woodhouse
  1 sibling, 0 replies; 23+ messages in thread
From: David Woodhouse @ 2007-05-05  8:23 UTC (permalink / raw)
  To: Roberts Nathan-mcg31137; +Cc: linux-mtd

On Mon, 2007-04-30 at 15:41 -0400, Roberts Nathan-mcg31137 wrote:
> [<c006141c>] (__lock_page+0x0/0x98) from [<c006301c>] (read_cache_page+0x21c/0x324)
>  r5 = 00000000  r4 = C0336FE0 
> [<c0062e00>] (read_cache_page+0x0/0x324) from [<c00df8f8>] (jffs2_gc_fetch_page+0x2c/0x64)
> [<c00df8cc>] (jffs2_gc_fetch_page+0x0/0x64) from [<c00dc664>] (jffs2_garbage_collect_pass+0x14a4/0x1c28)
> NOTE3: jffs2_gc_fetch_page() attempts to get page_lock

This bit confuses me.

In jffs2_commit_write() we deliberately mark the page up to date, in
order to avoid this situation -- if the page is up to date,
read_cache_page() won't attempt to lock it.

(Note that we only set it up to date manually there if we're writing the
whole page. If we're writing less than a whole page, then
jffs2_prepare_write() will have read it and marked it up to date anyway.

So it's possible that read_cache_page() will try to lock a page when
called from jffs2_gc_fetch_page() -- but it _shouldn't_ be a page which
is already locked for writing. It should be a _different_ page.

Can you add a WARN_ON(!PageUptodate(pg)) into jffs2_commit_write(), just
before the call to jffs2_write_inode_range(). And/or otherwise try to
check which page each one is locking?

-- 
dwmw2

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-04-30 19:41 JFFS2 deadlock with alloc_sem Roberts Nathan-mcg31137
  2007-05-05  8:23 ` David Woodhouse
@ 2007-06-02 17:42 ` David Woodhouse
  2007-06-05 14:21   ` Roberts Nathan-mcg31137
  2007-06-07 14:29   ` Josh Boyer
  1 sibling, 2 replies; 23+ messages in thread
From: David Woodhouse @ 2007-06-02 17:42 UTC (permalink / raw)
  To: Roberts Nathan-mcg31137; +Cc: linux-mtd

On Mon, 2007-04-30 at 15:41 -0400, Roberts Nathan-mcg31137 wrote:
> Has anyone seen this deadlock before? It seems to be a classic deadlock 
> situation so I'm not sure if maybe I'm misinterpreting things or 
> the use case (several postmark tests running in parallel on a
> preemptible kernel) is especially vulnerable. 

I think Josh has spotted the real problem here. Does this help? If so,
as better fix will be forthcoming....

diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c
index 2d99e06..1066120 100644
--- a/fs/jffs2/gc.c
+++ b/fs/jffs2/gc.c
@@ -1218,7 +1218,9 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era
 	 *    page OK. We'll actually write it out again in commit_write, which is a little
 	 *    suboptimal, but at least we're correct.
 	 */
+	up(&f->sem);
 	pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
+	down(&f->sem);
 
 	if (IS_ERR(pg_ptr)) {
 		printk(KERN_WARNING "read_cache_page() returned error: %ld\n", PTR_ERR(pg_ptr));

-- 
dwmw2

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* RE: JFFS2 deadlock with alloc_sem
  2007-06-02 17:42 ` David Woodhouse
@ 2007-06-05 14:21   ` Roberts Nathan-mcg31137
  2007-06-07 14:29   ` Josh Boyer
  1 sibling, 0 replies; 23+ messages in thread
From: Roberts Nathan-mcg31137 @ 2007-06-05 14:21 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

> > Has anyone seen this deadlock before? It seems to be a classic 
> > deadlock situation so I'm not sure if maybe I'm misinterpreting
things 
> > or the use case (several postmark tests running in parallel on a 
> > preemptible kernel) is especially vulnerable.
>
> I think Josh has spotted the real problem here. Does this help? If so,
as better fix will be forthcoming....

Thanks for the information. We tried the suggested modification and it
actually makes it much easier to reproduce the problem (so on one hand
it's a good thing.)

The steps we are using are:
1) Start postmark tests (usually 1 copy is sufficient but it is a little
easier to reproduce with multiple tests in parallel)
2) Wait for filesystem to get around 90% full 
3) Kill postmark (simple CTRL-C)
4) If filesystem is 100% full, delete postmark working directories
4) Restart postmark
5) goto 2

Normally within just a few iterations, we hang.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-06-02 17:42 ` David Woodhouse
  2007-06-05 14:21   ` Roberts Nathan-mcg31137
@ 2007-06-07 14:29   ` Josh Boyer
  1 sibling, 0 replies; 23+ messages in thread
From: Josh Boyer @ 2007-06-07 14:29 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Roberts Nathan-mcg31137, linux-mtd

On Sat, 2007-06-02 at 18:42 +0100, David Woodhouse wrote:
> On Mon, 2007-04-30 at 15:41 -0400, Roberts Nathan-mcg31137 wrote:
> > Has anyone seen this deadlock before? It seems to be a classic deadlock 
> > situation so I'm not sure if maybe I'm misinterpreting things or 
> > the use case (several postmark tests running in parallel on a
> > preemptible kernel) is especially vulnerable. 
> 
> I think Josh has spotted the real problem here. Does this help? If so,
> as better fix will be forthcoming....

Hm... so the deadlock I was discussing looks a bit different than the
one Nathan reported.  There, we were thinking that the race was between
page locking and the locking of f->sem by GC.  Below are the stack
traces:

PID: 75     TASK: c22b0000  CPU: 0   COMMAND: "jffs2_gcd_mtd17"
 #0 [c22b1dc0] crash_save_current_state at c00190d4
 #1 [c22b1e00] __lock_page at c0038be0
 #2 [c22b1e30] read_cache_page at c003b5dc
 #3 [c22b1e60] jffs2_garbage_collect_dnode at c00960fc
 #4 [c22b1f10] jffs2_garbage_collect_pass at c00951b0
 #5 [c22b1f50] jffs2_garbage_collect_thread at c0097994
 #6 [c22b1ff0] original_kernel_thread at c00055ac
PID: 17169  TASK: c171a000  CPU: 0   COMMAND: "application"
 #0 [c171bdf0] crash_save_current_state at c00190d4
 #1 [c171be30] __down at c0008150
 #2 [c171be60] jffs2_readpage at c008f658
 #3 [c171be80] do_generic_file_read at c00394fc
 #4 [c171bed0] generic_file_read at c0039b28
 #5 [c171bf10] sys_read at c0048bd0
 #6 [c171bf30] ret_from_syscall_1 at c0002b48
syscall [c00] exception frame:
R0:  00000003   R1:  7f5ff1b0   R2:  00000000   R3:  0000000e
R4:  3002a008   R5:  00020000   R6:  1002d8a0   R7:  00000001
R8:  0000002c   R9:  0fdcd918   R10: 7f5ffc00   R11: 7f5fffff
R12: 28428884   R13: 1001ba28   R14: 0000439a   R15: 00fbce5c
R16: 80a00701   R17: 00000000   R18: 7f5ff3d0   R19: 7f5ff71c
R20: 7f5ff390   R21: 00000000   R22: 3002a008   R23: 7f5ff370
R24: 7f5ff2f0   R25: 3002a008   R26: 00020000   R27: 00020000
R28: 3002a008   R29: 0000000e   R30: 0fb8ef24   R31: 00000000
NIP: 0fb1e310   MSR: 0002d030   OR3: 0000000e   CTR: 0faccd0c
LR:  0fdb48cc   XER: 20000000   CCR: 28424884   MQ:  00000000
DAR: 3002a000 DSISR: 00800000        Syscall Result: 00000000
Switching to user space stack (no more symbol info).

What I can't immediately determine is if it's just a general race
between GC and reading/writing threads when it comes to locking the
pages before locking alloc_sem and/or f->sem, or if these are two
separate races.

The patch you had should fix the above condition, but will it aggravate
the condition Nathan reported?

/me is now confused

josh

^ permalink raw reply	[flat|nested] 23+ messages in thread

* JFFS2 deadlock with alloc_sem
@ 2007-06-08 19:26 Dave Kleikamp
  2007-06-11 12:14 ` David Woodhouse
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Kleikamp @ 2007-06-08 19:26 UTC (permalink / raw)
  To: dwmw2; +Cc: Nathan.Roberts, linux-mtd

Forgive me for not following up properly, but I'm not on the mailing
list, and I'm following up from the archives.

> On Mon, 2007-04-30 at 15:41 -0400, Roberts Nathan-mcg31137 wrote:
> > Has anyone seen this deadlock before? It seems to be a classic deadlock 
> > situation so I'm not sure if maybe I'm misinterpreting things or 
> > the use case (several postmark tests running in parallel on a
> > preemptible kernel) is especially vulnerable. 
> 
> I think Josh has spotted the real problem here. Does this help? If so,
> as better fix will be forthcoming....
> 
> diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c
> index 2d99e06..1066120 100644
> --- a/fs/jffs2/gc.c
> +++ b/fs/jffs2/gc.c
> @@ -1218,7 +1218,9 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era
>  	 *    page OK. We'll actually write it out again in commit_write, which is a little
>  	 *    suboptimal, but at least we're correct.
>  	 */
> +	up(&f->sem);
>  	pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
> +	down(&f->sem);
>  
>  	if (IS_ERR(pg_ptr)) {
>  		printk(KERN_WARNING "read_cache_page() returned error: %ld\n", PTR_ERR(pg_ptr));

As you know, jffs2_gc_fetch_page() causes read_cache_page() to call
jffs2_do_readpage_unlock().  jffs2_do_readpage_unlock() expects f->sem
to be held, but in no longer is.  I think, with this change,
jffs2_do_readpage_unlock() is the right place to take f->sem.  What do
you think of this patch, and does it have any affect on Nathan's
deadlock?

Note: If this works out, jffs2_readpage and jffs2_do_readpage_unlock can
probably be collapsed into one function.

diff -Nurp linux-2.6.22-rc4/fs/jffs2/file.c linux/fs/jffs2/file.c
--- linux-2.6.22-rc4/fs/jffs2/file.c	2007-06-05 08:10:59.000000000 -0500
+++ linux/fs/jffs2/file.c	2007-06-08 13:18:18.000000000 -0500
@@ -102,21 +102,20 @@ static int jffs2_do_readpage_nolock (str
 
 int jffs2_do_readpage_unlock(struct inode *inode, struct page *pg)
 {
-	int ret = jffs2_do_readpage_nolock(inode, pg);
+	struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
+	int ret;
+
+	down(&f->sem);
+	ret = jffs2_do_readpage_nolock(inode, pg);
 	unlock_page(pg);
+	up(&f->sem);
 	return ret;
 }
 
 
 static int jffs2_readpage (struct file *filp, struct page *pg)
 {
-	struct jffs2_inode_info *f = JFFS2_INODE_INFO(pg->mapping->host);
-	int ret;
-
-	down(&f->sem);
-	ret = jffs2_do_readpage_unlock(pg->mapping->host, pg);
-	up(&f->sem);
-	return ret;
+	return jffs2_do_readpage_unlock(pg->mapping->host, pg);
 }
 
 static int jffs2_prepare_write (struct file *filp, struct page *pg,
diff -Nurp linux-2.6.22-rc4/fs/jffs2/gc.c linux/fs/jffs2/gc.c
--- linux-2.6.22-rc4/fs/jffs2/gc.c	2007-06-05 08:10:59.000000000 -0500
+++ linux/fs/jffs2/gc.c	2007-06-08 13:16:29.000000000 -0500
@@ -1218,7 +1218,9 @@ static int jffs2_garbage_collect_dnode(s
 	 *    page OK. We'll actually write it out again in commit_write, which is a little
 	 *    suboptimal, but at least we're correct.
 	 */
+	up(&f->sem);
 	pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
+	down(&f->sem);
 
 	if (IS_ERR(pg_ptr)) {
 		printk(KERN_WARNING "read_cache_page() returned error: %ld\n", PTR_ERR(pg_ptr));

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-06-08 19:26 Dave Kleikamp
@ 2007-06-11 12:14 ` David Woodhouse
  2007-06-12  1:45   ` Roberts Nathan-mcg31137
  0 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2007-06-11 12:14 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: Nathan.Roberts, linux-mtd

On Fri, 2007-06-08 at 14:26 -0500, Dave Kleikamp wrote:
> Forgive me for not following up properly, but I'm not on the mailing
> list, and I'm following up from the archives. 

Hm, sorry about that. The archives should be fixed (again) now -- if you
use the mailto: link at the top of an archived mail such as the one at
http://lists.infradead.org/pipermail/linux-mtd/2007-June/018477.html
your response should have a correct In-Reply-To: header and be part of
the thread.

> What do you think of this patch, and does it have any affect on
> Nathan's deadlock?

It looks sensible. Nathan?

> Note: If this works out, jffs2_readpage and jffs2_do_readpage_unlock
> can probably be collapsed into one function.

Yeah. And the ordering of f->sem vs. page locking should be documented
in README.Locking, now that we actually have rules for it :)

-- 
dwmw2

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: JFFS2 deadlock with alloc_sem
  2007-06-11 12:14 ` David Woodhouse
@ 2007-06-12  1:45   ` Roberts Nathan-mcg31137
  2007-06-19 16:11     ` Dave Kleikamp
  0 siblings, 1 reply; 23+ messages in thread
From: Roberts Nathan-mcg31137 @ 2007-06-12  1:45 UTC (permalink / raw)
  To: David Woodhouse, Dave Kleikamp; +Cc: linux-mtd

 

> On Fri, 2007-06-08 at 14:26 -0500, Dave Kleikamp wrote:
> > Forgive me for not following up properly, but I'm not on the mailing

> > list, and I'm following up from the archives.
>
> Hm, sorry about that. The archives should be fixed (again) now -- if
you use the mailto: link at the top of an archived mail such as the >
one at
http://lists.infradead.org/pipermail/linux-mtd/2007-June/018477.html
> your response should have a correct In-Reply-To: header and be part of
the thread.
>
> > What do you think of this patch, and does it have any affect on 
> > Nathan's deadlock?
>
> It looks sensible. Nathan?

We tried the patch at the bottom of this email(please let us know if
this doesn't look correct). We're now able to reproduce the hang even
quicker than before. Within a few seconds it will hang with these
backtraces. Is there any additional data we can provide that would help
narrow this down? 


[<c02403e0>] (__schedule+0x0/0x5b0) from [<c0240afc>]
(schedule+0xec/0x124)
[<c0240a10>] (schedule+0x0/0x124) from [<c023ffb0>]
(__compat_down+0xe0/0x178)
 r4 = C0A4A000 
[<c023fed0>] (__compat_down+0x0/0x178) from [<c023fe6c>]
(__compat_down_failed+0xc
/0x20)
 r8 = C2C2A42C  r7 = C32262E8  r6 = C32262E8  r5 = C2C2A400
 r4 = C322630C 
[<c00d6090>] (jffs2_reserve_space+0x0/0x268) from [<c00d85e4>]
(jffs2_write_inode_
range+0x5c/0x468)
[<c00d8588>] (jffs2_write_inode_range+0x0/0x468) from [<c00d35dc>]
(jffs2_commit_w
rite+0x1b0/0x31c)
[<c00d342c>] (jffs2_commit_write+0x0/0x31c) from [<c0063e10>]
(generic_file_buffer
ed_write+0x3e4/0x64c)
[<c0063a2c>] (generic_file_buffered_write+0x0/0x64c) from [<c0064700>]
(__generic_
file_aio_write_nolock+0x488/0x4b4)
[<c0064278>] (__generic_file_aio_write_nolock+0x0/0x4b4) from
[<c00647ac>] (__gene
ric_file_write_nolock+0x80/0xac)
[<c006472c>] (__generic_file_write_nolock+0x0/0xac) from [<c00648fc>]
(generic_fil
e_write+0x58/0xdc)
 r9 = C0A4A000  r8 = C245F820  r6 = 4001E008  r5 = C245F8BC
 r4 = C245F88C 
[<c00648a4>] (generic_file_write+0x0/0xdc) from [<c007f3ec>]
(vfs_write+0xec/0x170
)
[<c007f300>] (vfs_write+0x0/0x170) from [<c007f52c>]
(sys_write+0x48/0x74)
 r8 = C0025154  r7 = 00000004  r6 = C1C560E0  r5 = 00000000
 r4 = 0001B000 
[<c007f4e4>] (sys_write+0x0/0x74) from [<c00249a0>]
(ret_fast_syscall+0x0/0x34)
 r6 = 00000004  r5 = 0001B000  r4 = 00077C01 
[<c02403e0>] (__schedule+0x0/0x5b0) from [<c0240afc>]
(schedule+0xec/0x124)
[<c0240a10>] (schedule+0x0/0x124) from [<c023ffb0>]
(__compat_down+0xe0/0x178)
 r4 = C0A4A000 
[<c023fed0>] (__compat_down+0x0/0x178) from [<c023fe6c>]
(__compat_down_failed+0xc
/0x20)
 r8 = C2C2A42C  r7 = C32262E8  r6 = C32262E8  r5 = C2C2A400
 r4 = C322630C 
[<c00d6090>] (jffs2_reserve_space+0x0/0x268) from [<c00d85e4>]
(jffs2_write_inode_
range+0x5c/0x468)
[<c00d8588>] (jffs2_write_inode_range+0x0/0x468) from [<c00d35dc>]
(jffs2_commit_w
rite+0x1b0/0x31c)
[<c00d342c>] (jffs2_commit_write+0x0/0x31c) from [<c0063e10>]
(generic_file_buffer
ed_write+0x3e4/0x64c)
[<c0063a2c>] (generic_file_buffered_write+0x0/0x64c) from [<c0064700>]
(__generic_
file_aio_write_nolock+0x488/0x4b4)
[<c0064278>] (__generic_file_aio_write_nolock+0x0/0x4b4) from
[<c00647ac>] (__gene
ric_file_write_nolock+0x80/0xac)
[<c006472c>] (__generic_file_write_nolock+0x0/0xac) from [<c00648fc>]
(generic_fil
e_write+0x58/0xdc)
 r9 = C0A4A000  r8 = C245F820  r6 = 4001E008  r5 = C245F8BC
 r4 = C245F88C 
[<c00648a4>] (generic_file_write+0x0/0xdc) from [<c007f3ec>]
(vfs_write+0xec/0x170
)
[<c007f300>] (vfs_write+0x0/0x170) from [<c007f52c>]
(sys_write+0x48/0x74)
 r8 = C0025154  r7 = 00000004  r6 = C1C560E0  r5 = 00000000
 r4 = 0001B000 
[<c007f4e4>] (sys_write+0x0/0x74) from [<c00249a0>]
(ret_fast_syscall+0x0/0x34)
 r6 = 00000004  r5 = 0001B000  r4 = 00077C01 
[<c02403e0>] (__schedule+0x0/0x5b0) from [<c0240afc>]
(schedule+0xec/0x124)
[<c0240a10>] (schedule+0x0/0x124) from [<c023ffb0>]
(__compat_down+0xe0/0x178)
 r4 = C0A4A000 
[<c023fed0>] (__compat_down+0x0/0x178) from [<c023fe6c>]
(__compat_down_failed+0xc
/0x20)
 r8 = C2C2A42C  r7 = C32262E8  r6 = C32262E8  r5 = C2C2A400
 r4 = C322630C 
[<c00d6090>] (jffs2_reserve_space+0x0/0x268) from [<c00d85e4>]
(jffs2_write_inode_
range+0x5c/0x468)
[<c00d8588>] (jffs2_write_inode_range+0x0/0x468) from [<c00d35dc>]
(jffs2_commit_w
rite+0x1b0/0x31c)
[<c00d342c>] (jffs2_commit_write+0x0/0x31c) from [<c0063e10>]
(generic_file_buffer
ed_write+0x3e4/0x64c)
[<c0063a2c>] (generic_file_buffered_write+0x0/0x64c) from [<c0064700>]
(__generic_
file_aio_write_nolock+0x488/0x4b4)
[<c0064278>] (__generic_file_aio_write_nolock+0x0/0x4b4) from
[<c00647ac>] (__gene
ric_file_write_nolock+0x80/0xac)
[<c006472c>] (__generic_file_write_nolock+0x0/0xac) from [<c00648fc>]
(generic_fil
e_write+0x58/0xdc)
 r9 = C0A4A000  r8 = C245F820  r6 = 4001E008  r5 = C245F8BC
 r4 = C245F88C 
[<c00648a4>] (generic_file_write+0x0/0xdc) from [<c007f3ec>]
(vfs_write+0xec/0x170
)
[<c007f300>] (vfs_write+0x0/0x170) from [<c007f52c>]
(sys_write+0x48/0x74)
 r8 = C0025154  r7 = 00000004  r6 = C1C560E0  r5 = 00000000
 r4 = 0001B000 
[<c007f4e4>] (sys_write+0x0/0x74) from [<c00249a0>]
(ret_fast_syscall+0x0/0x34)
 r6 = 00000004  r5 = 0001B000  r4 = 00077C01 



====

jffs2_gcd_mtd18 info


[<c02403e0>] (__schedule+0x0/0x5b0) from [<c0240afc>]
(schedule+0xec/0x124)
[<c0240a10>] (schedule+0x0/0x124) from [<c02416c0>]
(io_schedule+0x2c/0x48)
 r4 = C02E02C8 
[<c0241694>] (io_schedule+0x0/0x48) from [<c00610f4>]
(sync_page+0x40/0x48)
 r5 = 00000000  r4 = C261DCF8 
[<c00610b4>] (sync_page+0x0/0x48) from [<c0241a8c>]
(__wait_on_bit_lock+0x54/0x88)
[<c0241a38>] (__wait_on_bit_lock+0x0/0x88) from [<c00619dc>]
(__lock_page+0x88/0x9
8)
[<c0061954>] (__lock_page+0x0/0x98) from [<c0063554>]
(read_cache_page+0x21c/0x324
)
 r5 = 00000000  r4 = C0352580 
[<c0063338>] (read_cache_page+0x0/0x324) from [<c00dfe68>]
(jffs2_gc_fetch_page+0x
2c/0x64)
[<c00dfe3c>] (jffs2_gc_fetch_page+0x0/0x64) from [<c00dcbb4>]
(jffs2_garbage_colle
ct_pass+0x14c4/0x1c68)
 r4 = C1805880 
[<c00db6f0>] (jffs2_garbage_collect_pass+0x0/0x1c68) from [<c00debf4>]
(jffs2_garb
age_collect_thread+0x148/0x19c)
[<c00deaac>] (jffs2_garbage_collect_thread+0x0/0x19c) from [<c0044a20>]
(do_exit+0
x0/0xd88)
 r7 = 00000000  r6 = 00000000  r5 = 00000000  r4 = 00000000
[<c02403e0>] (__schedule+0x0/0x5b0) from [<c0240afc>]
(schedule+0xec/0x124)
[<c0240a10>] (schedule+0x0/0x124) from [<c02416c0>]
(io_schedule+0x2c/0x48)
 r4 = C02E02C8 
[<c0241694>] (io_schedule+0x0/0x48) from [<c00610f4>]
(sync_page+0x40/0x48)
 r5 = 00000000  r4 = C261DCF8 
[<c00610b4>] (sync_page+0x0/0x48) from [<c0241a8c>]
(__wait_on_bit_lock+0x54/0x88)
[<c0241a38>] (__wait_on_bit_lock+0x0/0x88) from [<c00619dc>]
(__lock_page+0x88/0x9
8)
[<c0061954>] (__lock_page+0x0/0x98) from [<c0063554>]
(read_cache_page+0x21c/0x324
)
 r5 = 00000000  r4 = C0352580 
[<c0063338>] (read_cache_page+0x0/0x324) from [<c00dfe68>]
(jffs2_gc_fetch_page+0x
2c/0x64)
[<c00dfe3c>] (jffs2_gc_fetch_page+0x0/0x64) from [<c00dcbb4>]
(jffs2_garbage_colle
ct_pass+0x14c4/0x1c68)
 r4 = C1805880 
[<c00db6f0>] (jffs2_garbage_collect_pass+0x0/0x1c68) from [<c00debf4>]
(jffs2_garb
age_collect_thread+0x148/0x19c)
[<c00deaac>] (jffs2_garbage_collect_thread+0x0/0x19c) from [<c0044a20>]
(do_exit+0
x0/0xd88)
 r7 = 00000000  r6 = 00000000  r5 = 00000000  r4 = 00000000
[<c02403e0>] (__schedule+0x0/0x5b0) from [<c0240afc>]
(schedule+0xec/0x124)
[<c0240a10>] (schedule+0x0/0x124) from [<c02416c0>]
(io_schedule+0x2c/0x48)
 r4 = C02E02C8 
[<c0241694>] (io_schedule+0x0/0x48) from [<c00610f4>]
(sync_page+0x40/0x48)
 r5 = 00000000  r4 = C261DCF8 
[<c00610b4>] (sync_page+0x0/0x48) from [<c0241a8c>]
(__wait_on_bit_lock+0x54/0x88)
[<c0241a38>] (__wait_on_bit_lock+0x0/0x88) from [<c00619dc>]
(__lock_page+0x88/0x9
8)
[<c0061954>] (__lock_page+0x0/0x98) from [<c0063554>]
(read_cache_page+0x21c/0x324
)
 r5 = 00000000  r4 = C0352580 
[<c0063338>] (read_cache_page+0x0/0x324) from [<c00dfe68>]
(jffs2_gc_fetch_page+0x
2c/0x64)
[<c00dfe3c>] (jffs2_gc_fetch_page+0x0/0x64) from [<c00dcbb4>]
(jffs2_garbage_colle
ct_pass+0x14c4/0x1c68)
 r4 = C1805880 
[<c00db6f0>] (jffs2_garbage_collect_pass+0x0/0x1c68) from [<c00debf4>]
(jffs2_garb
age_collect_thread+0x148/0x19c)
[<c00deaac>] (jffs2_garbage_collect_thread+0x0/0x19c) from [<c0044a20>]
(do_exit+0
x0/0xd88)
 r7 = 00000000  r6 = 00000000  r5 = 00000000  r4 = 00000000


======
diff -uprN jffs2_orig/file.c jffs2_new/file.c
--- jffs2_orig/file.c	2007-06-11 13:07:29.000000000 -0500
+++ jffs2_new/file.c	2007-06-11 13:06:49.000000000 -0500
@@ -100,14 +100,26 @@ static int jffs2_do_readpage_nolock (str
 
 int jffs2_do_readpage_unlock(struct inode *inode, struct page *pg)
 {
+#if 0  
 	int ret = jffs2_do_readpage_nolock(inode, pg);
 	unlock_page(pg);
+#else
+   struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
+   int ret;
+
+   down(&f->sem);
+   ret = jffs2_do_readpage_nolock(inode, pg);
+   unlock_page(pg);
+   up(&f->sem);
+
+#endif
 	return ret;
 }
 
 
 static int jffs2_readpage (struct file *filp, struct page *pg)
 {
+#if 0  
 	struct jffs2_inode_info *f =
JFFS2_INODE_INFO(pg->mapping->host);
 	int ret;
 	
@@ -115,6 +127,9 @@ static int jffs2_readpage (struct file *
 	ret = jffs2_do_readpage_unlock(pg->mapping->host, pg);
 	up(&f->sem);
 	return ret;
+#else
+   return jffs2_do_readpage_unlock(pg->mapping->host, pg);
+#endif
 }
 
 static int jffs2_prepare_write (struct file *filp, struct page *pg,
diff -uprN jffs2_orig/gc.c jffs2_new/gc.c
--- jffs2_orig/gc.c	2007-06-11 13:09:05.000000000 -0500
+++ jffs2_new/gc.c	2007-06-11 13:06:38.000000000 -0500
@@ -1202,7 +1202,13 @@ static int jffs2_garbage_collect_dnode(s
 	 *    page OK. We'll actually write it out again in
commit_write, which is a little
 	 *    suboptimal, but at least we're correct.
 	 */
+#if 0
 	pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
+#else
+    up(&f->sem);
+	pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
+    down(&f->sem);
+#endif
 
 	if (IS_ERR(pg_ptr)) {
 		printk(KERN_WARNING "read_cache_page() returned error:
%ld\n", PTR_ERR(pg_ptr));

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: JFFS2 deadlock with alloc_sem
  2007-06-12  1:45   ` Roberts Nathan-mcg31137
@ 2007-06-19 16:11     ` Dave Kleikamp
  2007-06-19 19:42       ` Dave Kleikamp
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Kleikamp @ 2007-06-19 16:11 UTC (permalink / raw)
  To: Roberts Nathan-mcg31137; +Cc: linux-mtd, David Woodhouse

On Mon, 2007-06-11 at 21:45 -0400, Roberts Nathan-mcg31137 wrote:
>  
> > On Fri, 2007-06-08 at 14:26 -0500, Dave Kleikamp wrote:
> > > What do you think of this patch, and does it have any affect on 
> > > Nathan's deadlock?
> >
> > It looks sensible. Nathan?
> 
> We tried the patch at the bottom of this email(please let us know if
> this doesn't look correct). We're now able to reproduce the hang even
> quicker than before. Within a few seconds it will hang with these
> backtraces. Is there any additional data we can provide that would help
> narrow this down? 

We came up with a fix for the hang were seeing on an older kernel.  It's
hacky and intrusive, but I wanted to offer it up at least to see if it
fixes Nathan's hang, and if so, maybe it can lead to some nicer patch
that may be acceptable.

This version, against linux-2.6.22-rc5 has been compile-tested only.
Don't let the _async scare you.  Since jffs2's filler function is
synchronous, so there's no need to wait for the page to become unlocked
again.

diff -Nurp linux-2.6.22-rc5/fs/jffs2/fs.c linux/fs/jffs2/fs.c
--- linux-2.6.22-rc5/fs/jffs2/fs.c	2007-06-17 08:01:52.000000000 -0500
+++ linux/fs/jffs2/fs.c	2007-06-19 10:29:38.000000000 -0500
@@ -627,8 +627,10 @@ unsigned char *jffs2_gc_fetch_page(struc
 	struct inode *inode = OFNI_EDONI_2SFFJ(f);
 	struct page *pg;
 
-	pg = read_cache_page(inode->i_mapping, offset >> PAGE_CACHE_SHIFT,
-			     (void *)jffs2_do_readpage_unlock, inode);
+	pg = read_cache_page_async_trylock(inode->i_mapping,
+					   offset >> PAGE_CACHE_SHIFT,
+					   (void *)jffs2_do_readpage_unlock,
+					   inode);
 	if (IS_ERR(pg))
 		return (void *)pg;
 
diff -Nurp linux-2.6.22-rc5/include/linux/pagemap.h linux/include/linux/pagemap.h
--- linux-2.6.22-rc5/include/linux/pagemap.h	2007-06-17 08:02:01.000000000 -0500
+++ linux/include/linux/pagemap.h	2007-06-19 10:28:27.000000000 -0500
@@ -112,6 +112,10 @@ extern struct page * read_cache_page_asy
 extern struct page * read_cache_page(struct address_space *mapping,
 				unsigned long index, filler_t *filler,
 				void *data);
+extern struct page * read_cache_page_async_trylock(
+				struct address_space *mapping,
+				unsigned long index, filler_t *filler,
+				void *data);
 extern int read_cache_pages(struct address_space *mapping,
 		struct list_head *pages, filler_t *filler, void *data);
 
diff -Nurp linux-2.6.22-rc5/mm/filemap.c linux/mm/filemap.c
--- linux-2.6.22-rc5/mm/filemap.c	2007-06-17 08:02:02.000000000 -0500
+++ linux/mm/filemap.c	2007-06-19 10:37:26.000000000 -0500
@@ -1844,6 +1844,50 @@ struct page *read_cache_page(struct addr
 EXPORT_SYMBOL(read_cache_page);
 
 /*
+ * Same as read_cache_page_async, but abort if the page is locked.
+ */
+struct page *read_cache_page_async_trylock(struct address_space *mapping,
+				unsigned long index,
+				int (*filler)(void *, struct page *),
+				void *data)
+{
+	struct page *page;
+	int err;
+
+retry:
+	page = __read_cache_page(mapping, index, filler, data);
+	if (IS_ERR(page))
+		return page;
+	mark_page_accessed(page);
+	if (PageUptodate(page))
+		goto out;
+
+	if (TestSetPageLocked(page)) {
+		page_cache_release(page);
+		return ERR_PTR(-EBUSY);
+	}
+
+	if (!page->mapping) {
+		unlock_page(page);
+		page_cache_release(page);
+		goto retry;
+	}
+	if (PageUptodate(page)) {
+		unlock_page(page);
+		goto out;
+	}
+	err = filler(data, page);
+	if (err < 0) {
+		page_cache_release(page);
+		return ERR_PTR(err);
+	}
+out:
+	mark_page_accessed(page);
+	return page;
+}
+EXPORT_SYMBOL(read_cache_page_async_trylock);
+
+/*
  * If the page was newly created, increment its refcount and add it to the
  * caller's lru-buffering pagevec.  This function is specifically for
  * generic_file_write().

-- 
David Kleikamp
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: JFFS2 deadlock with alloc_sem
  2007-06-19 16:11     ` Dave Kleikamp
@ 2007-06-19 19:42       ` Dave Kleikamp
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Kleikamp @ 2007-06-19 19:42 UTC (permalink / raw)
  To: Roberts Nathan-mcg31137; +Cc: linux-mtd, David Woodhouse, copelanm

On Tue, 2007-06-19 at 11:11 -0500, Dave Kleikamp wrote:

> We came up with a fix for the hang were seeing on an older kernel.  It's
> hacky and intrusive, but I wanted to offer it up at least to see if it
> fixes Nathan's hang, and if so, maybe it can lead to some nicer patch
> that may be acceptable.
> 
> This version, against linux-2.6.22-rc5 has been compile-tested only.
> Don't let the _async scare you.  Since jffs2's filler function is
> synchronous, so there's no need to wait for the page to become unlocked
> again.

Monte Copeland pointed out that my last patch could pass the -EBUSY
return code up to a calling thread which would be undesirable.  I was
only considering the garbage-collector thread which ignores the return
code.  Here is the fixed patch.  This has been tested against a
2.6.16-based kernel.

Again, this is a bit intrusive, but it may be a step in the right
direction.

diff -Nurp linux-2.6.22-rc5/fs/jffs2/fs.c linux/fs/jffs2/fs.c
--- linux-2.6.22-rc5/fs/jffs2/fs.c	2007-06-17 08:01:52.000000000 -0500
+++ linux/fs/jffs2/fs.c	2007-06-19 14:28:30.000000000 -0500
@@ -627,8 +627,15 @@ unsigned char *jffs2_gc_fetch_page(struc
 	struct inode *inode = OFNI_EDONI_2SFFJ(f);
 	struct page *pg;
 
-	pg = read_cache_page(inode->i_mapping, offset >> PAGE_CACHE_SHIFT,
-			     (void *)jffs2_do_readpage_unlock, inode);
+	/* read_cache_page_async_trylock will return -EBUSY
+	 * if it is not possible to lock the cache page. If we
+	 * get -EBUSY, then avoid a deadlock between
+	 * cache page locks and f->sem.
+	 */
+	pg = read_cache_page_async_trylock(inode->i_mapping,
+					   offset >> PAGE_CACHE_SHIFT,
+					   (void *)jffs2_do_readpage_unlock,
+					   inode);
 	if (IS_ERR(pg))
 		return (void *)pg;
 
diff -Nurp linux-2.6.22-rc5/fs/jffs2/gc.c linux/fs/jffs2/gc.c
--- linux-2.6.22-rc5/fs/jffs2/gc.c	2007-06-17 08:01:52.000000000 -0500
+++ linux/fs/jffs2/gc.c	2007-06-19 14:30:00.000000000 -0500
@@ -1221,8 +1221,13 @@ static int jffs2_garbage_collect_dnode(s
 	pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
 
 	if (IS_ERR(pg_ptr)) {
-		printk(KERN_WARNING "read_cache_page() returned error: %ld\n", PTR_ERR(pg_ptr));
-		return PTR_ERR(pg_ptr);
+		if ( PTR_ERR(pg_ptr) == -EBUSY ) {
+			printk(KERN_WARNING "jffs2_gc_fetch_page() rc -EBUSY. Deadlock avoided.\n" );
+			return 0;
+		} else {
+			printk(KERN_WARNING "jffs2_gc_fetch_page() rc %ld\n", PTR_ERR(pg_ptr));
+			return PTR_ERR(pg_ptr);
+		}
 	}
 
 	offset = start;
diff -Nurp linux-2.6.22-rc5/include/linux/pagemap.h linux/include/linux/pagemap.h
--- linux-2.6.22-rc5/include/linux/pagemap.h	2007-06-17 08:02:01.000000000 -0500
+++ linux/include/linux/pagemap.h	2007-06-19 14:30:12.000000000 -0500
@@ -112,6 +112,10 @@ extern struct page * read_cache_page_asy
 extern struct page * read_cache_page(struct address_space *mapping,
 				unsigned long index, filler_t *filler,
 				void *data);
+extern struct page * read_cache_page_async_trylock(
+				struct address_space *mapping,
+				unsigned long index, filler_t *filler,
+				void *data);
 extern int read_cache_pages(struct address_space *mapping,
 		struct list_head *pages, filler_t *filler, void *data);
 
diff -Nurp linux-2.6.22-rc5/mm/filemap.c linux/mm/filemap.c
--- linux-2.6.22-rc5/mm/filemap.c	2007-06-17 08:02:02.000000000 -0500
+++ linux/mm/filemap.c	2007-06-19 14:31:36.000000000 -0500
@@ -1843,6 +1843,51 @@ struct page *read_cache_page(struct addr
 }
 EXPORT_SYMBOL(read_cache_page);
 
+
+/*
+ * Same as read_cache_page, but abort if the page is locked.
+ */
+struct page *read_cache_page_async_trylock(struct address_space *mapping,
+				unsigned long index,
+				int (*filler)(void *, struct page *),
+				void *data)
+{
+	struct page *page;
+	int err;
+
+retry:
+	page = __read_cache_page(mapping, index, filler, data);
+	if (IS_ERR(page))
+		goto out;
+	mark_page_accessed(page);
+	if (PageUptodate(page))
+		goto out;
+
+	if (TestSetPageLocked(page)) {
+		page_cache_release(page);
+		return ERR_PTR(-EBUSY);
+	}
+
+	if (!page->mapping) {
+		unlock_page(page);
+		page_cache_release(page);
+		goto retry;
+	}
+	if (PageUptodate(page)) {
+		unlock_page(page);
+		goto out;
+	}
+	err = filler(data, page);
+	if (err < 0) {
+		page_cache_release(page);
+		page = ERR_PTR(err);
+	}
+ out:
+	mark_page_accessed(page);
+	return page;
+}
+EXPORT_SYMBOL(read_cache_page_async_trylock);
+
 /*
  * If the page was newly created, increment its refcount and add it to the
  * caller's lru-buffering pagevec.  This function is specifically for

-- 
David Kleikamp
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
       [not found] <af3ea28a0707262032h7ee22775t6ef54e364a9cd704@mail.gmail.com>
@ 2007-07-27  3:38 ` ye janboe
  2007-07-27 13:42   ` Dave Kleikamp
  0 siblings, 1 reply; 23+ messages in thread
From: ye janboe @ 2007-07-27  3:38 UTC (permalink / raw)
  To: shaggy; +Cc: linux-mtd

Does this make fsync system call failed to cause a risk of loss data?
I posted my fix base on your patch, please review it.

Thanks

--- gc.c@@/main/fl71_main/a22116_jffs2_sync_debug/0     2007-07-24
17:02:26.746154000 +0800
+++ gc.c@@/main/fl71_main/a22116_jffs2_sync_debug/2     2007-07-24
17:39:35.000000000 +0800
@@ -1299,7 +1299,10 @@ static int jffs2_garbage_collect_dnode(s
        if (IS_ERR(pg_ptr)) {
                if ( PTR_ERR(pg_ptr) == -EBUSY ) {
                        printk(KERN_WARNING "jffs2_gc_fetch_page() rc
-EBUSY. Deadlock avoided.\n" );
-                       return 0;
+                       if (current->flags & PF_SYNCWRITE)
+                               return -EBUSY;
+                       else
+                               return 0;
                } else {
                        printk(KERN_WARNING "jffs2_gc_fetch_page() rc
%ld\n", PTR_ERR(pg_ptr));
                        return PTR_ERR(pg_ptr);



> On Tue, 2007-06-19 at 11:11 -0500, Dave Kleikamp wrote:
>
> > We came up with a fix for the hang were seeing on an older kernel.  It's
> > hacky and intrusive, but I wanted to offer it up at least to see if it
> > fixes Nathan's hang, and if so, maybe it can lead to some nicer patch
> > that may be acceptable.
> >
> > This version, against linux-2.6.22-rc5 has been compile-tested only.
> > Don't let the _async scare you.  Since jffs2's filler function is
> > synchronous, so there's no need to wait for the page to become unlocked
> > again.
>
> Monte Copeland pointed out that my last patch could pass the -EBUSY
> return code up to a calling thread which would be undesirable.  I was
> only considering the garbage-collector thread which ignores the return
> code.  Here is the fixed patch.  This has been tested against a
> 2.6.16-based kernel.
>
> Again, this is a bit intrusive, but it may be a step in the right
> direction.
>
> diff -Nurp linux-2.6.22-rc5/fs/jffs2/fs.c linux/fs/jffs2/fs.c
> --- linux-2.6.22-rc5/fs/jffs2/fs.c      2007-06-17 08:01:52.000000000 -0500
> +++ linux/fs/jffs2/fs.c 2007-06-19 14:28:30.000000000 -0500
> @@ -627,8 +627,15 @@ unsigned char *jffs2_gc_fetch_page(struc
>         struct inode *inode = OFNI_EDONI_2SFFJ(f);
>         struct page *pg;
>
> -       pg = read_cache_page(inode->i_mapping, offset >> PAGE_CACHE_SHIFT,
> -                            (void *)jffs2_do_readpage_unlock, inode);
> +       /* read_cache_page_async_trylock will return -EBUSY
> +        * if it is not possible to lock the cache page. If we
> +        * get -EBUSY, then avoid a deadlock between
> +        * cache page locks and f->sem.
> +        */
> +       pg = read_cache_page_async_trylock(inode->i_mapping,
> +                                          offset >> PAGE_CACHE_SHIFT,
> +                                          (void *)jffs2_do_readpage_unlock,
> +                                          inode);
>         if (IS_ERR(pg))
>                 return (void *)pg;
>
> diff -Nurp linux-2.6.22-rc5/fs/jffs2/gc.c linux/fs/jffs2/gc.c
> --- linux-2.6.22-rc5/fs/jffs2/gc.c      2007-06-17 08:01:52.000000000 -0500
> +++ linux/fs/jffs2/gc.c 2007-06-19 14:30:00.000000000 -0500
> @@ -1221,8 +1221,13 @@ static int jffs2_garbage_collect_dnode(s
>         pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
>
>         if (IS_ERR(pg_ptr)) {
> -               printk(KERN_WARNING "read_cache_page() returned error: %ld\n",
> PTR_ERR(pg_ptr));
> -               return PTR_ERR(pg_ptr);
> +               if ( PTR_ERR(pg_ptr) == -EBUSY ) {
> +                       printk(KERN_WARNING "jffs2_gc_fetch_page() rc -EBUSY. Deadlock
> avoided.\n" );
> +                       return 0;
> +               } else {
> +                       printk(KERN_WARNING "jffs2_gc_fetch_page() rc %ld\n", PTR_ERR(pg_ptr));
> +                       return PTR_ERR(pg_ptr);
> +               }
>         }
>
>         offset = start;
> diff -Nurp linux-2.6.22-rc5/include/linux/pagemap.h
> linux/include/linux/pagemap.h
> --- linux-2.6.22-rc5/include/linux/pagemap.h    2007-06-17 08:02:01.000000000 -0500
> +++ linux/include/linux/pagemap.h       2007-06-19 14:30:12.000000000 -0500
> @@ -112,6 +112,10 @@ extern struct page * read_cache_page_asy
>  extern struct page * read_cache_page(struct address_space *mapping,
>                                 unsigned long index, filler_t *filler,
>                                 void *data);
> +extern struct page * read_cache_page_async_trylock(
> +                               struct address_space *mapping,
> +                               unsigned long index, filler_t *filler,
> +                               void *data);
>  extern int read_cache_pages(struct address_space *mapping,
>                 struct list_head *pages, filler_t *filler, void *data);
>
> diff -Nurp linux-2.6.22-rc5/mm/filemap.c linux/mm/filemap.c
> --- linux-2.6.22-rc5/mm/filemap.c       2007-06-17 08:02:02.000000000 -0500
> +++ linux/mm/filemap.c  2007-06-19 14:31:36.000000000 -0500
> @@ -1843,6 +1843,51 @@ struct page *read_cache_page(struct addr
>  }
>  EXPORT_SYMBOL(read_cache_page);
>
> +
> +/*
> + * Same as read_cache_page, but abort if the page is locked.
> + */
> +struct page *read_cache_page_async_trylock(struct address_space *mapping,
> +                               unsigned long index,
> +                               int (*filler)(void *, struct page *),
> +                               void *data)
> +{
> +       struct page *page;
> +       int err;
> +
> +retry:
> +       page = __read_cache_page(mapping, index, filler, data);
> +       if (IS_ERR(page))
> +               goto out;
> +       mark_page_accessed(page);
> +       if (PageUptodate(page))
> +               goto out;
> +
> +       if (TestSetPageLocked(page)) {
> +               page_cache_release(page);
> +               return ERR_PTR(-EBUSY);
> +       }
> +
> +       if (!page->mapping) {
> +               unlock_page(page);
> +               page_cache_release(page);
> +               goto retry;
> +       }
> +       if (PageUptodate(page)) {
> +               unlock_page(page);
> +               goto out;
> +       }
> +       err = filler(data, page);
> +       if (err < 0) {
> +               page_cache_release(page);
> +               page = ERR_PTR(err);
> +       }
> + out:
> +       mark_page_accessed(page);
> +       return page;
> +}
> +EXPORT_SYMBOL(read_cache_page_async_trylock);
> +
>  /*
>   * If the page was newly created, increment its refcount and add it to the
>   * caller's lru-buffering pagevec.  This function is specifically for
>
> --
> David Kleikamp
> IBM Linux Technology Center
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-07-27  3:38 ` ye janboe
@ 2007-07-27 13:42   ` Dave Kleikamp
  2007-07-27 16:35     ` ye janboe
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Kleikamp @ 2007-07-27 13:42 UTC (permalink / raw)
  To: ye janboe; +Cc: linux-mtd

On Fri, 2007-07-27 at 11:38 +0800, ye janboe wrote:
> Does this make fsync system call failed to cause a risk of loss data?

Could be.  It looks like I wasn't careful about considering all the
callers of jffs2_garbage_collect_pass().  I assume the problem is when
it is called from jffs2_flush_wbuf_gc().

> I posted my fix base on your patch, please review it.
> 
> Thanks
> 
> --- gc.c@@/main/fl71_main/a22116_jffs2_sync_debug/0     2007-07-24
> 17:02:26.746154000 +0800
> +++ gc.c@@/main/fl71_main/a22116_jffs2_sync_debug/2     2007-07-24
> 17:39:35.000000000 +0800
> @@ -1299,7 +1299,10 @@ static int jffs2_garbage_collect_dnode(s
>         if (IS_ERR(pg_ptr)) {
>                 if ( PTR_ERR(pg_ptr) == -EBUSY ) {
>                         printk(KERN_WARNING "jffs2_gc_fetch_page() rc
> -EBUSY. Deadlock avoided.\n" );
> -                       return 0;
> +                       if (current->flags & PF_SYNCWRITE)
> +                               return -EBUSY;
> +                       else
> +                               return 0;
>                 } else {
>                         printk(KERN_WARNING "jffs2_gc_fetch_page() rc
> %ld\n", PTR_ERR(pg_ptr));
>                         return PTR_ERR(pg_ptr);

That is probably okay, but I think it is getting more complicated than
it needs to be. 

What if we let jffs2_garbage_collect_dnode() return -EBUSY all the time,
and let the callers that don't care ignore it?

How's this look?

diff -Nurp linux.orig/fs/jffs2/gc.c linux/fs/jffs2/gc.c
--- linux.orig/fs/jffs2/gc.c	2007-07-27 08:32:36.000000000 -0500
+++ linux/fs/jffs2/gc.c	2007-07-27 08:33:59.000000000 -0500
@@ -1221,13 +1221,8 @@ static int jffs2_garbage_collect_dnode(s
 	pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
 
 	if (IS_ERR(pg_ptr)) {
-		if ( PTR_ERR(pg_ptr) == -EBUSY ) {
-			printk(KERN_WARNING "jffs2_gc_fetch_page() rc -EBUSY. Deadlock avoided.\n" );
-			return 0;
-		} else {
-			printk(KERN_WARNING "jffs2_gc_fetch_page() rc %ld\n", PTR_ERR(pg_ptr));
-			return PTR_ERR(pg_ptr);
-		}
+		printk(KERN_WARNING "jffs2_gc_fetch_page() rc %ld\n", PTR_ERR(pg_ptr));
+		return PTR_ERR(pg_ptr);
 	}
 
 	offset = start;
diff -Nurp linux.orig/fs/jffs2/nodemgmt.c linux/fs/jffs2/nodemgmt.c
--- linux.orig/fs/jffs2/nodemgmt.c	2007-07-27 08:32:25.000000000 -0500
+++ linux/fs/jffs2/nodemgmt.c	2007-07-27 08:35:46.000000000 -0500
@@ -117,7 +117,7 @@ int jffs2_reserve_space(struct jffs2_sb_
 			spin_unlock(&c->erase_completion_lock);
 
 			ret = jffs2_garbage_collect_pass(c);
-			if (ret)
+			if (ret && (ret != -EBUSY))
 				return ret;
 
 			cond_resched();

-- 
David Kleikamp
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-07-27 13:42   ` Dave Kleikamp
@ 2007-07-27 16:35     ` ye janboe
  2007-07-27 17:38       ` Dave Kleikamp
  0 siblings, 1 reply; 23+ messages in thread
From: ye janboe @ 2007-07-27 16:35 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: linux-mtd

David

Thanks for you comments.
I still can not understand why there is comment in code say the dead
lock can not happen.
Could you give me some hints?

2007/7/27, Dave Kleikamp <shaggy@linux.vnet.ibm.com>:
> On Fri, 2007-07-27 at 11:38 +0800, ye janboe wrote:
> > Does this make fsync system call failed to cause a risk of loss data?
>
> Could be.  It looks like I wasn't careful about considering all the
> callers of jffs2_garbage_collect_pass().  I assume the problem is when
> it is called from jffs2_flush_wbuf_gc().
>
> > I posted my fix base on your patch, please review it.
> >
> > Thanks
> >
> > --- gc.c@@/main/fl71_main/a22116_jffs2_sync_debug/0     2007-07-24
> > 17:02:26.746154000 +0800
> > +++ gc.c@@/main/fl71_main/a22116_jffs2_sync_debug/2     2007-07-24
> > 17:39:35.000000000 +0800
> > @@ -1299,7 +1299,10 @@ static int jffs2_garbage_collect_dnode(s
> >         if (IS_ERR(pg_ptr)) {
> >                 if ( PTR_ERR(pg_ptr) == -EBUSY ) {
> >                         printk(KERN_WARNING "jffs2_gc_fetch_page() rc
> > -EBUSY. Deadlock avoided.\n" );
> > -                       return 0;
> > +                       if (current->flags & PF_SYNCWRITE)
> > +                               return -EBUSY;
> > +                       else
> > +                               return 0;
> >                 } else {
> >                         printk(KERN_WARNING "jffs2_gc_fetch_page() rc
> > %ld\n", PTR_ERR(pg_ptr));
> >                         return PTR_ERR(pg_ptr);
>
> That is probably okay, but I think it is getting more complicated than
> it needs to be.
>
> What if we let jffs2_garbage_collect_dnode() return -EBUSY all the time,
> and let the callers that don't care ignore it?
>
> How's this look?
>
> diff -Nurp linux.orig/fs/jffs2/gc.c linux/fs/jffs2/gc.c
> --- linux.orig/fs/jffs2/gc.c    2007-07-27 08:32:36.000000000 -0500
> +++ linux/fs/jffs2/gc.c 2007-07-27 08:33:59.000000000 -0500
> @@ -1221,13 +1221,8 @@ static int jffs2_garbage_collect_dnode(s
>         pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
>
>         if (IS_ERR(pg_ptr)) {
> -               if ( PTR_ERR(pg_ptr) == -EBUSY ) {
> -                       printk(KERN_WARNING "jffs2_gc_fetch_page() rc -EBUSY. Deadlock avoided.\n" );
> -                       return 0;
> -               } else {
> -                       printk(KERN_WARNING "jffs2_gc_fetch_page() rc %ld\n", PTR_ERR(pg_ptr));
> -                       return PTR_ERR(pg_ptr);
> -               }
> +               printk(KERN_WARNING "jffs2_gc_fetch_page() rc %ld\n", PTR_ERR(pg_ptr));
> +               return PTR_ERR(pg_ptr);
>         }
>
>         offset = start;
> diff -Nurp linux.orig/fs/jffs2/nodemgmt.c linux/fs/jffs2/nodemgmt.c
> --- linux.orig/fs/jffs2/nodemgmt.c      2007-07-27 08:32:25.000000000 -0500
> +++ linux/fs/jffs2/nodemgmt.c   2007-07-27 08:35:46.000000000 -0500
> @@ -117,7 +117,7 @@ int jffs2_reserve_space(struct jffs2_sb_
>                         spin_unlock(&c->erase_completion_lock);
>
>                         ret = jffs2_garbage_collect_pass(c);
> -                       if (ret)
> +                       if (ret && (ret != -EBUSY))
>                                 return ret;
>
>                         cond_resched();
>
> --
> David Kleikamp
> IBM Linux Technology Center
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-07-27 16:35     ` ye janboe
@ 2007-07-27 17:38       ` Dave Kleikamp
  2007-07-30 12:45         ` David Woodhouse
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Kleikamp @ 2007-07-27 17:38 UTC (permalink / raw)
  To: ye janboe; +Cc: linux-mtd

On Sat, 2007-07-28 at 00:35 +0800, ye janboe wrote:
> David
> 
> Thanks for you comments.
> I still can not understand why there is comment in code say the dead
> lock can not happen.
> Could you give me some hints?

I could only speculate, since I really don't know the code.  I suspect
the writer of the comment had some other deadlock scenario in mind and
didn't foresee the one that was encountered.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-07-27 17:38       ` Dave Kleikamp
@ 2007-07-30 12:45         ` David Woodhouse
  2007-07-30 16:45           ` Dave Kleikamp
  0 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2007-07-30 12:45 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: linux-mtd, ye janboe

On Fri, 2007-07-27 at 12:38 -0500, Dave Kleikamp wrote:
> I could only speculate, since I really don't know the code.  I suspect
> the writer of the comment had some other deadlock scenario in mind and
> didn't foresee the one that was encountered. 

I'm still confused because we deliberately ensure the page is up to date
in prepare_write -- so read_cache_page() should never have to lock it,
and we should never deadlock.

Or did I miss something in your analysis (or elsewhere)?

Why is read_cache_page() ever trying to lock this page in the first
place? It should be up to date.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-07-30 12:45         ` David Woodhouse
@ 2007-07-30 16:45           ` Dave Kleikamp
  2007-07-31 12:10             ` David Woodhouse
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Kleikamp @ 2007-07-30 16:45 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd, ye janboe

On Mon, 2007-07-30 at 13:45 +0100, David Woodhouse wrote:
> On Fri, 2007-07-27 at 12:38 -0500, Dave Kleikamp wrote:
> > I could only speculate, since I really don't know the code.  I suspect
> > the writer of the comment had some other deadlock scenario in mind and
> > didn't foresee the one that was encountered. 
> 
> I'm still confused because we deliberately ensure the page is up to date
> in prepare_write -- so read_cache_page() should never have to lock it,
> and we should never deadlock.

I got involved in this because of a deadlock we found on the 2.4 kernel
which is similar to the one that Nathan reported, but there are some
differences.  Our deadlock didn't involve a thread in vfs_write.

In our case, the page is locked in jffs2_readpage, then blocks on the
file lock.

I don't think prepare_write() get called in this case, and there isn't a
guarantee that the page is up-to-date.

> Or did I miss something in your analysis (or elsewhere)?jffs2_gc_fetch_page

My analysis is of a race between the gc thread and jffs2_readpage, so
it's a bit different.

> Why is read_cache_page() ever trying to lock this page in the first
> place? It should be up to date.

1) In our case, I don't think the page is up-to-date.

2) In Nathan's case, I don't know.  Could there have been another thread
locking a page that wasn't in his report?

3) I see a problem in linux-2.6.22 and later.  It looks like
read_cache_page() will wait for the page to become unlocked, regardless
of whether or not the page is up to date.

For some background, here is a comment from an internal bug report that
explains the deadlock we found:
-------------------------
Although the scenario of this failure may vary from bug 4168, the crux of the 
problem is still the deadlock caused by jffs2 layer. 


Digging deeper, the deadlock involves two threads: pid 75 and pid 17169
PID: 75     TASK: c22b0000  CPU: 0   COMMAND: "jffs2_gcd_mtd17"
 #0 [c22b1dc0] crash_save_current_state at c00190d4
 #1 [c22b1e00] __lock_page at c0038be0
 #2 [c22b1e30] read_cache_page at c003b5dc
 #3 [c22b1e60] jffs2_garbage_collect_dnode at c00960fc
 #4 [c22b1f10] jffs2_garbage_collect_pass at c00951b0
 #5 [c22b1f50] jffs2_garbage_collect_thread at c0097994
 #6 [c22b1ff0] original_kernel_thread at c00055ac
PID: 17169  TASK: c171a000  CPU: 0   COMMAND: "netsCommonMsgSe"
 #0 [c171bdf0] crash_save_current_state at c00190d4
 #1 [c171be30] __down at c0008150
 #2 [c171be60] jffs2_readpage at c008f658
 #3 [c171be80] do_generic_file_read at c00394fc
 #4 [c171bed0] generic_file_read at c0039b28
 #5 [c171bf10] sys_read at c0048bd0
 #6 [c171bf30] ret_from_syscall_1 at c0002b48
syscall [c00] exception frame:
R0:  00000003   R1:  7f5ff1b0   R2:  00000000   R3:  0000000e
R4:  3002a008   R5:  00020000   R6:  1002d8a0   R7:  00000001
R8:  0000002c   R9:  0fdcd918   R10: 7f5ffc00   R11: 7f5fffff
R12: 28428884   R13: 1001ba28   R14: 0000439a   R15: 00fbce5c
R16: 80a00701   R17: 00000000   R18: 7f5ff3d0   R19: 7f5ff71c
R20: 7f5ff390   R21: 00000000   R22: 3002a008   R23: 7f5ff370
R24: 7f5ff2f0   R25: 3002a008   R26: 00020000   R27: 00020000
R28: 3002a008   R29: 0000000e   R30: 0fb8ef24   R31: 00000000
NIP: 0fb1e310   MSR: 0002d030   OR3: 0000000e   CTR: 0faccd0c
LR:  0fdb48cc   XER: 20000000   CCR: 28424884   MQ:  00000000
DAR: 3002a000 DSISR: 00800000        Syscall Result: 00000000
Switching to user space stack (no more symbol info).
foreach: invalid kernel virtual address: c000000  type: "mm_struct pgd"

Thread 75 (gc thread) is holding file lock of /alt/extucode/81e002ff.lid
and waiting for the page lock, while thread 17169 is holding the page lock and 
waiting for the same file lock. 

Thus we conclude that the root cause of the problem is that jffs2 is not 
conforming to the strict order of acquiring multiple locks, ie., all code 
paths resulting in acquiring multiple locks must do so in the same order. 
In this case, gc thread requests first the file lock, then the page lock, 
however jffs2_readpage function requests the page lock first, then the file 
lock. Another potential deadlock source is in jffs2_prepare_write, in which it 
requests page lock, then the file lock. 
-- 
David Kleikamp
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-07-30 16:45           ` Dave Kleikamp
@ 2007-07-31 12:10             ` David Woodhouse
  2007-07-31 12:40               ` David Woodhouse
  2007-07-31 13:23               ` Dave Kleikamp
  0 siblings, 2 replies; 23+ messages in thread
From: David Woodhouse @ 2007-07-31 12:10 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: Roberts Nathan-mcg31137, linux-mtd, ye janboe

On Mon, 2007-07-30 at 11:45 -0500, Dave Kleikamp wrote:
> Thus we conclude that the root cause of the problem is that jffs2 is not 
> conforming to the strict order of acquiring multiple locks, ie., all code 
> paths resulting in acquiring multiple locks must do so in the same order. 
> In this case, gc thread requests first the file lock, then the page lock, 
> however jffs2_readpage function requests the page lock first, then the file 
> lock. Another potential deadlock source is in jffs2_prepare_write, in which it 
> requests page lock, then the file lock. 

If that's the explanation, then the patch which Nathan tried (dropping
f->sem before jffs2_gc_fetch_page(), followed by your cleanups¹) ought
to have fixed the problem. And I'd be happier with that version rather
than introducing a new read_cache_page_async_trylock() solely for JFFS2.

It's actually OK to drop f->sem in jffs2_garbage_collect_dnode(). We
hold the alloc_sem anyway -- nobody's going to be _changing_ the file
under us. In fact, the garbage collector probably doesn't need to grab
f->sem until it's actually going to _change_ something.

-- 
dwmw2

¹ http://lists.infradead.org/pipermail/linux-mtd/2007-June/018588.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-07-31 12:10             ` David Woodhouse
@ 2007-07-31 12:40               ` David Woodhouse
  2007-07-31 13:23               ` Dave Kleikamp
  1 sibling, 0 replies; 23+ messages in thread
From: David Woodhouse @ 2007-07-31 12:40 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: Roberts Nathan-mcg31137, linux-mtd, ye janboe

On Tue, 2007-07-31 at 13:10 +0100, David Woodhouse wrote:
> If that's the explanation, then the patch which Nathan tried (dropping
> f->sem before jffs2_gc_fetch_page(), followed by your cleanups¹) ought
> to have fixed the problem. And I'd be happier with that version rather
> than introducing a new read_cache_page_async_trylock() solely for
> JFFS2.

This is the tidied-up version of that patch...

diff --git a/fs/jffs2/README.Locking b/fs/jffs2/README.Locking
index d14d5a4..5836de7 100644
--- a/fs/jffs2/README.Locking
+++ b/fs/jffs2/README.Locking
@@ -69,6 +69,8 @@ Ordering constraints:
 	   any f->sem held.
 	2. Never attempt to lock two file semaphores in one thread.
 	   No ordering rules have been made for doing so.
+(Linux)	3. f->sem must always be locked _after_ the page lock of
+	   any page cache page belonging to the file in question.
 
 
 	erase_completion_lock spinlock
diff --git a/fs/jffs2/file.c b/fs/jffs2/file.c
index c253019..4c09531 100644
--- a/fs/jffs2/file.c
+++ b/fs/jffs2/file.c
@@ -69,7 +69,7 @@ const struct address_space_operations jffs2_file_address_operations =
 	.commit_write =	jffs2_commit_write
 };
 
-static int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg)
+static int jffs2_do_readpage(struct inode *inode, struct page *pg)
 {
 	struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
 	struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb);
@@ -83,8 +83,11 @@ static int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg)
 	pg_buf = kmap(pg);
 	/* FIXME: Can kmap fail? */
 
+	down(&f->sem);
+
 	ret = jffs2_read_inode_range(c, f, pg_buf, pg->index << PAGE_CACHE_SHIFT, PAGE_CACHE_SIZE);
 
+	up(&f->sem);
 	if (ret) {
 		ClearPageUptodate(pg);
 		SetPageError(pg);
@@ -102,21 +105,14 @@ static int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg)
 
 int jffs2_do_readpage_unlock(struct inode *inode, struct page *pg)
 {
-	int ret = jffs2_do_readpage_nolock(inode, pg);
+	int ret = jffs2_do_readpage(inode, pg);
 	unlock_page(pg);
 	return ret;
 }
 
-
-static int jffs2_readpage (struct file *filp, struct page *pg)
+static int jffs2_readpage(struct file *filp, struct page *pg)
 {
-	struct jffs2_inode_info *f = JFFS2_INODE_INFO(pg->mapping->host);
-	int ret;
-
-	down(&f->sem);
-	ret = jffs2_do_readpage_unlock(pg->mapping->host, pg);
-	up(&f->sem);
-	return ret;
+	return jffs2_do_readpage_unlock(pg->mapping->host, pg);
 }
 
 static int jffs2_prepare_write (struct file *filp, struct page *pg,
@@ -194,11 +190,9 @@ static int jffs2_prepare_write (struct file *filp, struct page *pg,
 	}
 
 	/* Read in the page if it wasn't already present, unless it's a whole page */
-	if (!PageUptodate(pg) && (start || end < PAGE_CACHE_SIZE)) {
-		down(&f->sem);
-		ret = jffs2_do_readpage_nolock(inode, pg);
-		up(&f->sem);
-	}
+	if (!PageUptodate(pg) && (start || end < PAGE_CACHE_SIZE))
+		ret = jffs2_do_readpage(inode, pg);
+
 	D1(printk(KERN_DEBUG "end prepare_write(). pg->flags %lx\n", pg->flags));
 	return ret;
 }
diff --git a/fs/jffs2/fs.c b/fs/jffs2/fs.c
diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c
index eded819..51cb3d1 100644
--- a/fs/jffs2/gc.c
+++ b/fs/jffs2/gc.c
@@ -1218,7 +1218,9 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era
 	 *    page OK. We'll actually write it out again in commit_write, which is a little
 	 *    suboptimal, but at least we're correct.
 	 */
+	up(&f->sem);
 	pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
+	down(&f->sem);
 
 	if (IS_ERR(pg_ptr)) {
 		printk(KERN_WARNING "read_cache_page() returned error: %ld\n", PTR_ERR(pg_ptr));
diff --git a/fs/jffs2/os-linux.h b/fs/jffs2/os-linux.h

-- 
dwmw2

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-07-31 12:10             ` David Woodhouse
  2007-07-31 12:40               ` David Woodhouse
@ 2007-07-31 13:23               ` Dave Kleikamp
  2007-07-31 15:23                 ` David Woodhouse
  1 sibling, 1 reply; 23+ messages in thread
From: Dave Kleikamp @ 2007-07-31 13:23 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Roberts Nathan-mcg31137, linux-mtd, ye janboe

[-- Attachment #1: Type: text/plain, Size: 2452 bytes --]

On Tue, 2007-07-31 at 13:10 +0100, David Woodhouse wrote:
> On Mon, 2007-07-30 at 11:45 -0500, Dave Kleikamp wrote:
> > Thus we conclude that the root cause of the problem is that jffs2 is not 
> > conforming to the strict order of acquiring multiple locks, ie., all code 
> > paths resulting in acquiring multiple locks must do so in the same order. 
> > In this case, gc thread requests first the file lock, then the page lock, 
> > however jffs2_readpage function requests the page lock first, then the file 
> > lock. Another potential deadlock source is in jffs2_prepare_write, in which it 
> > requests page lock, then the file lock. 
> 
> If that's the explanation, then the patch which Nathan tried (dropping
> f->sem before jffs2_gc_fetch_page(), followed by your cleanups¹) ought
> to have fixed the problem. And I'd be happier with that version rather
> than introducing a new read_cache_page_async_trylock() solely for JFFS2.
> 
> It's actually OK to drop f->sem in jffs2_garbage_collect_dnode(). We
> hold the alloc_sem anyway -- nobody's going to be _changing_ the file
> under us. In fact, the garbage collector probably doesn't need to grab
> f->sem until it's actually going to _change_ something.

We had tried a similar patch, attached here, but it caused problems.
Maybe our patch is missing something.

>From the bug report:
-----------------------------
Built and ran the 2nd patch (attachment 28493 [edit]).  Results are similar
as before, jffs2 runs for a little while, but soon complains there's 
already data at the point where it intends to write. 

ARGH. About to write node to 0x00140010 on flash, but there's data already 
there:
0x00140010: 19 85 e0 02 00 00 00 ac 0a 3e 48 74 00 00 00 74
argh. node added in wrong place
Node totlen on flash (0x00000004) != totlen in node ref (0x000000ac)
ARGH. About to write node to 0x00140010 on flash, but there's data already 
there:
0x00140010: 19 85 e0 02 00 00 00 04 08 34 00 74 00 00 00 74
argh. node added in wrong place
ARGH. About to write node to 0xc01a4600 on flash, but there's data already 
there:
0xc01a4600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Write of 324 bytes at 0xc01a4600 failed. returned 0, retlen 0
Not marking the space at 0xc01a4600 as dirty because the flash driver returned 
retlen zero

It appeared to deadlock here, and after a few minutes the unit check timer 
stepped in a rebooted the system. 
-- 
David Kleikamp
IBM Linux Technology Center

[-- Attachment #2: up_down_3.patch --]
[-- Type: text/x-patch, Size: 1553 bytes --]

diff -Nurp devel-2.4.18/fs/jffs2/file.c linux/fs/jffs2/file.c
--- devel-2.4.18/fs/jffs2/file.c	2007-06-07 13:01:11.000000000 -0500
+++ linux/fs/jffs2/file.c	2007-06-08 10:12:54.000000000 -0500
@@ -333,21 +333,19 @@ int jffs2_do_readpage_nolock (struct ino
 
 int jffs2_do_readpage_unlock(struct inode *inode, struct page *pg)
 {
+	struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
+
+	down(&f->sem);
 	int ret = jffs2_do_readpage_nolock(inode, pg);
 	UnlockPage(pg);
+	up(&f->sem);
 	return ret;
 }
 
 
 int jffs2_readpage (struct file *filp, struct page *pg)
 {
-	struct jffs2_inode_info *f = JFFS2_INODE_INFO(filp->f_dentry->d_inode);
-	int ret;
-	
-	down(&f->sem);
-	ret = jffs2_do_readpage_unlock(filp->f_dentry->d_inode, pg);
-	up(&f->sem);
-	return ret;
+	return jffs2_do_readpage_unlock(filp->f_dentry->d_inode, pg);
 }
 
 int jffs2_prepare_write (struct file *filp, struct page *pg, unsigned start, unsigned end)
diff -Nurp devel-2.4.18/fs/jffs2/gc.c linux/fs/jffs2/gc.c
--- devel-2.4.18/fs/jffs2/gc.c	2007-06-07 13:01:09.000000000 -0500
+++ linux/fs/jffs2/gc.c	2007-06-08 10:12:54.000000000 -0500
@@ -703,7 +703,9 @@ static int jffs2_garbage_collect_dnode(s
 	 *    page OK. We'll actually write it out again in commit_write, which is a little
 	 *    suboptimal, but at least we're correct.
 	 */
+	up(f->sem);
 	pg = read_cache_page(inode->i_mapping, start >> PAGE_CACHE_SHIFT, (void *)jffs2_do_readpage_unlock, inode);
+	down(f->sem);
 
 	if (IS_ERR(pg)) {
 		printk(KERN_WARNING "read_cache_page() returned error: %ld\n", PTR_ERR(pg));

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-07-31 13:23               ` Dave Kleikamp
@ 2007-07-31 15:23                 ` David Woodhouse
  2007-07-31 15:36                   ` Dave Kleikamp
  0 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2007-07-31 15:23 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: Roberts Nathan-mcg31137, linux-mtd, ye janboe

On Tue, 2007-07-31 at 08:23 -0500, Dave Kleikamp wrote:
> We had tried a similar patch, attached here, but it caused problems.
> Maybe our patch is missing something.
> 
> >From the bug report:
> -----------------------------
> Built and ran the 2nd patch (attachment 28493 [edit]).  Results are similar
> as before, jffs2 runs for a little while, but soon complains there's 
> already data at the point where it intends to write. 
> 
> ARGH. About to write node to 0x00140010 on flash, but there's data already 
> there:
> 0x00140010: 19 85 e0 02 00 00 00 ac 0a 3e 48 74 00 00 00 74 

Hm, I think that's probably a _separate_ problem, which you happened to
exacerbate. I don't see how it's related to f->sem locking.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-07-31 15:23                 ` David Woodhouse
@ 2007-07-31 15:36                   ` Dave Kleikamp
  2007-07-31 16:23                     ` David Woodhouse
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Kleikamp @ 2007-07-31 15:36 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Roberts Nathan-mcg31137, linux-mtd, ye janboe

On Tue, 2007-07-31 at 16:23 +0100, David Woodhouse wrote:
> On Tue, 2007-07-31 at 08:23 -0500, Dave Kleikamp wrote:
> > We had tried a similar patch, attached here, but it caused problems.
> > Maybe our patch is missing something.
> > 
> > >From the bug report:
> > -----------------------------
> > Built and ran the 2nd patch (attachment 28493 [edit]).  Results are similar
> > as before, jffs2 runs for a little while, but soon complains there's 
> > already data at the point where it intends to write. 
> > 
> > ARGH. About to write node to 0x00140010 on flash, but there's data already 
> > there:
> > 0x00140010: 19 85 e0 02 00 00 00 ac 0a 3e 48 74 00 00 00 74 
> 
> Hm, I think that's probably a _separate_ problem, which you happened to
> exacerbate. I don't see how it's related to f->sem locking.

Could be.  Remember this was on a 2.4 kernel as well, so mileage may
vary on the latest code.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-07-31 15:36                   ` Dave Kleikamp
@ 2007-07-31 16:23                     ` David Woodhouse
  2007-08-02  4:11                       ` ye janboe
  0 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2007-07-31 16:23 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: Roberts Nathan-mcg31137, linux-mtd, ye janboe

On Tue, 2007-07-31 at 10:36 -0500, Dave Kleikamp wrote:
> Could be.  Remember this was on a 2.4 kernel as well, so mileage may
> vary on the latest code. 

Right. NAND was never even properly supported on 2.4.

Of course, there's still the fact that Nathan reports that this approach
_exacerbates_ his problem rather than fixing it. If your patch actually
fixes something for him, we'll need to look more closely at why (and if
not, we need to look more closely at his problem anyway).

-- 
dwmw2

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: JFFS2 deadlock with alloc_sem
  2007-07-31 16:23                     ` David Woodhouse
@ 2007-08-02  4:11                       ` ye janboe
  0 siblings, 0 replies; 23+ messages in thread
From: ye janboe @ 2007-08-02  4:11 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Roberts Nathan-mcg31137, linux-mtd, Dave Kleikamp

How can we tell which page is locked by which process?

Thanks

2007/8/1, David Woodhouse <dwmw2@infradead.org>:
> On Tue, 2007-07-31 at 10:36 -0500, Dave Kleikamp wrote:
> > Could be.  Remember this was on a 2.4 kernel as well, so mileage may
> > vary on the latest code.
>
> Right. NAND was never even properly supported on 2.4.
>
> Of course, there's still the fact that Nathan reports that this approach
> _exacerbates_ his problem rather than fixing it. If your patch actually
> fixes something for him, we'll need to look more closely at why (and if
> not, we need to look more closely at his problem anyway).
>
> --
> dwmw2
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2007-08-02  4:11 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-30 19:41 JFFS2 deadlock with alloc_sem Roberts Nathan-mcg31137
2007-05-05  8:23 ` David Woodhouse
2007-06-02 17:42 ` David Woodhouse
2007-06-05 14:21   ` Roberts Nathan-mcg31137
2007-06-07 14:29   ` Josh Boyer
  -- strict thread matches above, loose matches on Subject: below --
2007-06-08 19:26 Dave Kleikamp
2007-06-11 12:14 ` David Woodhouse
2007-06-12  1:45   ` Roberts Nathan-mcg31137
2007-06-19 16:11     ` Dave Kleikamp
2007-06-19 19:42       ` Dave Kleikamp
     [not found] <af3ea28a0707262032h7ee22775t6ef54e364a9cd704@mail.gmail.com>
2007-07-27  3:38 ` ye janboe
2007-07-27 13:42   ` Dave Kleikamp
2007-07-27 16:35     ` ye janboe
2007-07-27 17:38       ` Dave Kleikamp
2007-07-30 12:45         ` David Woodhouse
2007-07-30 16:45           ` Dave Kleikamp
2007-07-31 12:10             ` David Woodhouse
2007-07-31 12:40               ` David Woodhouse
2007-07-31 13:23               ` Dave Kleikamp
2007-07-31 15:23                 ` David Woodhouse
2007-07-31 15:36                   ` Dave Kleikamp
2007-07-31 16:23                     ` David Woodhouse
2007-08-02  4:11                       ` ye janboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).