linux-mtd.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* JFFS2 deadlock with alloc_sem
@ 2007-06-08 19:26 Dave Kleikamp
  2007-06-11 12:14 ` David Woodhouse
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Kleikamp @ 2007-06-08 19:26 UTC (permalink / raw)
  To: dwmw2; +Cc: Nathan.Roberts, linux-mtd

Forgive me for not following up properly, but I'm not on the mailing
list, and I'm following up from the archives.

> On Mon, 2007-04-30 at 15:41 -0400, Roberts Nathan-mcg31137 wrote:
> > Has anyone seen this deadlock before? It seems to be a classic deadlock 
> > situation so I'm not sure if maybe I'm misinterpreting things or 
> > the use case (several postmark tests running in parallel on a
> > preemptible kernel) is especially vulnerable. 
> 
> I think Josh has spotted the real problem here. Does this help? If so,
> as better fix will be forthcoming....
> 
> diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c
> index 2d99e06..1066120 100644
> --- a/fs/jffs2/gc.c
> +++ b/fs/jffs2/gc.c
> @@ -1218,7 +1218,9 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era
>  	 *    page OK. We'll actually write it out again in commit_write, which is a little
>  	 *    suboptimal, but at least we're correct.
>  	 */
> +	up(&f->sem);
>  	pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
> +	down(&f->sem);
>  
>  	if (IS_ERR(pg_ptr)) {
>  		printk(KERN_WARNING "read_cache_page() returned error: %ld\n", PTR_ERR(pg_ptr));

As you know, jffs2_gc_fetch_page() causes read_cache_page() to call
jffs2_do_readpage_unlock().  jffs2_do_readpage_unlock() expects f->sem
to be held, but in no longer is.  I think, with this change,
jffs2_do_readpage_unlock() is the right place to take f->sem.  What do
you think of this patch, and does it have any affect on Nathan's
deadlock?

Note: If this works out, jffs2_readpage and jffs2_do_readpage_unlock can
probably be collapsed into one function.

diff -Nurp linux-2.6.22-rc4/fs/jffs2/file.c linux/fs/jffs2/file.c
--- linux-2.6.22-rc4/fs/jffs2/file.c	2007-06-05 08:10:59.000000000 -0500
+++ linux/fs/jffs2/file.c	2007-06-08 13:18:18.000000000 -0500
@@ -102,21 +102,20 @@ static int jffs2_do_readpage_nolock (str
 
 int jffs2_do_readpage_unlock(struct inode *inode, struct page *pg)
 {
-	int ret = jffs2_do_readpage_nolock(inode, pg);
+	struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
+	int ret;
+
+	down(&f->sem);
+	ret = jffs2_do_readpage_nolock(inode, pg);
 	unlock_page(pg);
+	up(&f->sem);
 	return ret;
 }
 
 
 static int jffs2_readpage (struct file *filp, struct page *pg)
 {
-	struct jffs2_inode_info *f = JFFS2_INODE_INFO(pg->mapping->host);
-	int ret;
-
-	down(&f->sem);
-	ret = jffs2_do_readpage_unlock(pg->mapping->host, pg);
-	up(&f->sem);
-	return ret;
+	return jffs2_do_readpage_unlock(pg->mapping->host, pg);
 }
 
 static int jffs2_prepare_write (struct file *filp, struct page *pg,
diff -Nurp linux-2.6.22-rc4/fs/jffs2/gc.c linux/fs/jffs2/gc.c
--- linux-2.6.22-rc4/fs/jffs2/gc.c	2007-06-05 08:10:59.000000000 -0500
+++ linux/fs/jffs2/gc.c	2007-06-08 13:16:29.000000000 -0500
@@ -1218,7 +1218,9 @@ static int jffs2_garbage_collect_dnode(s
 	 *    page OK. We'll actually write it out again in commit_write, which is a little
 	 *    suboptimal, but at least we're correct.
 	 */
+	up(&f->sem);
 	pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
+	down(&f->sem);
 
 	if (IS_ERR(pg_ptr)) {
 		printk(KERN_WARNING "read_cache_page() returned error: %ld\n", PTR_ERR(pg_ptr));

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 23+ messages in thread
[parent not found: <af3ea28a0707262032h7ee22775t6ef54e364a9cd704@mail.gmail.com>]
* JFFS2 deadlock with alloc_sem
@ 2007-04-30 19:41 Roberts Nathan-mcg31137
  2007-05-05  8:23 ` David Woodhouse
  2007-06-02 17:42 ` David Woodhouse
  0 siblings, 2 replies; 23+ messages in thread
From: Roberts Nathan-mcg31137 @ 2007-04-30 19:41 UTC (permalink / raw)
  To: linux-mtd

Has anyone seen this deadlock before? It seems to be a classic deadlock 
situation so I'm not sure if maybe I'm misinterpreting things or 
the use case (several postmark tests running in parallel on a
preemptible
kernel) is especially vulnerable. 

If the lock at NOTE2 is aquired followed by the lock at NOTE4, it seems 
like deadlock is inevitable. 

Kernel is 2.6.10 based. (although I've looked at 2.6.18 sources and it 
appears that the scenario is possible there as well.)

Kernel preemption is enabled. 

Flash is large page NAND (we've also got it to happen on small page NAND

but it's more difficult). 

It's not always the GC thread which is the other party.  We've also 
gotten other callers of garbage_collect_pass() to end up in the same 
situation.

Thread1
--
[<c023fd60>] (__schedule+0x0/0x5b0) from [<c024047c>]
(schedule+0xec/0x124)
[<c0240390>] (schedule+0x0/0x124) from [<c023f930>]
(__compat_down+0xe0/0x178)
 r4 = C0EA4000 
[<c023f850>] (__compat_down+0x0/0x178) from [<c023f7ec>]
(__compat_down_failed+0xc/0x20)
 r8 = C001462C  r7 = C2B7C2E8  r6 = C2B7C2E8  r5 = C0014600
 r4 = C2B7C30C 
[<c00d5b60>] (jffs2_reserve_space+0x0/0x268) from [<c00d80b4>]
(jffs2_write_inode_range+0x5c/0x468)
NOTE1: jffs2_reserve_space() releases and reaquires alloc_sem

[<c00d8058>] (jffs2_write_inode_range+0x0/0x468) from [<c00d30ac>]
(jffs2_commit_write+0x1b0/0x31c)
[<c00d2efc>] (jffs2_commit_write+0x0/0x31c) from [<c00638d8>]
(generic_file_buffered_write+0x3e4/0x64c)
[<c00634f4>] (generic_file_buffered_write+0x0/0x64c) from [<c00641c8>]
(__generic_file_aio_write_nolock+0x488/0x4b4)
NOTE2: generic_file_buffered_write() gets page_lock

[<c0063d40>] (__generic_file_aio_write_nolock+0x0/0x4b4) from
[<c0064274>] (__generic_file_write_nolock+0x80/0xac)
[<c00641f4>] (__generic_file_write_nolock+0x0/0xac) from [<c00643c4>]
(generic_file_write+0x58/0xdc)
 r9 = C0EA4000  r8 = C1CC3A00  r6 = 40030008  r5 = C1CC3A9C
 r4 = C1CC3A6C 
[<c006436c>] (generic_file_write+0x0/0xdc) from [<c007eeb4>]
(vfs_write+0xec/0x170)
[<c007edc8>] (vfs_write+0x0/0x170) from [<c007eff4>]
(sys_write+0x48/0x74)
 r8 = C0025154  r7 = 00000004  r6 = C26E7960  r5 = 00000000
 r4 = 0002D000 
[<c007efac>] (sys_write+0x0/0x74) from [<c00249a0>]
(ret_fast_syscall+0x0/0x34)
 r6 = 00000004  r5 = 0002D000  r4 = 0008A0FF 


Thread2
--
[<c023fd60>] (__schedule+0x0/0x5b0) from [<c024047c>]
(schedule+0xec/0x124)
[<c0240390>] (schedule+0x0/0x124) from [<c0241040>]
(io_schedule+0x2c/0x48)
 r4 = C02E01A8 
[<c0241014>] (io_schedule+0x0/0x48) from [<c0060bbc>]
(sync_page+0x40/0x48)
 r5 = 00000000  r4 = C34F7CF8 
[<c0060b7c>] (sync_page+0x0/0x48) from [<c024140c>]
(__wait_on_bit_lock+0x54/0x88)
[<c02413b8>] (__wait_on_bit_lock+0x0/0x88) from [<c00614a4>]
(__lock_page+0x88/0x98)
[<c006141c>] (__lock_page+0x0/0x98) from [<c006301c>]
(read_cache_page+0x21c/0x324)
 r5 = 00000000  r4 = C0336FE0 
[<c0062e00>] (read_cache_page+0x0/0x324) from [<c00df8f8>]
(jffs2_gc_fetch_page+0x2c/0x64)
[<c00df8cc>] (jffs2_gc_fetch_page+0x0/0x64) from [<c00dc664>]
(jffs2_garbage_collect_pass+0x14a4/0x1c28)
NOTE3: jffs2_gc_fetch_page() attempts to get page_lock

[<c00db1c0>] (jffs2_garbage_collect_pass+0x0/0x1c28) from [<c00de684>]
(jffs2_garbage_collect_thread+0x148/0x19c)
NOTE4: garbage_collect_pass() aquires alloc_sem

[<c00de53c>] (jffs2_garbage_collect_thread+0x0/0x19c) from [<c00444e8>]
(do_exit+0x0/0xd88)
 r7 = 00000000  r6 = 00000000  r5 = 00000000  r4 = 00000000

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2007-08-02  4:11 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-08 19:26 JFFS2 deadlock with alloc_sem Dave Kleikamp
2007-06-11 12:14 ` David Woodhouse
2007-06-12  1:45   ` Roberts Nathan-mcg31137
2007-06-19 16:11     ` Dave Kleikamp
2007-06-19 19:42       ` Dave Kleikamp
     [not found] <af3ea28a0707262032h7ee22775t6ef54e364a9cd704@mail.gmail.com>
2007-07-27  3:38 ` ye janboe
2007-07-27 13:42   ` Dave Kleikamp
2007-07-27 16:35     ` ye janboe
2007-07-27 17:38       ` Dave Kleikamp
2007-07-30 12:45         ` David Woodhouse
2007-07-30 16:45           ` Dave Kleikamp
2007-07-31 12:10             ` David Woodhouse
2007-07-31 12:40               ` David Woodhouse
2007-07-31 13:23               ` Dave Kleikamp
2007-07-31 15:23                 ` David Woodhouse
2007-07-31 15:36                   ` Dave Kleikamp
2007-07-31 16:23                     ` David Woodhouse
2007-08-02  4:11                       ` ye janboe
  -- strict thread matches above, loose matches on Subject: below --
2007-04-30 19:41 Roberts Nathan-mcg31137
2007-05-05  8:23 ` David Woodhouse
2007-06-02 17:42 ` David Woodhouse
2007-06-05 14:21   ` Roberts Nathan-mcg31137
2007-06-07 14:29   ` Josh Boyer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).