linux-mtd.lists.infradead.org archive mirror
* JFFS2 deadlock
@ 2016-01-27 15:36 Szabó Tamás
From: Szabó Tamás @ 2016-01-27 15:36 UTC
  To: linux-mtd

Hello all,

I work on an embedded system running Linux 3.10 and have found a deadlock
between jffs2_readpage and jffs2_write_begin.
The problem is also present in the latest 4.4 kernel. It occurs when two
tasks access the same file concurrently, one reading it and the other
writing it.

The kernel stack traces for writer and reader in deadlock:

__switch_to+0x4c/0x98
sleep_on_page+0x10/0x24
__lock_page+0x8c/0x9c
find_lock_page+0x7c/0x94
grab_cache_page_write_begin+0x64/0xd8
jffs2_write_begin+0x6c/0x2ec
generic_file_buffered_write+0x188/0x258
__generic_file_aio_write+0x1e0/0x484
generic_file_aio_write+0x70/0xfc
do_sync_write+0x7c/0xd4
vfs_write+0xc8/0x1b0
SyS_write+0x4c/0xa8
ret_from_syscall+0x0/0x38

__switch_to+0x4c/0x98
jffs2_readpage+0x28/0x5c
generic_file_aio_read+0x22c/0x7a0
do_sync_read+0x7c/0xd4
vfs_read+0xb0/0x170
SyS_read+0x4c/0xa8
ret_from_syscall+0x0/0x38

The root cause is the locking order of the f->sem mutex and the page lock.
jffs2_readpage is called with the page already locked and then takes the
f->sem mutex, while jffs2_write_begin takes the two in the reverse order.

I found the commit that introduced this bug.
It was itself a fix for another deadlock:
https://github.com/torvalds/linux/commit/5ffd3412ae5536a4c57469cb8ea31887121dcb2e

According to this commit and my reading of the code, the lock orders
appear to be the following:
readpage: page lock, f->sem
write_begin: f->sem, page lock
write_end: page lock, f->sem
GC: f->sem, page lock
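
The list above can be checked mechanically for inverted lock pairs. The
following is only a sketch: find_inversions is a hypothetical helper (not
part of JFFS2 or any existing tool), the paths are labelled with the
function names jffs2_readpage, jffs2_write_begin and jffs2_write_end plus
"GC", and page_lock is just a whitespace-free token standing for the page
lock. An inverted pair is a candidate for an ABBA deadlock, not proof that
one can actually occur.

```shell
# Hypothetical helper: reads "path first_lock second_lock" lines on stdin
# and prints every pair of paths that take the same two locks in opposite
# order.
find_inversions() {
    awk '
        { path[NR] = $1; first[NR] = $2; second[NR] = $3 }
        END {
            for (i = 1; i <= NR; i++)
                for (j = i + 1; j <= NR; j++)
                    if (first[i] == second[j] && second[i] == first[j])
                        print path[i] " <-> " path[j]
        }'
}

# Lock orders as listed in this message (labels are assumptions, see above).
find_inversions <<'EOF'
jffs2_readpage     page_lock  f->sem
jffs2_write_begin  f->sem     page_lock
jffs2_write_end    page_lock  f->sem
GC                 f->sem     page_lock
EOF
```

With these four entries the helper reports four candidate pairs, among them
jffs2_readpage <-> jffs2_write_begin, which is the pair visible in the two
stack traces.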


Reproducing:
Besides the physical device, I can also reproduce the deadlock on a desktop
Debian 8 virtual machine, with a JFFS2 filesystem created on top of a
nandsim device. I set it up as follows:

# Simulated NAND device plus its block-device interface
modprobe nandsim first_id_byte=0x20 second_id_byte=0x55
modprobe mtdblock
mkdir /mnt/jffs2
mount -t jffs2 /dev/mtdblock0 /mnt/jffs2/
TEST_FILE="/mnt/jffs2/test"
# One writer and one reader loop on the same file, both in the background
( while true; do date > "$TEST_FILE"; done ) &
( while true; do cat "$TEST_FILE" > /dev/null; done ) &

After a short time the date and cat processes get stuck in uninterruptible
sleep.
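
One way to confirm that the processes are stuck, and to collect stack traces
like the ones above, is sketched below. The function names filter_d_state
and dump_blocked_stacks are hypothetical helpers, and the sketch assumes a
procps-style ps, root privileges, and a kernel that exposes
/proc/<pid>/stack; none of this was part of the original test setup.

```shell
# Hypothetical helper: reads "pid stat comm" lines on stdin and prints the
# PIDs whose state starts with D (uninterruptible sleep).
filter_d_state() {
    awk '$2 ~ /^D/ { print $1 }'
}

# Hypothetical helper: prints the kernel stack of every task currently in
# state D. Reading /proc/<pid>/stack usually needs root; unreadable or
# vanished entries are skipped silently.
dump_blocked_stacks() {
    for pid in $(ps -eo pid=,stat=,comm= | filter_d_state); do
        echo "== pid $pid =="
        cat "/proc/$pid/stack" 2>/dev/null || true
    done
}

dump_blocked_stacks
```

If the magic SysRq interface is enabled, writing w to /proc/sysrq-trigger
should give similar information by dumping all blocked tasks to the kernel
log.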

Is this a known issue? If not, is there anyone familiar with JFFS2
internals who could help me fix it?

Best regards,
Tamas


Thread overview: 11+ messages
2016-01-27 15:36 JFFS2 deadlock Szabó Tamás
2016-01-27 16:05 ` Joakim Tjernlund
2016-01-28  0:05   ` Brian Norris
2016-01-28  8:16     ` Thomas.Betker
2016-02-01 14:28       ` David Woodhouse
2016-02-01 18:54         ` Thomas.Betker
     [not found]         ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain>
2016-02-18  9:57           ` Thomas.Betker
2016-02-25  7:46             ` Joakim Tjernlund
2016-02-25  9:57               ` Thomas.Betker
2016-02-25 11:22                 ` David Woodhouse
2016-02-25 17:57                   ` Brian Norris
