* Re: JFFS2 deadlock [not found] ` <1453910781.20662.35.camel@infinera.com> @ 2016-01-28 0:05 ` Brian Norris 2016-01-28 8:16 ` Thomas.Betker 0 siblings, 1 reply; 9+ messages in thread From: Brian Norris @ 2016-01-28 0:05 UTC (permalink / raw) To: Joakim Tjernlund, David Woodhouse Cc: sztomi89@gmail.com, linux-mtd@lists.infradead.org, linux-fsdevel, linux-kernel, David Woodhouse, Artem Bityutskiy, Thomas Betker, Ming Liu, Deng Chao, wangzaiwei, Alexander Viro + David (maintainer), linux-fsdevel, and others On Wed, Jan 27, 2016 at 04:05:35PM +0000, Joakim Tjernlund wrote: > On Wed, 2016-01-27 at 16:36 +0100, Szabó Tamás wrote: > > Hello all, > > > > I work on an embedded system running Linux 3.10 and found a deadlock > > situation between jffs2_readpage and jffs2_write. > > The problem is present on the latest 4.4 kernel too and occurs when > > two tasks want to access the same file, one reads and the other writes it. > > > > The kernel stack traces for writer and reader in deadlock: > > > > __switch_to+0x4c/0x98 > > sleep_on_page+0x10/0x24 > > __lock_page+0x8c/0x9c > > find_lock_page+0x7c/0x94 > > grab_cache_page_write_begin+0x64/0xd8 > > jffs2_write_begin+0x6c/0x2ec > > generic_file_buffered_write+0x188/0x258 > > __generic_file_aio_write+0x1e0/0x484 > > generic_file_aio_write+0x70/0xfc > > do_sync_write+0x7c/0xd4 > > vfs_write+0xc8/0x1b0 > > SyS_write+0x4c/0xa8 > > ret_from_syscall+0x0/0x38 > > > > __switch_to+0x4c/0x98 > > jffs2_readpage+0x28/0x5c > > generic_file_aio_read+0x22c/0x7a0 > > do_sync_read+0x7c/0xd4 > > vfs_read+0xb0/0x170 > > SyS_read+0x4c/0xa8 > > ret_from_syscall+0x0/0x38 > > > > The root cause here is the locking order of f->sem mutex and pagelock. > > jffs2_readpage function gets the page in locked state and then locks > > the f->sem mutex, while jffs2_write_begin does it in reverse order. > > > > I found a commit that brought in this bug. > > That was a fix for another deadlock issue: > > https://github.com/torvalds/linux/commit/5ffd3412ae5536a4c57469cb8ea31887121dcb2e > > > > According to this commit and my code inspections the lock orders may be > > the following: > > readpage: page lock, f->sem > > writepage_begin: f->sem, page lock > > writepage_end: page lock, f->sem > > GC: f->sem, page lock > > I am not sure if this is the first time I hear this or if someone else has reported > a similar issue. No, I'm pretty sure this is not the first report. I think there have even been patches. The problem is that JFFS2 is effectively unmaintained, despite what MAINTAINERS has to say about it. Previous reports: Subject: Another JFFS2 deadlock, kernel 3.4.11 http://thread.gmane.org/gmane.linux.drivers.mtd/62523 Subject: [JFFS2] Revision "jffs2: Fix lock acquisition order bug in jffs2_write_begin" introduces another dead lock. http://thread.gmane.org/gmane.linux.drivers.mtd/47986 There are other reports of deadlocks in jffs2_readpage, but in my limited scanning, they look slightly different, so I won't include them in this list. For reference: outstanding patches, waiting for a maintainer (I've been keeping patchwork up-to-date, mostly, but I'm not touching JFFS2 myself, for the most part): http://patchwork.ozlabs.org/project/linux-mtd/list/?q=jffs2 I'm tempted to resurrect this patch, to mark JFFS2 as Orphaned / Obsolete: http://patchwork.ozlabs.org/patch/422160/ David, can you please clarify your role here? Are you maintaining JFFS2 or not? Or perhaps someone else should be added? I don't really know any interested parties. Maybe the MAINTAINERS entry should be directed to linux-fsdevel too? Brian ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock 2016-01-28 0:05 ` JFFS2 deadlock Brian Norris @ 2016-01-28 8:16 ` Thomas.Betker 2016-02-01 14:28 ` David Woodhouse 0 siblings, 1 reply; 9+ messages in thread From: Thomas.Betker @ 2016-01-28 8:16 UTC (permalink / raw) To: computersforpeace Cc: Artem Bityutskiy, Deng Chao, David Woodhouse, Joakim Tjernlund, linux-fsdevel, linux-kernel, linux-mtd@lists.infradead.org, linux-mtd, Ming Liu, sztomi89@gmail.com, Thomas Betker, Alexander Viro, wangzaiwei Hello Brian: > No, I'm pretty sure this is not the first report. I think there have > even been patches. The problem is that JFFS2 is effectively > unmaintained, despite what MAINTAINERS has to say about it. > > Previous reports: > > Subject: Another JFFS2 deadlock, kernel 3.4.11 > http://thread.gmane.org/gmane.linux.drivers.mtd/62523 > > Subject: [JFFS2] Revision "jffs2: Fix lock acquisition order bug in > jffs2_write_begin" introduces another dead lock. > http://thread.gmane.org/gmane.linux.drivers.mtd/47986 > For reference: outstanding patches, waiting for a maintainer (I've been > keeping patchwork up-to-date, mostly, but I'm not touching JFFS2 myself, > for the most part): > http://patchwork.ozlabs.org/project/linux-mtd/list/?q=jffs2 Subject: [PATCH] Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin" http://article.gmane.org/gmane.linux.drivers.mtd/62951 This is a patch revising my original patch, which I sent to linux-mtd on 10-Nov-2015. I didn't see a response yet, but it's one of the outstanding patches above. Best regards, Thomas ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock 2016-01-28 8:16 ` Thomas.Betker @ 2016-02-01 14:28 ` David Woodhouse 2016-02-01 18:54 ` Thomas.Betker [not found] ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain> 0 siblings, 2 replies; 9+ messages in thread From: David Woodhouse @ 2016-02-01 14:28 UTC (permalink / raw) To: Thomas.Betker, computersforpeace Cc: linux-mtd, Artem Bityutskiy, Thomas Betker, linux-kernel, Joakim Tjernlund, sztomi89@gmail.com, wangzaiwei, linux-mtd@lists.infradead.org, Alexander Viro, Ming Liu, linux-fsdevel, Deng Chao [-- Attachment #1: Type: text/plain, Size: 3738 bytes --] On Thu, 2016-01-28 at 09:16 +0100, Thomas.Betker@rohde-schwarz.com wrote: > > Subject: [PATCH] Revert "jffs2: Fix lock acquisition order bug in > jffs2_write_begin" > http://article.gmane.org/gmane.linux.drivers.mtd/62951 > > This is a patch revising my original patch, which I sent to linux-mtd on > 10-Nov-2015. I didn't see a response yet, but it's one of the outstanding > patches above. That looks necessary but not sufficient. I think we need this (untested) patch on top of it, to ensure that we *always* take the page lock before f->sem? Please could you try what's in the tree at http://git.infradead.org/users/dwmw2/jffs2-fixes.git ---- From: David Woodhouse <David.Woodhouse@intel.com> Subject: [PATCH] jffs2: Fix page lock / f->sem deadlock With this fix, all code paths should now be obtaining the page lock before f->sem. Reported-by: Szabó Tamás <sztomi89@gmail.com> Reported-by: Thomas Betker <thomas.betker@rohde-schwarz.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@vger.kernel.org --- fs/jffs2/README.Locking | 5 +---- fs/jffs2/gc.c | 17 ++++++++++------- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/fs/jffs2/README.Locking b/fs/jffs2/README.Locking index 3ea3655..8918ac9 100644 --- a/fs/jffs2/README.Locking +++ b/fs/jffs2/README.Locking @@ -2,10 +2,6 @@ JFFS2 LOCKING DOCUMENTATION --------------------------- -At least theoretically, JFFS2 does not require the Big Kernel Lock -(BKL), which was always helpfully obtained for it by Linux 2.4 VFS -code. It has its own locking, as described below. - This document attempts to describe the existing locking rules for JFFS2. It is not expected to remain perfectly up to date, but ought to be fairly close. @@ -69,6 +65,7 @@ Ordering constraints: any f->sem held. 2. Never attempt to lock two file mutexes in one thread. No ordering rules have been made for doing so. + 3. Never lock a page cache page with f->sem held. erase_completion_lock spinlock diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c index 6fb0802..5919fef 100644 --- a/fs/jffs2/gc.c +++ b/fs/jffs2/gc.c @@ -1316,14 +1316,17 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era BUG_ON(start > orig_start); } - /* First, use readpage() to read the appropriate page into the page cache */ - /* Q: What happens if we actually try to GC the _same_ page for which commit_write() - * triggered garbage collection in the first place? - * A: I _think_ it's OK. read_cache_page shouldn't deadlock, we'll write out the - * page OK. We'll actually write it out again in commit_write, which is a little - * suboptimal, but at least we're correct. - */ + /* The rules state that we must obtain the page lock *before* f->sem, so + * drop f->sem temporarily. Since we also hold c->alloc_sem, nothing's + * actually going to *change* so we're safe; we only allow reading. + * + * It is important to note that jffs2_write_begin() will ensure that its + * page is marked Uptodate before allocating space. That means that if we + * end up here trying to GC the *same* page that jffs2_write_begin() is + * trying to write out, read_cache_page() will not deadlock. */ + mutex_unlock(&f->sem); pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg); + mutex_lock(&f->sem); if (IS_ERR(pg_ptr)) { pr_warn("read_cache_page() returned error: %ld\n", -- 2.5.0 -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5691 bytes --] ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock 2016-02-01 14:28 ` David Woodhouse @ 2016-02-01 18:54 ` Thomas.Betker [not found] ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain> 1 sibling, 0 replies; 9+ messages in thread From: Thomas.Betker @ 2016-02-01 18:54 UTC (permalink / raw) To: dwmw2 Cc: computersforpeace, Artem Bityutskiy, Deng Chao, Joakim Tjernlund, linux-fsdevel, linux-kernel, linux-mtd@lists.infradead.org, linux-mtd, Ming Liu, sztomi89@gmail.com, Thomas Betker, Alexander Viro, wangzaiwei Hello David: > > Subject: [PATCH] Revert "jffs2: Fix lock acquisition order bug in > > jffs2_write_begin" > > http://article.gmane.org/gmane.linux.drivers.mtd/62951 > > > > This is a patch revising my original patch, which I sent to linux-mtd on > > 10-Nov-2015. I didn't see a response yet, but it's one of the outstanding > > patches above. > > That looks necessary but not sufficient. I think we need this > (untested) patch on top of it, to ensure that we *always* take the page > lock before f->sem? > > Please could you try what's in the tree at > http://git.infradead.org/users/dwmw2/jffs2-fixes.git I have been using a variant of Deng Chao's patch here for a long time, so that one has been tested quite a bit: http://lists.infradead.org/pipermail/linux-mtd/2013-August/048352.html. The problem with that patch was that it modified mm/filemap.c and include/linux/pagemap.h, which we were not too happy about. Your patch looks much simpler, and I will definitely test it. It may take a few days, though, as I have to unearth the test scripts, and find a time slot for testing. Best regards, Thomas Betker ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain>]
* Re: JFFS2 deadlock [not found] ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain> @ 2016-02-18 9:57 ` Thomas.Betker 2016-02-25 7:46 ` Joakim Tjernlund 0 siblings, 1 reply; 9+ messages in thread From: Thomas.Betker @ 2016-02-18 9:57 UTC (permalink / raw) To: David Woodhouse Cc: computersforpeace, Artem Bityutskiy, Deng Chao, Joakim Tjernlund, linux-fsdevel, linux-kernel, linux-mtd@lists.infradead.org, linux-mtd, Ming Liu, sztomi89@gmail.com, Alexander Viro, wangzaiwei Hello David: > > Please could you try what's in the tree at > > http://git.infradead.org/users/dwmw2/jffs2-fixes.git > Your patch looks much simpler, and I will definitely test it. It may > take a few days, though, as I have to unearth the test scripts, and > find a time slot for testing. Here is what I did (sorry for the wait, things were piling up): 1) Removed Deng Chao's patch from my kernel, added your patch "jffs2: Fix page lock / f->sem deadlock". I am still on linux-3.14, but jffs2 hasn't changed much since then, so this shouldn't make a difference. Added a printk() before mutex_unlock(&f->sem) to check if the prospective page was locked, i.e. if the deadlock situation actually occurs. 2) On my target system, started wangzaiwei's test (with some fixes), plus a loop copying a large file over and over (to get GC rolling, which increases the chance of a deadlock). 3) After 24 hours, the system was still alive, and the printk() had been hit 32 times. So yes, I am confident that your patch avoids the deadlock, and if that's good enough for you, please add my Tested-by:. However, I am going to run some more stress tests here just to check that there weren't any unexpected side effects. (Don't get me wrong -- I am sure the patch is fine, but for me it's a case of "once bitten, twice shy" ...) Best regards, Thomas Betker ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock 2016-02-18 9:57 ` Thomas.Betker @ 2016-02-25 7:46 ` Joakim Tjernlund 2016-02-25 9:57 ` Thomas.Betker 0 siblings, 1 reply; 9+ messages in thread From: Joakim Tjernlund @ 2016-02-25 7:46 UTC (permalink / raw) To: dwmw2@infradead.org, Thomas.Betker@rohde-schwarz.com Cc: linux-kernel@vger.kernel.org, linux-mtd@lists.infradead.org, sztomi89@gmail.com, wangzaiwei@top-vision.cn, deng.chao1@zte.com.cn, liu.ming50@gmail.com, dedekind1@gmail.com, viro@zeniv.linux.org.uk, linux-mtd-bounces@lists.infradead.org, linux-fsdevel@vger.kernel.org, computersforpeace@gmail.com On Thu, 2016-02-18 at 10:57 +0100, Thomas.Betker@rohde-schwarz.com wrote: > Hello David: > > > > > > > > > Please could you try what's in the tree at > > > http://git.infradead.org/users/dwmw2/jffs2-fixes.git > > > > Your patch looks much simpler, and I will definitely test it. It may > > take a few days, though, as I have to unearth the test scripts, and > > find a time slot for testing. > Here is what I did (sorry for the wait, things were piling up): > > 1) Removed Deng Chao's patch from my kernel, added your patch "jffs2: Fix > page lock / f->sem deadlock". I am still on linux-3.14, but jffs2 hasn't > changed much since then, so this shouldn't make a difference. Added a > printk() before mutex_unlock(&f->sem) to check if the prospective page was > locked, i.e. if the deadlock situation actually occurs. > > 2) On my target system, started wangzaiwei's test (with some fixes), plus > a loop copying a large file over and over (to get GC rolling, which > increases the chance of a deadlock). > > 3) After 24 hours, the system was still alive, and the printk() had been > hit 32 times. > > So yes, I am confident that your patch avoids the deadlock, and if that's > good enough for you, please add my Tested-by:. However, I am going to run > some more stress tests here just to check that there weren't any > unexpected side effects. (Don't get me wrong -- I am sure the patch is > fine, but for me it's a case of "once bitten, twice shy" ...) Can we get this upstream before next release? I don't think there will much more testing at this point. Jocke ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock 2016-02-25 7:46 ` Joakim Tjernlund @ 2016-02-25 9:57 ` Thomas.Betker 2016-02-25 11:22 ` David Woodhouse 0 siblings, 1 reply; 9+ messages in thread From: Thomas.Betker @ 2016-02-25 9:57 UTC (permalink / raw) To: Joakim.Tjernlund Cc: computersforpeace@gmail.com, dedekind1@gmail.com, deng.chao1@zte.com.cn, dwmw2@infradead.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mtd@lists.infradead.org, linux-mtd-bounces@lists.infradead.org, liu.ming50@gmail.com, sztomi89@gmail.com, viro@zeniv.linux.org.uk, wangzaiwei@top-vision.cn Hello Joakim: > Can we get this upstream before next release? I don't think there > will much more > testing at this point. I would second this. Actually, I did some additional stress testing, but didn't see any problems. Best regards, Thomas ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock 2016-02-25 9:57 ` Thomas.Betker @ 2016-02-25 11:22 ` David Woodhouse 2016-02-25 17:57 ` Brian Norris 0 siblings, 1 reply; 9+ messages in thread From: David Woodhouse @ 2016-02-25 11:22 UTC (permalink / raw) To: Thomas.Betker, Joakim.Tjernlund Cc: computersforpeace@gmail.com, dedekind1@gmail.com, deng.chao1@zte.com.cn, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mtd@lists.infradead.org, linux-mtd-bounces@lists.infradead.org, liu.ming50@gmail.com, sztomi89@gmail.com, viro@zeniv.linux.org.uk, wangzaiwei@top-vision.cn [-- Attachment #1: Type: text/plain, Size: 840 bytes --] On Thu, 2016-02-25 at 10:57 +0100, Thomas.Betker@rohde-schwarz.com wrote: > Hello Joakim: > > > Can we get this upstream before next release? I don't think there > > will much more > > testing at this point. > > I would second this. Actually, I did some additional stress testing, but > didn't see any problems. OK, of the four fixes I had in my branch, three are Cc:stable and I have pushed them to linux-mtd.git. I'll give them a day or two in linux-next and then send a pull request. The fourth, which is just a mount speedup, I've put in l2-mtd.git. Brian, was that the right way to do it? I feel like an impostor pushing to my own repositories now :) -- David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5691 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: JFFS2 deadlock 2016-02-25 11:22 ` David Woodhouse @ 2016-02-25 17:57 ` Brian Norris 0 siblings, 0 replies; 9+ messages in thread From: Brian Norris @ 2016-02-25 17:57 UTC (permalink / raw) To: David Woodhouse Cc: Thomas.Betker, Joakim.Tjernlund, dedekind1@gmail.com, deng.chao1@zte.com.cn, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mtd@lists.infradead.org, linux-mtd-bounces@lists.infradead.org, liu.ming50@gmail.com, sztomi89@gmail.com, viro@zeniv.linux.org.uk, wangzaiwei@top-vision.cn On Thu, Feb 25, 2016 at 11:22:12AM +0000, David Woodhouse wrote: > OK, of the four fixes I had in my branch, three are Cc:stable and I > have pushed them to linux-mtd.git. I'll give them a day or two in > linux-next and then send a pull request. > > The fourth, which is just a mount speedup, I've put in l2-mtd.git. > > Brian, was that the right way to do it? I feel like an impostor pushing > to my own repositories now :) Nope, that's perfect. Happy to have a JFFS2 maintainer again :) (Feel free to send the MAINTAINERS update that's also in linux-mtd.git/master. It was meant to ride along with whatever else needed pushed to Linus.) Brian ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2016-02-25 17:57 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CAKatumo13ZM2o3dK9_ouxzta-sxeiB5E236Qh5dfbnEBc9CuMw@mail.gmail.com>
[not found] ` <1453910781.20662.35.camel@infinera.com>
2016-01-28 0:05 ` JFFS2 deadlock Brian Norris
2016-01-28 8:16 ` Thomas.Betker
2016-02-01 14:28 ` David Woodhouse
2016-02-01 18:54 ` Thomas.Betker
[not found] ` <OF2969B332.8B296F0B-ONC1257F4C.006307EB-C1257F4C.0067DE44@LocalDomain>
2016-02-18 9:57 ` Thomas.Betker
2016-02-25 7:46 ` Joakim Tjernlund
2016-02-25 9:57 ` Thomas.Betker
2016-02-25 11:22 ` David Woodhouse
2016-02-25 17:57 ` Brian Norris
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).