From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: JFFS2 deadlock with alloc_sem
From: David Woodhouse
To: Dave Kleikamp
In-Reply-To: <1185813909.9523.42.camel@kleikamp.austin.ibm.com>
References: <1185543729.13873.10.camel@kleikamp.austin.ibm.com>
	<1185557896.22352.4.camel@kleikamp.austin.ibm.com>
	<1185799508.3083.20.camel@pmac.infradead.org>
	<1185813909.9523.42.camel@kleikamp.austin.ibm.com>
Content-Type: text/plain; charset=UTF-8
Date: Tue, 31 Jul 2007 13:10:27 +0100
Message-Id: <1185883827.3083.109.camel@pmac.infradead.org>
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Cc: Roberts Nathan-mcg31137, linux-mtd@lists.infradead.org, ye janboe
List-Id: Linux MTD discussion mailing list

On Mon, 2007-07-30 at 11:45 -0500, Dave Kleikamp wrote:
> Thus we conclude that the root cause of the problem is that jffs2 is not
> conforming to the strict order of acquiring multiple locks, ie., all code
> paths resulting in acquiring multiple locks must do so in the same order.
> In this case, gc thread requests first the file lock, then the page lock,
> however jffs2_readpage function requests the page lock first, then the file
> lock. Another potential deadlock source is in jffs2_prepare_write, in which it
> requests page lock, then the file lock.

If that's the explanation, then the patch which Nathan tried (dropping
f->sem before jffs2_gc_fetch_page(), followed by your cleanups¹) ought
to have fixed the problem. And I'd be happier with that version rather
than introducing a new read_cache_page_async_trylock() solely for JFFS2.

It's actually OK to drop f->sem in jffs2_garbage_collect_dnode(). We
hold the alloc_sem anyway -- nobody's going to be _changing_ the file
under us. In fact, the garbage collector probably doesn't need to grab
f->sem until it's actually going to _change_ something.

-- 
dwmw2

¹ http://lists.infradead.org/pipermail/linux-mtd/2007-June/018588.html
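
For illustration only, the ordering argument above can be stepped through in a toy, single-threaded C model. This is not JFFS2 code: toy_lock, toy_trylock(), toy_unlock() and interleaving_deadlocks() are hypothetical names made up for this sketch, "tasks" are plain integers, and alloc_sem is not modelled at all. It only shows that, for the one interleaving described in the quoted analysis, the two tasks wait on each other when the GC side keeps f->sem while asking for the page lock, and that the reader can make progress once the GC side drops f->sem first.

```c
/* Toy, single-threaded model of the lock ordering discussed above.
 * NOT JFFS2 code: the lock type and every function name here are
 * hypothetical, and "owners" are integers standing in for tasks. */
#include <assert.h>
#include <stdbool.h>

enum { NOBODY = 0, READER = 1, GC = 2 };

struct toy_lock { int owner; };            /* NOBODY when the lock is free */

/* Try to take a lock; returns false where a real task would block. */
static bool toy_trylock(struct toy_lock *l, int who)
{
    if (l->owner != NOBODY)
        return false;
    l->owner = who;
    return true;
}

/* Release a lock; only the current owner may do so. */
static void toy_unlock(struct toy_lock *l, int who)
{
    assert(l->owner == who);
    l->owner = NOBODY;
}

/* Step through one interleaving of the readpage path and the GC path.
 *   drop_sem_first == false: GC keeps f->sem while asking for the page lock
 *   drop_sem_first == true:  GC drops f->sem before asking for the page lock
 * Returns true if both tasks end up waiting on each other. */
static bool interleaving_deadlocks(bool drop_sem_first)
{
    struct toy_lock f_sem = { NOBODY };    /* stands in for f->sem */
    struct toy_lock page  = { NOBODY };    /* stands in for the page lock */

    /* The readpage path takes the page lock first ... */
    bool ok = toy_trylock(&page, READER);
    assert(ok);
    /* ... and meanwhile the GC path takes f->sem. */
    ok = toy_trylock(&f_sem, GC);
    assert(ok);

    if (drop_sem_first)
        toy_unlock(&f_sem, GC);

    /* GC now wants the page; the reader now wants f->sem. */
    bool gc_got_page    = toy_trylock(&page, GC);
    bool reader_got_sem = toy_trylock(&f_sem, READER);

    return !gc_got_page && !reader_got_sem;
}
```

Calling interleaving_deadlocks(false) reports that both tasks are stuck, while interleaving_deadlocks(true) does not: the GC side still has to wait for the page, but the reader obtains f->sem and can finish and release the page lock. Whether dropping f->sem is safe in the real code rests on the alloc_sem argument made in the message, which this sketch does not check.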