From: Laurent Pinchart
To: linux-mtd@lists.infradead.org
Subject: Deadlock while accessing mtdchar device while erase in progress
Date: Fri, 28 Dec 2007 11:04:26 +0100
Message-Id: <200712281104.27663.laurentp@cse-semaphore.com>
List-Id: Linux MTD discussion mailing list

Hi everybody,

I sent a mail two weeks ago about an MTD-related deadlock. I haven't received
any answer so far, so I'm pinging the list just in case the message was
missed. Here's the original mail.

I'm having trouble fixing an MTD-related deadlock. I'm not sure if the
problem comes from an MTD and/or JFFS2 kernel bug, or from some userspace
operation performed as root that I shouldn't have started in the first place.
I'd appreciate it if someone could share their thoughts (and experience :-))
about this.

Some background.

My embedded system has a single NOR flash chip (cfi_cmdset_0002 driver) with
128kB sectors. Flash space is divided into 4 partitions, the last one used
with a JFFS2 filesystem.

A userspace process running as root sets its scheduling policy to SCHED_RR.
It writes 400kB of data to the JFFS2 filesystem and restarts by exec()ing
itself, inheriting the SCHED_RR policy.

Soon after being restarted, it opens /dev/mtd0 (boot loader partition) and
reads a few bytes of data. It then close()s /dev/mtd0, at which point the
kernel deadlocks.

I added debugging messages to the MTD subsystem and it turned out my process
blocks in cfi_amdstd_sync (drivers/mtd/chips/cfi_cmdset_0002.c). chip->state
is set to FL_ERASING, and the kernel keeps scheduling in the default case.

I'm not sure what goes wrong. I suspect the JFFS2 garbage collector has
kicked in. It might be erasing sectors when I close() /dev/mtd0, at which
point the mtdchar driver tries to sync() the flash and locks up in
cfi_amdstd_sync because my process, being scheduled with a SCHED_RR policy,
never gives the JFFS2 garbage collector a chance to run. That's just a wild
guess though.

Any help regarding how to better diagnose the problem (to confirm or reject
the above suspicion) will be appreciated. A fix would be even more
welcome :-)

Best regards,

-- 
Laurent Pinchart
CSE Semaphore Belgium

Chaussée de Bruxelles, 732A
B-1410 Waterloo
Belgium

T +32 (2) 387 42 59
F +32 (2) 387 42 75
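
The suspicion in the message above is that a SCHED_RR task which keeps
polling while staying runnable never lets a lower-priority task finish the
erase. That can be illustrated with a toy model. This is only a sketch under
stated assumptions, not kernel code, and it does not confirm the diagnosis:
Chip, syncer, eraser and simulate are hypothetical names, the priorities and
tick counts are made up, and the scheduler is a simplified strict-priority
run queue with round-robin among equal priorities.

```python
from dataclasses import dataclass

FL_READY = "FL_READY"
FL_ERASING = "FL_ERASING"


@dataclass
class Chip:
    """Hypothetical stand-in for the flash chip state."""
    state: str = FL_ERASING
    erase_ticks_left: int = 3  # assumed: time slices the eraser still needs


def eraser(chip):
    """Stands in for the task that started the erase and must finish it."""
    while chip.erase_ticks_left > 0:
        chip.erase_ticks_left -= 1
        yield
    chip.state = FL_READY


def syncer(chip):
    """Stands in for the process waiting in the sync loop."""
    while chip.state != FL_READY:
        yield  # gives up the CPU but stays runnable, so it never sleeps


def simulate(sync_prio, erase_prio, max_ticks=1000):
    """Return the tick at which the sync completes, or None if it never does."""
    chip = Chip()
    # Run queue entries: (priority, name, task). Higher priority always wins.
    queue = [(sync_prio, "sync", syncer(chip)),
             (erase_prio, "erase", eraser(chip))]
    for tick in range(1, max_ticks + 1):
        top = max(prio for prio, _, _ in queue)
        # First task at the top priority runs; re-appending gives round-robin.
        idx = next(i for i, (prio, _, _) in enumerate(queue) if prio == top)
        prio, name, task = queue.pop(idx)
        try:
            next(task)
            queue.append((prio, name, task))
        except StopIteration:
            if name == "sync":
                return tick
    return None


if __name__ == "__main__":
    print("sync above eraser:", simulate(sync_prio=50, erase_prio=0))
    print("equal priorities: ", simulate(sync_prio=0, erase_prio=0))
```

In this model the sync never completes within the tick budget when the
polling task outranks the eraser, and it completes after a few ticks when
both tasks share a priority. Whether the real driver behaves this way depends
on whether the waiting task actually sleeps, which is what the debugging
described above would need to establish.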