From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from petasus.ims.intel.com ([62.118.80.130])
	by pentafluge.infradead.org with esmtp (Exim 4.54 #1 (Red Hat Linux))
	id 1EfdcC-0002N7-Ai
	for linux-mtd@lists.infradead.org; Fri, 25 Nov 2005 13:27:49 +0000
Message-ID: <4387113C.3050206@intel.com>
Date: Fri, 25 Nov 2005 16:27:24 +0300
From: "Alexey, Korolev"
MIME-Version: 1.0
To: Nicolas Pitre
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-mtd@lists.infradead.org
Subject: Re: Deadlock in cfi_cmdset_0001.c on simultaneous write operations.
List-Id: Linux MTD discussion mailing list

Nicolas Pitre wrote:

> On Thu, 24 Nov 2005, Alexey, Korolev wrote:
>
> > Nicolas,
> >
> > I'm using non SMP platform ( Mainstone II). CONFIG_PREEMPT is disabled.
>
> What kernel version are you using?
>
Linux 2.6.11.

> Can you send me your kernel .config?  I'll try to reproduce it here.
>
> > Partition size is 8MB. Current configuration: each logical volume is located
> > on each h/w partition. Logical volumes don't share h/w partitions.
>
> This is Sibley flash?
>
Yes, it is an M18 flash chip.

> > I also disabled erase suspend on write feature.
>
> Why?
>
I thought it would make the bug easier to localize. Please correct me if
I'm wrong. The recursion in the get_chip function is mostly related to
use of the erase-suspend-on-write feature. With erase suspend on write
disabled, the code simply goes to sleep when it tries to get a busy chip.
That showed me that the problem is not with erase suspend.

> > I applied code which you have send in previous letter.
> > After that code behavior has changed.
> > It didn't halt on basic simultaneous write operations.
>
> Actually, I wonder why.  Especially with CONFIG_PREEMPT on non SMP
> system all spin_locks are just no ops.
>
> > But it failed to kernel panic in our test case.
> > (Five applications, each of
> > them performs writing, erasing and reading own logical volume )
>
> Can you share your test application with me?
>
The test application is part of a rather big test harness. I will try to
find a way for you to reproduce the issue.

> > Here is kernel panic message:
> > After this message I received two more almost the same as this kernel panic
> > messages.
>
> > [...]
> > Stack: (0xc391dfa8 to 0xc391e000)
> > dfa0:                   c391dfc8 c391dfb8 c003129c c0030eb4 02c76300 c391e004
> > dfc0: c391dfcc c01a0928 c0031284 02734e47 33c93d00 00000075 c3982450 c3c732f0
> > dfe0: c391e08c c02deba0 00000007 c3c732d4 00000001 00000001 c391e0c8 c391e008
> > Backtrace:
> [...]
>
> This looks extremely suspicious, given that the backtrace has at least
> 40 calls and the stack cannot contain all of them given its location
> (the kernel stack is 8kb aligned).  So this really looks like a kernel
> stack overflow, and frankly I wonder how you managed that.
>
> Did you modify your kernel somehow?  What patches if any did you apply
> to it?
>
Yes, we modified the kernel; we made our own patches for it. But they are
not related to the chip acquisition process. I think it will be possible
to reproduce the issue with the default configuration. I need some time
to find a way to do it.

Thanks,
Alexey