public inbox for linux-mtd@lists.infradead.org
* DOC2000 + PPM-TX166 (PATCH)
@ 2003-11-05 17:20 Jim Duchek
  2003-11-06  0:24 ` David Woodhouse
  0 siblings, 1 reply; 4+ messages in thread
From: Jim Duchek @ 2003-11-05 17:20 UTC (permalink / raw)
  To: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 1434 bytes --]

Hello.  I'm not subscribed to the mailing list, so please, if you could
make sure any replies for me go to my address and not only to the list,
I would appreciate it.

We saw a problem (actually, a pretty big problem): when a sync() was
done on a filesystem on the DoC we were using, it would hang and never
come back.  The system stayed perfectly responsive, but nothing could
talk to the DoC again until reboot.  The problem never occurred if we
let the fs sync itself -- that is, just waited a few minutes until the
buffers had been flushed.

I traced the problem down to the cond_resched() in WaitReady in 
doc2000.c.  Removing cond_resched() made the problem go away (and the 
system unresponsive while doing any DoC access).  Replacing the 
udelay(1) and cond_resched() with a yield() fixes everything.  The patch 
is short and appended to this message.  I believe the patch should be 
good for all users, although I'm not sure why we don't see this problem 
on some other setups.  My best guess is that the other setups we are 
using have Geode processors, which I don't believe have a TSC, and our 
P-MMX does have one, and udelay() is totally different depending on the 
existence of a TSC.
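
[Illustration only, not part of the patch: a minimal user-space sketch
of the "poll, then yield" loop shape described above.  It assumes POSIX
sched_yield() as a rough analogue of the kernel's yield(); struct
fake_chip, chip_ready() and wait_ready() are hypothetical names, and the
"chip" is a counter rather than real DoC hardware.  It shows only the
control flow, with a poll limit so the loop cannot spin forever.]

```c
#include <sched.h>    /* sched_yield() */
#include <stdbool.h>

/* Hypothetical stand-in for the chip's status register: it reports
 * "busy" for the first busy_polls reads and "ready" afterwards. */
struct fake_chip {
    unsigned busy_polls;  /* number of reads that report busy */
    unsigned polls_seen;  /* number of reads performed so far */
};

/* One status read; true once the fake chip is ready. */
static bool chip_ready(struct fake_chip *c)
{
    return c->polls_seen++ >= c->busy_polls;
}

/* Poll until ready, giving up the CPU between reads instead of
 * busy-waiting.  Returns 0 when ready, -1 after max_polls reads. */
static int wait_ready(struct fake_chip *c, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (chip_ready(c))
            return 0;
        sched_yield();  /* let other runnable tasks go first */
    }
    return -1;
}
```

[A chip that is busy for 3 reads becomes ready on the 4th; with a limit
smaller than the busy count, wait_ready() returns -1 instead of hanging.]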

For those curious, we are using an unpatched 2.4.22 kernel on a 
WinSystems PPM-TX166.  The DoC in question is a 48M Industrial (X) rated 
chip.  We saw the problem with both msdos and ext2 filesystems.




[-- Attachment #2: doc2000.patch --]
[-- Type: text/plain, Size: 149 bytes --]

94a95
> 		yield();
99,100d99
< 		udelay(1);
< 		cond_resched();
101a101,104
> 	
> 	DEBUG(MTD_DEBUG_LEVEL3,
> 	      "_DoC_WaitReady finished\n");
> 


* Re: DOC2000 + PPM-TX166 (PATCH)
  2003-11-05 17:20 DOC2000 + PPM-TX166 (PATCH) Jim Duchek
@ 2003-11-06  0:24 ` David Woodhouse
  2003-11-06  0:53   ` Jim Duchek
  0 siblings, 1 reply; 4+ messages in thread
From: David Woodhouse @ 2003-11-06  0:24 UTC (permalink / raw)
  To: Jim Duchek; +Cc: linux-mtd

On Wed, 2003-11-05 at 11:20 -0600, Jim Duchek wrote:
> I traced the problem down to the cond_resched() in WaitReady in 
> doc2000.c.  Removing cond_resched() made the problem go away (and the 
> system unresponsive while doing any DoC access).  Replacing the 
> udelay(1) and cond_resched() with a yield() fixes everything. 

Very strange. Calling yield() really isn't what we want there; we wanted
cond_resched(). What if you leave the cond_resched() and remove the
udelay()? What if you make the udelay() larger?

When the system is locked up as you described, can you hit SysRq-T to
see the backtrace of the stuck thread, and show me the (decoded) output?

Does the same happen even with the current CVS code?

-- 
dwmw2


* Re: DOC2000 + PPM-TX166 (PATCH)
  2003-11-06  0:24 ` David Woodhouse
@ 2003-11-06  0:53   ` Jim Duchek
  2003-11-06  7:40     ` David Woodhouse
  0 siblings, 1 reply; 4+ messages in thread
From: Jim Duchek @ 2003-11-06  0:53 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

I'll take a look at getting you that backtrace -- I don't think we have 
one of our boxes with a keyboard/video on it right now, but next time I 
get the opportunity I'll get you that.  I haven't tried removing the 
udelay.  I didn't consider that the udelay might have been the problem 
until after I'd put the yield() in, and then later was discussing what 
the difference might have been and I thought of the TSC/non-TSC 
udelays.  I'll give that a shot tomorrow as well -- it really ought to 
be removed anyhow if a resched() is going to happen right after it.  We 
did try the latest CVS code (perhaps two weeks ago) and saw no 
difference, so reverted back to the 2.4.22 released version. 


Jim


* Re: DOC2000 + PPM-TX166 (PATCH)
  2003-11-06  0:53   ` Jim Duchek
@ 2003-11-06  7:40     ` David Woodhouse
  0 siblings, 0 replies; 4+ messages in thread
From: David Woodhouse @ 2003-11-06  7:40 UTC (permalink / raw)
  To: Jim Duchek; +Cc: linux-mtd

On Wed, 2003-11-05 at 18:53 -0600, Jim Duchek wrote:
> I'll take a look at getting you that backtrace -- I don't think we have 
> one of our boxes with a keyboard/video on it right now, but next time I 
> get the opportunity I'll get you that.  

Thanks.

> We did try the latest CVS code (perhaps two weeks ago) and saw no 
> difference, so reverted back to the 2.4.22 released version. 

This routine was changed yesterday, to make it conform with the
recommendations in the programming documentation... :)

-- 
dwmw2
