* DOC2000 + PPM-TX166 (PATCH)
From: Jim Duchek @ 2003-11-05 17:20 UTC (permalink / raw)
To: linux-mtd
[-- Attachment #1: Type: text/plain, Size: 1434 bytes --]
Hello. I'm not subscribed to the mailing list, so please, if you could
make sure any replies for me go to my address and not to the list, I
would appreciate it.
We saw a problem (actually, a pretty big problem): when a sync() was
done on a filesystem on the DoC we were using, it would hang and never
come back. The rest of the system stayed perfectly responsive, but
nothing could talk to the DoC again until a reboot. The problem
would never occur if we let the fs sync itself -- that is, just wait a
few minutes until the buffers have been cleared.
I traced the problem down to the cond_resched() in WaitReady in
doc2000.c. Removing cond_resched() made the hang go away (but left the
system unresponsive during any DoC access). Replacing the
udelay(1) and cond_resched() with a yield() fixes everything. The patch
is short and appended to this message. I believe the patch should be
good for all users, although I'm not sure why we don't see this problem
on some other setups. My best guess is that the other setups we are
using have Geode processors, which I don't believe have a TSC, while our
P-MMX does have one, and udelay() is implemented quite differently
depending on whether a TSC exists.
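As a side note on the TSC guess: on x86 Linux, /proc/cpuinfo lists CPU features on a line beginning with "flags", and "tsc" appears there when a time stamp counter is reported. The following is a minimal userspace sketch in C for checking that on each board. The helper names line_has_flag() and cpu_has_flag() are hypothetical and are not taken from the kernel or the MTD code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `line` is a cpuinfo "flags" line that
 * lists `flag` as a whole whitespace-separated word, 0 otherwise. */
int line_has_flag(const char *line, const char *flag)
{
    const char *p;
    size_t flen;

    assert(line != NULL && flag != NULL);

    flen = strlen(flag);
    if (flen == 0 || strncmp(line, "flags", 5) != 0)
        return 0;

    p = strchr(line, ':');
    if (p == NULL)
        return 0;
    p++;

    while (*p != '\0') {
        size_t wlen;

        p += strspn(p, " \t\n");      /* skip separators */
        wlen = strcspn(p, " \t\n");   /* length of the next word */
        if (wlen == flen && strncmp(p, flag, flen) == 0)
            return 1;
        p += wlen;
    }
    return 0;
}

/* Hypothetical helper: scan a cpuinfo-style file for `flag`.
 * Returns 1 if found, 0 if not found, -1 if the file cannot be opened.
 * A flags line longer than the buffer is only checked up to the buffer
 * size, which is a limitation of this sketch. */
int cpu_has_flag(const char *path, const char *flag)
{
    char buf[8192];
    FILE *f;
    int found = 0;

    assert(path != NULL && flag != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (!found && fgets(buf, sizeof buf, f) != NULL)
        found = line_has_flag(buf, flag);
    fclose(f);
    return found;
}
```

Calling cpu_has_flag("/proc/cpuinfo", "tsc") on both the Geode and P-MMX boards would confirm or rule out the assumption that only one of them reports a TSC.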
For those curious, we are using an unpatched 2.4.22 kernel on a
WinSystems PPM-TX166. The DoC in question is a 48M Industrial (X) rated
chip. We saw the problem with both msdos and ext2 filesystems.
[-- Attachment #2: doc2000.patch --]
[-- Type: text/plain, Size: 149 bytes --]
94a95
> yield();
99,100d99
< udelay(1);
< cond_resched();
101a101,104
>
> DEBUG(MTD_DEBUG_LEVEL3,
> "_DoC_WaitReady finished\n");
>
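For readers without the driver source at hand, the idea behind the patch can be illustrated with a userspace analogy: poll a status flag and give up the CPU between polls. This is only a sketch under stated assumptions, not the doc2000.c code. struct fake_chip, fake_chip_ready() and wait_ready_yield() are hypothetical names, POSIX sched_yield() stands in for the kernel's yield(), and the bounded poll count is an addition for the sketch.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for the chip; the real driver reads a status
 * register on the DoC instead of counting down. */
struct fake_chip {
    int polls_until_ready;   /* polls that still report "busy" */
};

/* Hypothetical: returns nonzero once the simulated chip is ready. */
int fake_chip_ready(struct fake_chip *chip)
{
    if (chip->polls_until_ready > 0) {
        chip->polls_until_ready--;
        return 0;
    }
    return 1;
}

/* Poll the chip, yielding the CPU after every "busy" result.
 * Returns the number of polls used, or -1 if max_polls ran out. */
int wait_ready_yield(struct fake_chip *chip, int max_polls)
{
    int polls;

    assert(chip != NULL);

    for (polls = 1; polls <= max_polls; polls++) {
        if (fake_chip_ready(chip))
            return polls;
        sched_yield();   /* userspace analogue of the kernel's yield() */
    }
    return -1;
}
```

The loop always offers the CPU to other runnable tasks between polls, whereas the original udelay(1) plus cond_resched() pairing busy-waits briefly and reschedules only when the scheduler has asked for it.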
* Re: DOC2000 + PPM-TX166 (PATCH)
From: David Woodhouse @ 2003-11-06 0:24 UTC (permalink / raw)
To: Jim Duchek; +Cc: linux-mtd
On Wed, 2003-11-05 at 11:20 -0600, Jim Duchek wrote:
> I traced the problem down to the cond_resched() in WaitReady in
> doc2000.c. Removing cond_resched() made the problem go away (and the
> system unresponsive while doing any DoC access). Replacing the
> udelay(1) and cond_resched() with a yield() fixes everything.
Very strange. Calling yield() really isn't what we want there; we wanted
cond_resched(). What if you leave the cond_resched() and remove the
udelay()? What if you make the udelay() larger?
When the system is locked up as you described, can you hit SysRq-T to
see the backtrace of the stuck thread, and show me the (decoded) output?
Does the same happen even with the current CVS code?
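On a box with no keyboard attached, the same task dump can sometimes be requested from a shell or a small program, provided the kernel was built with magic SysRq support and exposes /proc/sysrq-trigger. Whether that file exists on a given 2.4 build is an assumption to verify first. The sketch below writes a single command character to a trigger file. sysrq_write() is a hypothetical helper name, and writing to the real trigger file requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write one SysRq command character (for example
 * 't' for the task dump) to trigger_path.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int sysrq_write(const char *trigger_path, char cmd)
{
    FILE *f;
    int ok;

    assert(trigger_path != NULL);

    f = fopen(trigger_path, "w");
    if (f == NULL)
        return -1;

    ok = (fputc(cmd, f) != EOF);
    if (fclose(f) != 0)   /* the write may only fail at flush time */
        ok = 0;
    return ok ? 0 : -1;
}
```

With the real file, sysrq_write("/proc/sysrq-trigger", 't') would ask the kernel for the task list, which is then printed to the kernel log rather than returned to the caller.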
--
dwmw2
* Re: DOC2000 + PPM-TX166 (PATCH)
From: Jim Duchek @ 2003-11-06 0:53 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-mtd
I'll take a look at getting you that backtrace. I don't think we have
one of our boxes with a keyboard/video on it right now, but I'll send it
the next time I get the opportunity. I haven't tried removing the
udelay(). I didn't consider that the udelay() might have been the problem
until after I'd put the yield() in; only later, while discussing what
the difference might have been, did I think of the TSC/non-TSC
udelay() implementations. I'll give that a shot tomorrow as well -- it
really ought to be removed anyhow if a reschedule is going to happen
right after it. We did try the latest CVS code (perhaps two weeks ago)
and saw no difference, so we reverted to the released 2.4.22 version.
Jim
David Woodhouse wrote:
>On Wed, 2003-11-05 at 11:20 -0600, Jim Duchek wrote:
>
>
>>I traced the problem down to the cond_resched() in WaitReady in
>>doc2000.c. Removing cond_resched() made the problem go away (and the
>>system unresponsive while doing any DoC access). Replacing the
>>udelay(1) and cond_resched() with a yield() fixes everything.
>>
>>
>
>Very strange. Calling yield() really isn't what we want there; we wanted
>cond_resched(). What if you leave the cond_resched() and remove the
>udelay()? What if you make the udelay() larger?
>
>When the system is locked up as you described, can you hit SysRq-T to
>see the backtrace of the stuck thread, and show me the (decoded) output.
>
>Does the same happen even with the current CVS code?
>
>
>
* Re: DOC2000 + PPM-TX166 (PATCH)
From: David Woodhouse @ 2003-11-06 7:40 UTC (permalink / raw)
To: Jim Duchek; +Cc: linux-mtd
On Wed, 2003-11-05 at 18:53 -0600, Jim Duchek wrote:
> I'll take a look at getting you that backtrace -- I don't think we have
> one of our boxes with a keyboard/video on it right now, but next time I
> get the opportunity I'll get you that.
Thanks.
> We did try the latest CVS code (perhaps two weeks ago) and saw no
> difference, so reverted back to the 2.4.22 released version.
This routine was changed yesterday, to make it conform with the
recommendations in the programming documentation... :)
--
dwmw2