* time() glitch on 2.4.18 at 177 days uptime?
@ 2002-11-02 6:37 Jim Paris
2002-11-03 19:32 ` time() glitch on 2.4.18: solved Jim Paris
0 siblings, 1 reply; 30+ messages in thread
From: Jim Paris @ 2002-11-02 6:37 UTC (permalink / raw)
To: linux-kernel
I'm running Linux 2.4.18 on an Athlon (i386).
My uptime is 182 days. About five days ago, I started noticing
strange date effects. It may be hardware related, but I'm also
suspecting that it may be due to the long uptime, and so I'm hesitant
to reboot, in case there is a bug here that needs to get fixed.
The problem is with time(). Every second, for approximately 1.1ms,
time() reports a value that is about 2^32 microseconds (4295 seconds,
or about an hour and a quarter) in the future. The glitches always
occur between a change of seconds. Look at this:
$ while true; do paste <( cat /proc/uptime ) <( date ) ; done | grep -A 1 -B 1 ' 02:' | head -11
15765463.31 15455589.84 Sat Nov 2 00:59:16 EST 2002
15765463.32 15455589.84 Sat Nov 2 02:10:51 EST 2002
15765463.33 15455589.84 Sat Nov 2 00:59:17 EST 2002
--
15765465.31 15455589.84 Sat Nov 2 00:59:18 EST 2002
15765465.32 15455589.84 Sat Nov 2 02:10:53 EST 2002
15765465.34 15455589.84 Sat Nov 2 00:59:19 EST 2002
--
15765466.30 15455589.84 Sat Nov 2 00:59:19 EST 2002
15765466.32 15455589.84 Sat Nov 2 02:10:54 EST 2002
15765466.33 15455589.84 Sat Nov 2 00:59:20 EST 2002
The first two values on each line are from /proc/uptime, and the rest
is of course from "date". This bash script runs a bit slow, which is
why it missed the glitch between :17 and :18, but by watching time()s
more frequently, I can definitely see that they're there. I don't
know for certain whether this happens at all times during the day, but
I believe it does. I can do some more logging to figure that out if
necessary.
I've been running ntpdate (and therefore calling do_adjtime()) every
half hour since the system booted, in case that might affect
something.
This glitch does _not_ occur on a 2.2.? system on some x86 processor
with about 256 days uptime (I unfortunately don't know any more
details than that; I only have very limited access to that machine).
I also don't have access to any other 2.4 systems with an uptime of
greater than about 80 days.
This is not a glibc bug; the erroneous values are being returned by
the kernel. Perhaps someone can help me track this down. I've put my
kernel config and relevant info from /proc at
http://neurosis.mit.edu/~jim/time-glitch/
(and would appreciate a CC of any replies)
-jim
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: time() glitch on 2.4.18: solved 2002-11-02 6:37 time() glitch on 2.4.18 at 177 days uptime? Jim Paris @ 2002-11-03 19:32 ` Jim Paris 2002-11-03 20:30 ` Alan Cox 0 siblings, 1 reply; 30+ messages in thread From: Jim Paris @ 2002-11-03 19:32 UTC (permalink / raw) To: linux-kernel > The problem is with time(). Every second, for approximately 1.1ms, > time() reports a value that is about 2^32 microseconds (4295 seconds, > or about an hour and a quarter) in the future. The glitches always > occur between a change of seconds. I finally found the problem. My i8253 was out of phase. The 16-bit timer value is read in two 8-bit reads from the 8253 in arch/i386/kernel/time.c, and this value should be between 0 and LATCH-1. My kernel was getting the MSB and LSB reversed, and so the read values were usually too high, and delay_at_last_interrupt ended up negative. This caused some small random negative amount to be subtracted from usecs during do_gettimeofday, and so my clock was always making small jumps backwards, and occasionally jumping forward 2^32 when usecs was small. To fix it, I just read a single byte from port 0x40. If I do it again, the problem returns (and I've tested that this is the case on multiple systems, so it's not just a problem with my 8253). After 180 days of uptime, it's not surprising that there would have been one read of the port that failed, triggering the problem, so I think the kernel should detect and fix this. We could just check for it: if the returned count > LATCH, read an extra byte from port 0x40, as I did. Or, use the method in do_slow_gettimeoffset, which basically resets the 8253's counter if count > LATCH. Any comments? -jim ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: time() glitch on 2.4.18: solved 2002-11-03 19:32 ` time() glitch on 2.4.18: solved Jim Paris @ 2002-11-03 20:30 ` Alan Cox 2002-11-05 16:30 ` [PATCH] " Jim Paris 0 siblings, 1 reply; 30+ messages in thread From: Alan Cox @ 2002-11-03 20:30 UTC (permalink / raw) To: Jim Paris; +Cc: Linux Kernel Mailing List On Sun, 2002-11-03 at 19:32, Jim Paris wrote: > After 180 days of uptime, it's not surprising that there would have > been one read of the port that failed, triggering the problem, so I > think the kernel should detect and fix this. We could just check for > it: if the returned count > LATCH, read an extra byte from port 0x40, > as I did. Or, use the method in do_slow_gettimeoffset, which > basically resets the 8253's counter if count > LATCH. We have locking that ought to get that all correct nowdays but I've seen at least one bios generate half a read in SMM mode > Any comments? Have a play with it, if your idea works when you deliberately disturb it then send in a patch ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-03 20:30 ` Alan Cox @ 2002-11-05 16:30 ` Jim Paris 2002-11-05 16:54 ` george anzinger ` (3 more replies) 0 siblings, 4 replies; 30+ messages in thread From: Jim Paris @ 2002-11-05 16:30 UTC (permalink / raw) To: Alan Cox; +Cc: Linux Kernel Mailing List > > Any comments? > > Have a play with it, if your idea works when you deliberately disturb it > then send in a patch This works well. -jim diff -urN linux-2.4.18/arch/i386/kernel/time.c linux-2.4.18-jim/arch/i386/kernel/time.c --- linux-2.4.18/arch/i386/kernel/time.c Fri Mar 15 18:28:53 2002 +++ linux-2.4.18-jim/arch/i386/kernel/time.c Tue Nov 5 11:22:02 2002 @@ -501,6 +501,16 @@ count = inb_p(0x40); /* read the latched count */ count |= inb(0x40) << 8; + + /* Any unpaired read will cause the above to swap MSB/LSB + forever. Try to detect this and reset the counter. */ + if (count > LATCH) { + outb_p(0x34, 0x43); + outb_p(LATCH & 0xff, 0x40); + outb(LATCH >> 8, 0x40); + count = LATCH - 1; + } + spin_unlock(&i8253_lock); count = ((LATCH-1) - count) * TICK_SIZE; ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 16:30 ` [PATCH] " Jim Paris @ 2002-11-05 16:54 ` george anzinger 2002-11-05 17:10 ` Willy Tarreau ` (2 subsequent siblings) 3 siblings, 0 replies; 30+ messages in thread From: george anzinger @ 2002-11-05 16:54 UTC (permalink / raw) To: Jim Paris; +Cc: Alan Cox, Linux Kernel Mailing List Jim Paris wrote: > > > > Any comments? > > > > Have a play with it, if your idea works when you deliberately disturb it > > then send in a patch > > This works well. But it does introduce a hiccup in time. One could just do an odd read IF that is all that is wrong. Your code could also be correcting for other ills... -g > > -jim > > diff -urN linux-2.4.18/arch/i386/kernel/time.c linux-2.4.18-jim/arch/i386/kernel/time.c > --- linux-2.4.18/arch/i386/kernel/time.c Fri Mar 15 18:28:53 2002 > +++ linux-2.4.18-jim/arch/i386/kernel/time.c Tue Nov 5 11:22:02 2002 > @@ -501,6 +501,16 @@ > > count = inb_p(0x40); /* read the latched count */ > count |= inb(0x40) << 8; > + > + /* Any unpaired read will cause the above to swap MSB/LSB > + forever. Try to detect this and reset the counter. */ > + if (count > LATCH) { > + outb_p(0x34, 0x43); > + outb_p(LATCH & 0xff, 0x40); > + outb(LATCH >> 8, 0x40); > + count = LATCH - 1; > + } > + > spin_unlock(&i8253_lock); > > count = ((LATCH-1) - count) * TICK_SIZE; > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 16:30 ` [PATCH] " Jim Paris 2002-11-05 16:54 ` george anzinger @ 2002-11-05 17:10 ` Willy Tarreau 2002-11-05 18:16 ` Alan Cox 2002-11-05 17:14 ` Willy Tarreau 2002-11-05 17:32 ` Alan Cox 3 siblings, 1 reply; 30+ messages in thread From: Willy Tarreau @ 2002-11-05 17:10 UTC (permalink / raw) To: Jim Paris; +Cc: Alan Cox, Linux Kernel Mailing List On Tue, Nov 05, 2002 at 11:30:20AM -0500, Jim Paris wrote: > + if (count > LATCH) { may be (count >= LATCH) would be even better ? Willy ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 17:10 ` Willy Tarreau @ 2002-11-05 18:16 ` Alan Cox 2002-11-05 18:02 ` Jim Paris 0 siblings, 1 reply; 30+ messages in thread From: Alan Cox @ 2002-11-05 18:16 UTC (permalink / raw) To: Willy Tarreau; +Cc: Jim Paris, Linux Kernel Mailing List On Tue, 2002-11-05 at 17:10, Willy Tarreau wrote: > On Tue, Nov 05, 2002 at 11:30:20AM -0500, Jim Paris wrote: > > + if (count > LATCH) { > > may be (count >= LATCH) would be even better ? Some PIT clones seem to hold the LATCH value momentarily judging by other things that were triggered wrongly by >= ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 18:16 ` Alan Cox @ 2002-11-05 18:02 ` Jim Paris 2002-11-05 18:37 ` Alan Cox 0 siblings, 1 reply; 30+ messages in thread From: Jim Paris @ 2002-11-05 18:02 UTC (permalink / raw) To: Alan Cox; +Cc: Willy Tarreau, Linux Kernel Mailing List > > > + if (count > LATCH) { > > > > may be (count >= LATCH) would be even better ? > > Some PIT clones seem to hold the LATCH value momentarily judging by > other things that were triggered wrongly by >= If so, then that's a separate problem: the later code count = ((LATCH-1) - count) * TICK_SIZE; delay_at_last_interrupt = (count + LATCH/2) / LATCH; will cause delay_at_last_interrupt to be negative, so you'll see backwards jumps in time and occasional wraparound of usecs as I did. Perhaps a if (count == LATCH) count--; after the reset code? -jim ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 18:02 ` Jim Paris @ 2002-11-05 18:37 ` Alan Cox 2002-11-05 18:20 ` Willy Tarreau 0 siblings, 1 reply; 30+ messages in thread From: Alan Cox @ 2002-11-05 18:37 UTC (permalink / raw) To: Jim Paris; +Cc: Willy Tarreau, Linux Kernel Mailing List On Tue, 2002-11-05 at 18:02, Jim Paris wrote: > > > > + if (count > LATCH) { > > > > > > may be (count >= LATCH) would be even better ? > > > > Some PIT clones seem to hold the LATCH value momentarily judging by > > other things that were triggered wrongly by >= > > If so, then that's a separate problem: the later code > > count = ((LATCH-1) - count) * TICK_SIZE; > delay_at_last_interrupt = (count + LATCH/2) / LATCH; > It might be interesting to catch that case with a printk too and put both in 2.5 and see what comes out in the wash yes ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 18:37 ` Alan Cox @ 2002-11-05 18:20 ` Willy Tarreau 2002-11-05 18:50 ` Jim Paris 0 siblings, 1 reply; 30+ messages in thread From: Willy Tarreau @ 2002-11-05 18:20 UTC (permalink / raw) To: Alan Cox; +Cc: Jim Paris, Willy Tarreau, Linux Kernel Mailing List On Tue, Nov 05, 2002 at 06:37:57PM +0000, Alan Cox wrote: > On Tue, 2002-11-05 at 18:02, Jim Paris wrote: > > > > > + if (count > LATCH) { > > > > > > > > may be (count >= LATCH) would be even better ? > > > > > > Some PIT clones seem to hold the LATCH value momentarily judging by > > > other things that were triggered wrongly by >= > > > > If so, then that's a separate problem: the later code > > > > count = ((LATCH-1) - count) * TICK_SIZE; > > delay_at_last_interrupt = (count + LATCH/2) / LATCH; > > > > It might be interesting to catch that case with a printk too and put > both in 2.5 and see what comes out in the wash yes could that be the reason a few people have experienced occasionnal jumps backwards in gettimeofday() a few months ago, which many others could never reproduce ? Just because of buggy hardware ? If so, I think the printk patch should be proposed to Marcelo before 2.4.20-final. Regards, Willy ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 18:20 ` Willy Tarreau @ 2002-11-05 18:50 ` Jim Paris 0 siblings, 0 replies; 30+ messages in thread From: Jim Paris @ 2002-11-05 18:50 UTC (permalink / raw) To: Willy Tarreau; +Cc: Alan Cox, Linux Kernel Mailing List > could that be the reason a few people have experienced occasionnal jumps > backwards in gettimeofday() a few months ago, which many others could never > reproduce ? Just because of buggy hardware ? An unpaired read on port 0x40 is almost certainly the cause of e.g. http://www.uwsg.iu.edu/hypermail/linux/kernel/0206.0/1505.html and we would expect to see the problem with count==LATCH about 1/11932 of the time, about 0.008% -- almost exactly the value reported in: http://www.uwsg.iu.edu/hypermail/linux/kernel/0206.3/0043.html I believe that by adding both the (count > LATCH) check and a second (count == LATCH) check, we can fix both of these. -jim ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 16:30 ` [PATCH] " Jim Paris 2002-11-05 16:54 ` george anzinger 2002-11-05 17:10 ` Willy Tarreau @ 2002-11-05 17:14 ` Willy Tarreau 2002-11-05 17:42 ` Jim Paris 2002-11-05 17:32 ` Alan Cox 3 siblings, 1 reply; 30+ messages in thread From: Willy Tarreau @ 2002-11-05 17:14 UTC (permalink / raw) To: Jim Paris; +Cc: Alan Cox, Linux Kernel Mailing List BTW, why not trying to resync, with something like : if (count >= LATCH) count = (count >> 8) | inb(0x40) << 8; Cheers, Willy > count = inb_p(0x40); /* read the latched count */ > count |= inb(0x40) << 8; > + > + /* Any unpaired read will cause the above to swap MSB/LSB > + forever. Try to detect this and reset the counter. */ > + if (count > LATCH) { > + outb_p(0x34, 0x43); > + outb_p(LATCH & 0xff, 0x40); > + outb(LATCH >> 8, 0x40); > + count = LATCH - 1; > + } > + > spin_unlock(&i8253_lock); > > count = ((LATCH-1) - count) * TICK_SIZE; > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 17:14 ` Willy Tarreau @ 2002-11-05 17:42 ` Jim Paris 2002-11-05 18:57 ` Richard B. Johnson 0 siblings, 1 reply; 30+ messages in thread From: Jim Paris @ 2002-11-05 17:42 UTC (permalink / raw) To: Willy Tarreau; +Cc: Alan Cox, Linux Kernel Mailing List > BTW, why not trying to resync, with something like : > > if (count >= LATCH) > count = (count >> 8) | inb(0x40) << 8; That's best if we really are out of sync, but it's hard to tell. It could be that the 8253's latch value got clobbered somehow, in which case we definitely want to fix that or our timer interrupts will come out at the wrong rate. We also still need to double-check that count < LATCH after your code. Unless we assume that an unpaired read is more common than having the latch value clobbered in some other way, then I think just resetting the counter should be okay. Since none of this _should_ ever happen (but did on my system, grr), it's probably not worth making it too complicated just to try to figure out what went wrong. Updated patch with the printk and corrected conditional is below. -jim diff -urN linux-2.4.18/arch/i386/kernel/time.c linux-2.4.18-jim/arch/i386/kernel/time.c --- linux-2.4.18/arch/i386/kernel/time.c Fri Mar 15 18:28:53 2002 +++ linux-2.4.18-jim/arch/i386/kernel/time.c Tue Nov 5 12:39:38 2002 @@ -501,6 +501,18 @@ count = inb_p(0x40); /* read the latched count */ count |= inb(0x40) << 8; + + /* Any unpaired read will cause the above to swap MSB/LSB + forever. Try to detect this and reset the counter. */ + if (count >= LATCH) { + printk(KERN_WARNING + "i8253 count too high! resetting..\n"); + outb_p(0x34, 0x43); + outb_p(LATCH & 0xff, 0x40); + outb(LATCH >> 8, 0x40); + count = LATCH - 1; + } + spin_unlock(&i8253_lock); count = ((LATCH-1) - count) * TICK_SIZE; ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 17:42 ` Jim Paris @ 2002-11-05 18:57 ` Richard B. Johnson 2002-11-05 19:08 ` Willy Tarreau 2002-11-05 20:06 ` Alan Cox 0 siblings, 2 replies; 30+ messages in thread From: Richard B. Johnson @ 2002-11-05 18:57 UTC (permalink / raw) To: Jim Paris; +Cc: Willy Tarreau, Alan Cox, Linux Kernel Mailing List On Tue, 5 Nov 2002, Jim Paris wrote: > > BTW, why not trying to resync, with something like : > > > > if (count >= LATCH) > > count = (count >> 8) | inb(0x40) << 8; > > That's best if we really are out of sync, but it's hard to tell. It > could be that the 8253's latch value got clobbered somehow, in which > case we definitely want to fix that or our timer interrupts will come > out at the wrong rate. We also still need to double-check that > count < LATCH after your code. > > Unless we assume that an unpaired read is more common than having the > latch value clobbered in some other way, then I think just resetting > the counter should be okay. Since none of this _should_ ever happen > (but did on my system, grr), it's probably not worth making it too > complicated just to try to figure out what went wrong. > > Updated patch with the printk and corrected conditional is below. > > -jim > > diff -urN linux-2.4.18/arch/i386/kernel/time.c linux-2.4.18-jim/arch/i386/kernel/time.c > --- linux-2.4.18/arch/i386/kernel/time.c Fri Mar 15 18:28:53 2002 > +++ linux-2.4.18-jim/arch/i386/kernel/time.c Tue Nov 5 12:39:38 2002 > @@ -501,6 +501,18 @@ > > count = inb_p(0x40); /* read the latched count */ > count |= inb(0x40) << 8; > + > + /* Any unpaired read will cause the above to swap MSB/LSB > + forever. Try to detect this and reset the counter. */ > + if (count >= LATCH) { > + printk(KERN_WARNING > + "i8253 count too high! resetting..\n"); > + outb_p(0x34, 0x43); > + outb_p(LATCH & 0xff, 0x40); > + outb(LATCH >> 8, 0x40); > + count = LATCH - 1; > + } > + > spin_unlock(&i8253_lock); > > count = ((LATCH-1) - count) * TICK_SIZE; > - No! You will break many machines. You cannot use out_p() when writing the latch it __must__ be out(). the "_p" puts a write to another port between the two writes. That will screw up the internal state-machine of most PITs including AMD-SC520. Again, any time you must make two consecutive writes to the same device port to set it, you must not use the "_p" version. The above latch setting must be: outb(LATCH & 0xff, 0x40); outb(LATCH >> 8, 0x40); Of course, the second write could have a "_p" option, but you won't need it. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Bush : The Fourth Reich of America ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 18:57 ` Richard B. Johnson @ 2002-11-05 19:08 ` Willy Tarreau 2002-11-05 19:29 ` Richard B. Johnson 2002-11-05 20:06 ` Alan Cox 1 sibling, 1 reply; 30+ messages in thread From: Willy Tarreau @ 2002-11-05 19:08 UTC (permalink / raw) To: Richard B. Johnson Cc: Jim Paris, Willy Tarreau, Alan Cox, Linux Kernel Mailing List On Tue, Nov 05, 2002 at 01:57:16PM -0500, Richard B. Johnson wrote: > No! You will break many machines. You cannot use out_p() when > writing the latch it __must__ be out(). the "_p" puts a write > to another port between the two writes. That will screw up > the internal state-machine of most PITs including AMD-SC520. You make an interesting point. Have you checked if an inb_p() on this hardware could unlock the latch when reading the first byte ? It may also be one cause of time jumps if the counter is just about to wrap when it's being read, and gets unlocked by an out to port 0x80 (the "_p"). > count = inb_p(0x40); /* read the latched count */ > count |= inb(0x40) << 8; Or perhaps it will be time to change port 0x80 to something else, that no chipset will use. I've seen 0xED in a bios somewhere, but I don't remember where. Cheers, Willy ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 19:08 ` Willy Tarreau @ 2002-11-05 19:29 ` Richard B. Johnson 2002-11-05 20:07 ` Alan Cox 0 siblings, 1 reply; 30+ messages in thread From: Richard B. Johnson @ 2002-11-05 19:29 UTC (permalink / raw) To: Willy Tarreau; +Cc: Jim Paris, Alan Cox, Linux Kernel Mailing List On Tue, 5 Nov 2002, Willy Tarreau wrote: > On Tue, Nov 05, 2002 at 01:57:16PM -0500, Richard B. Johnson wrote: > > > No! You will break many machines. You cannot use out_p() when > > writing the latch it __must__ be out(). the "_p" puts a write > > to another port between the two writes. That will screw up > > the internal state-machine of most PITs including AMD-SC520. > > You make an interesting point. Have you checked if an inb_p() on this hardware > could unlock the latch when reading the first byte ? It may also be one cause > of time jumps if the counter is just about to wrap when it's being read, and > gets unlocked by an out to port 0x80 (the "_p"). > > > count = inb_p(0x40); /* read the latched count */ > > count |= inb(0x40) << 8; > > Or perhaps it will be time to change port 0x80 to something else, that no > chipset will use. I've seen 0xED in a bios somewhere, but I don't remember > where. > > Cheers, > Willy > It may well be that the inb_p() also causes the same problem. You don't want any port read/write between those required consecutive reads or writes. You just don't want to use the _p option there --ever. The _p option is not a panacea that one uses to make sure that the port read/writes go okay. It's also not a "bus settle time" as IBM said in their BIOS. It is to give the internal state-machine of some devices enough time to set up. For instance, the CMOS clock runs with low-power CMOS. Its internal state-machine is low-power. It does not run as fast as an external bus can run. Therefore, when you set the next address to be read or written, with an out to the port at 0x70, you need to wait for the internal low-power CMOS state machine to actually latch the internal address. Then you can read or write from 0x71. There are only a few devices that require the _p. They are not the PIT, any UARTS, or the DMA controller. Sometimes you need to play nurse-maid to the FDC, but modern ones in super-IO chips are fine. The only hardware a modern PC needs to use "slow-down_io" on is the RTC CMOS device. Since we need to support older boards, you don't want to remove the _p options indiscriminately, but you do not want them ever between two consecutive writes to the same device- port. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Bush : The Fourth Reich of America ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 19:29 ` Richard B. Johnson @ 2002-11-05 20:07 ` Alan Cox 2002-11-05 20:14 ` Willy Tarreau 2002-11-05 20:23 ` Richard B. Johnson 0 siblings, 2 replies; 30+ messages in thread From: Alan Cox @ 2002-11-05 20:07 UTC (permalink / raw) To: root; +Cc: Willy Tarreau, Jim Paris, Linux Kernel Mailing List On Tue, 2002-11-05 at 19:29, Richard B. Johnson wrote: > The only hardware a modern PC needs to use "slow-down_io" on is > the RTC CMOS device. Since we need to support older boards, you > don't want to remove the _p options indiscriminately, but you do > not want them ever between two consecutive writes to the same device- > port. I own at least one that needs the _p on the DMA controller and at one that needs _p on the PIT ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 20:07 ` Alan Cox @ 2002-11-05 20:14 ` Willy Tarreau 2002-11-05 20:38 ` Richard B. Johnson 2002-11-05 20:23 ` Richard B. Johnson 1 sibling, 1 reply; 30+ messages in thread From: Willy Tarreau @ 2002-11-05 20:14 UTC (permalink / raw) To: Alan Cox; +Cc: root, Willy Tarreau, Jim Paris, Linux Kernel Mailing List On Tue, Nov 05, 2002 at 08:07:26PM +0000, Alan Cox wrote: > On Tue, 2002-11-05 at 19:29, Richard B. Johnson wrote: > > The only hardware a modern PC needs to use "slow-down_io" on is > > the RTC CMOS device. Since we need to support older boards, you > > don't want to remove the _p options indiscriminately, but you do > > not want them ever between two consecutive writes to the same device- > > port. > > I own at least one that needs the _p on the DMA controller and at one > that needs _p on the PIT Well, in fact, Intel's 82C54 datasheet says that this chip needs at least 165 ns between two consecutive operations, either read or write. So with a 8 Mhz bus, you may effectively need to insert fake accesses, although most modern chipsets certainly have better specs. But the spec clearly states that you can interleave accesses to other counters between the first and second bytes without problem, so good implementations should see no side effect. Richard, if your PIT doesn't support accesses to port 80, could you try to use other ports ? Cheers, Willy ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 20:14 ` Willy Tarreau @ 2002-11-05 20:38 ` Richard B. Johnson 2002-11-05 23:15 ` Willy Tarreau 0 siblings, 1 reply; 30+ messages in thread From: Richard B. Johnson @ 2002-11-05 20:38 UTC (permalink / raw) To: Willy Tarreau; +Cc: Alan Cox, Jim Paris, Linux Kernel Mailing List On Tue, 5 Nov 2002, Willy Tarreau wrote: > On Tue, Nov 05, 2002 at 08:07:26PM +0000, Alan Cox wrote: > > On Tue, 2002-11-05 at 19:29, Richard B. Johnson wrote: > > > The only hardware a modern PC needs to use "slow-down_io" on is > > > the RTC CMOS device. Since we need to support older boards, you > > > don't want to remove the _p options indiscriminately, but you do > > > not want them ever between two consecutive writes to the same device- > > > port. > > > > I own at least one that needs the _p on the DMA controller and at one > > that needs _p on the PIT > > Well, in fact, Intel's 82C54 datasheet says that this chip needs at least > 165 ns between two consecutive operations, either read or write. So with a > 8 Mhz bus, you may effectively need to insert fake accesses, although most > modern chipsets certainly have better specs. > Its not a clocked bus. The 8 MHz means nothing. It's an async bus and if the setup timing of the data/address-decode/chip-select sequence is not violated, you get ~200ns between accesses so chip-to-chip select time comes automatically. But, some older motherboards violated all kinds of specs. They did this deliberately so you could plug RAM boards into the ISA bus and get reasonable performance. If we still support them, you need some artifical waits. > But the spec clearly states that you can interleave accesses to other > counters between the first and second bytes without problem, so good > implementations should see no side effect. > > Richard, if your PIT doesn't support accesses to port 80, could you try to > use other ports ? > > Cheers, > Willy > It's not the port address. It's the bus state. The chip requires two consecutive writes when setting that latch. It's just like a 32-bit write to a 32-bit port (like PCI access), you can't write to the low-port, do something else, then write to the high port. It needs to be done with no intervening bus states. In the AMD case, it's probably because the address is ignored on the second write. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Bush : The Fourth Reich of America ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 20:38 ` Richard B. Johnson @ 2002-11-05 23:15 ` Willy Tarreau 2002-11-06 12:47 ` Richard B. Johnson 0 siblings, 1 reply; 30+ messages in thread From: Willy Tarreau @ 2002-11-05 23:15 UTC (permalink / raw) To: Richard B. Johnson Cc: Willy Tarreau, Alan Cox, Jim Paris, Linux Kernel Mailing List On Tue, Nov 05, 2002 at 03:38:00PM -0500, Richard B. Johnson wrote: > On Tue, 5 Nov 2002, Willy Tarreau wrote: > > > On Tue, Nov 05, 2002 at 08:07:26PM +0000, Alan Cox wrote: > > > On Tue, 2002-11-05 at 19:29, Richard B. Johnson wrote: > > > > The only hardware a modern PC needs to use "slow-down_io" on is > > > > the RTC CMOS device. Since we need to support older boards, you > > > > don't want to remove the _p options indiscriminately, but you do > > > > not want them ever between two consecutive writes to the same device- > > > > port. > > > > > > I own at least one that needs the _p on the DMA controller and at one > > > that needs _p on the PIT > > > > Well, in fact, Intel's 82C54 datasheet says that this chip needs at least > > 165 ns between two consecutive operations, either read or write. So with a > > 8 Mhz bus, you may effectively need to insert fake accesses, although most > > modern chipsets certainly have better specs. > > > > Its not a clocked bus. The 8 MHz means nothing. It's an async bus > and if the setup timing of the data/address-decode/chip-select sequence > is not violated, you get ~200ns between accesses so chip-to-chip > select time comes automatically. But, some older motherboards violated > all kinds of specs. They did this deliberately so you could plug > RAM boards into the ISA bus and get reasonable performance. If we still > support them, you need some artifical waits. Well, some older (386) motherboards had the 8254 directly connected to the ISA bus. And since Linux is 386-compatible, I think we should still support those boards. > It's not the port address. It's the bus state. The chip requires two > consecutive writes when setting that latch. It's just like a 32-bit > write to a 32-bit port (like PCI access), you can't write to the > low-port, do something else, then write to the high port. Yes you can ! the chip uses a CS (chip select) from an address decoder, and sees the reads/writes only when both CS and either RD or WR are low. I understand that all this circuitry may be simplified a bit in recent chipsets (one unified decoder, fewer glue logic between chips), but there's no reason that this should not be respected if the chip makers claim a bit of compatibility. > It needs to be done with no intervening bus states. In the AMD case, it's > probably because the address is ignored on the second write. Then it's buggy. This chip was originaly designed to work for 8086/80186, which barely means work when directly connected to an ISA bus. If AMD make their chip rely on PCI-like back-to-back behaviour, I don't agree for the compatibility. Now, since there's at least one buggy implementation, the solution is udelay(). But do you think that your chip may be confused by a memory access, when udelay() fetches the count variable from memory, for example ? The last other solution may be to simply use jmp $+2, which will be slow enough on old sensible hardware, and imperceptible on more recent hardware which doesn't require this. Since Alan and you have one piece of hardware which show opposite behaviour, we should get an ever working solution quickly :-) Cheers, Willy ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 23:15 ` Willy Tarreau @ 2002-11-06 12:47 ` Richard B. Johnson 2002-11-06 13:25 ` Alan Cox 0 siblings, 1 reply; 30+ messages in thread From: Richard B. Johnson @ 2002-11-06 12:47 UTC (permalink / raw) To: Willy Tarreau; +Cc: Alan Cox, Jim Paris, Linux Kernel Mailing List On Wed, 6 Nov 2002, Willy Tarreau wrote: > On Tue, Nov 05, 2002 at 03:38:00PM -0500, Richard B. Johnson wrote: > > On Tue, 5 Nov 2002, Willy Tarreau wrote: > > > > > On Tue, Nov 05, 2002 at 08:07:26PM +0000, Alan Cox wrote: > > > > On Tue, 2002-11-05 at 19:29, Richard B. Johnson wrote: > > > > > The only hardware a modern PC needs to use "slow-down_io" on is > > > > > the RTC CMOS device. Since we need to support older boards, you > > > > > don't want to remove the _p options indiscriminately, but you do > > > > > not want them ever between two consecutive writes to the same device- > > > > > port. > > > > > > > > I own at least one that needs the _p on the DMA controller and at one > > > > that needs _p on the PIT > > > > > > Well, in fact, Intel's 82C54 datasheet says that this chip needs at least > > > 165 ns between two consecutive operations, either read or write. So with a > > > 8 Mhz bus, you may effectively need to insert fake accesses, although most > > > modern chipsets certainly have better specs. > > > > > > > Its not a clocked bus. The 8 MHz means nothing. It's an async bus > > and if the setup timing of the data/address-decode/chip-select sequence > > is not violated, you get ~200ns between accesses so chip-to-chip > > select time comes automatically. But, some older motherboards violated > > all kinds of specs. They did this deliberately so you could plug > > RAM boards into the ISA bus and get reasonable performance. If we still > > support them, you need some artifical waits. > > Well, some older (386) motherboards had the 8254 directly connected to the ISA > bus. And since Linux is 386-compatible, I think we should still support those > boards. > > > It's not the port address. It's the bus state. The chip requires two > > consecutive writes when setting that latch. It's just like a 32-bit > > write to a 32-bit port (like PCI access), you can't write to the > > low-port, do something else, then write to the high port. > > Yes you can ! the chip uses a CS (chip select) from an address decoder, > and sees the reads/writes only when both CS and either RD or WR are low. > I understand that all this circuitry may be simplified a bit in recent > chipsets (one unified decoder, fewer glue logic between chips), but there's > no reason that this should not be respected if the chip makers claim a bit > of compatibility. > > > It needs to be done with no intervening bus states. In the AMD case, it's > > probably because the address is ignored on the second write. > > Then it's buggy. This chip was originaly designed to work for 8086/80186, > which barely means work when directly connected to an ISA bus. If AMD make > their chip rely on PCI-like back-to-back behaviour, I don't agree for the > compatibility. > > Now, since there's at least one buggy implementation, the solution is udelay(). > But do you think that your chip may be confused by a memory access, when > udelay() fetches the count variable from memory, for example ? > > The last other solution may be to simply use jmp $+2, which will be slow > enough on old sensible hardware, and imperceptible on more recent hardware > which doesn't require this. > > Since Alan and you have one piece of hardware which show opposite behaviour, > we should get an ever working solution quickly :-) > udelay() would work memory/read/write is entirely different from I/O port read/write even though PWB traces are shared. But that might result in wasted CPU cycles. This works in all cases in machines I have tested. I can't test everything but, depending upon whether or not the forces a cache-line refill, the delay can be from 200 ns to over 800 ns. I have a single instance of this, and call it, rather than doing in-line. This adds a further delay. # .section .text .global cflush .type cflush,@function cflush: pushl %cs # Put code segment on the stack pushl $goto # Put offset on the stack lret # Do a 'long' return (reloads cs) goto: ret .end Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Bush : The Fourth Reich of America ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-06 12:47 ` Richard B. Johnson @ 2002-11-06 13:25 ` Alan Cox 2002-11-06 23:21 ` Jim Paris 0 siblings, 1 reply; 30+ messages in thread From: Alan Cox @ 2002-11-06 13:25 UTC (permalink / raw) To: root; +Cc: Willy Tarreau, Jim Paris, Linux Kernel Mailing List On Wed, 2002-11-06 at 12:47, Richard B. Johnson wrote: > udelay() would work memory/read/write is entirely different from I/O > port read/write even though PWB traces are shared. But that might > result in wasted CPU cycles. udelay actually makes a lot lot more sense than the current _p stuff. Legacy-free hardware might not do what is expected with port 0x80 eventually and still have stuff using _p > > This works in all cases in machines I have tested. I can't test everything > but, depending upon whether or not the forces a cache-line refill, the > delay can be from 200 ns to over 800 ns. I have a single instance of this, > and call it, rather than doing in-line. This adds a further delay. Thats a call/return stack break. That does delay damage in excess of what you actually want and the wrong places. udelay does the right thing ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-06 13:25 ` Alan Cox @ 2002-11-06 23:21 ` Jim Paris 0 siblings, 0 replies; 30+ messages in thread From: Jim Paris @ 2002-11-06 23:21 UTC (permalink / raw) To: Alan Cox; +Cc: root, Willy Tarreau, Linux Kernel Mailing List The recommendations about outb versus outb_p may be valid, but I think this patch should go in anyway in its current form. I'm resetting the i8253 in the same way it was initially set in i8259.c, and so Dick, if you want to change how that's done, it needs to get changed in both places; that's outside the scope of what I'm trying to fix right now. To summarize, this patch will fix: 1) An unpaired read to the i8253 causes time to jump around forever 2) A buggy i8253 causes time to jump around 0.008% of the time #1 is reported whenever it occurs. #2 is only reported the first few times because it's likely to occur every two minutes or so on affected chips. Patch is against 2.4.18. It applies cleanly (but is untested) on 2.4.19 and 2.4.20-rc1. -jim diff -urN linux-2.4.18/arch/i386/kernel/time.c linux-2.4.18-jim/arch/i386/kernel/time.c --- linux-2.4.18/arch/i386/kernel/time.c Fri Mar 15 18:28:53 2002 +++ linux-2.4.18-jim/arch/i386/kernel/time.c Wed Nov 6 17:18:28 2002 @@ -501,7 +501,30 @@ count = inb_p(0x40); /* read the latched count */ count |= inb(0x40) << 8; + + /* Any unpaired read will cause the above to swap MSB/LSB + forever. Try to detect this and reset the counter. */ + if (count > LATCH) { + printk(KERN_WARNING + "i8253 count too high! resetting..\n"); + outb_p(0x34, 0x43); + outb_p(LATCH & 0xff, 0x40); + outb(LATCH >> 8, 0x40); + count = LATCH - 1; + } + spin_unlock(&i8253_lock); + + /* Buggy i8253s might hold the LATCH value. */ + if (count == LATCH) { + static int reported = 0; + if(reported<3) { + printk(KERN_WARNING + "i8253 count too high! adjusting..\n"); + reported++; + } + count--; + } count = ((LATCH-1) - count) * TICK_SIZE; delay_at_last_interrupt = (count + LATCH/2) / LATCH; ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 20:07 ` Alan Cox 2002-11-05 20:14 ` Willy Tarreau @ 2002-11-05 20:23 ` Richard B. Johnson 2002-11-05 20:57 ` Alan Cox 1 sibling, 1 reply; 30+ messages in thread From: Richard B. Johnson @ 2002-11-05 20:23 UTC (permalink / raw) To: Alan Cox; +Cc: Willy Tarreau, Jim Paris, Linux Kernel Mailing List On 5 Nov 2002, Alan Cox wrote: > On Tue, 2002-11-05 at 19:29, Richard B. Johnson wrote: > > The only hardware a modern PC needs to use "slow-down_io" on is > > the RTC CMOS device. Since we need to support older boards, you > > don't want to remove the _p options indiscriminately, but you do > > not want them ever between two consecutive writes to the same device- > > port. > > I own at least one that needs the _p on the DMA controller and at one > that needs _p on the PIT > Hey, look. I can only warn. You do what you want. As far as I'm concerned support stopped at Linux 2.4.19 when poll got trashed. Nobody can use 2.4.19 or probably anything later unless they have powerful CPUs that can spin with 1000 SIGPOLL signals per second. Like you have said, that's the nature of free software. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Bush : The Fourth Reich of America ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 20:23 ` Richard B. Johnson @ 2002-11-05 20:57 ` Alan Cox 2002-11-05 20:41 ` Richard B. Johnson 0 siblings, 1 reply; 30+ messages in thread From: Alan Cox @ 2002-11-05 20:57 UTC (permalink / raw) To: root; +Cc: Willy Tarreau, Jim Paris, Linux Kernel Mailing List On Tue, 2002-11-05 at 20:23, Richard B. Johnson wrote: > Hey, look. I can only warn. You do what you want. As far as I'm > concerned support stopped at Linux 2.4.19 when poll got trashed. > Nobody can use 2.4.19 or probably anything later unless they have > powerful CPUs that can spin with 1000 SIGPOLL signals per second. My 486SLC33 (running at 8MHz on battery mode) doesn't believe you. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 20:57 ` Alan Cox @ 2002-11-05 20:41 ` Richard B. Johnson 2002-11-06 15:09 ` Christer Weinigel 0 siblings, 1 reply; 30+ messages in thread From: Richard B. Johnson @ 2002-11-05 20:41 UTC (permalink / raw) To: Alan Cox; +Cc: Willy Tarreau, Jim Paris, Linux Kernel Mailing List On 5 Nov 2002, Alan Cox wrote: > On Tue, 2002-11-05 at 20:23, Richard B. Johnson wrote: > > Hey, look. I can only warn. You do what you want. As far as I'm > > concerned support stopped at Linux 2.4.19 when poll got trashed. > > Nobody can use 2.4.19 or probably anything later unless they have > > powerful CPUs that can spin with 1000 SIGPOLL signals per second. > > My 486SLC33 (running at 8MHz on battery mode) doesn't believe you. > Enable SIGPOLL on STDIN_FILENO and connect from the network as previously shown. You will get 100% CPU time on the task with this enabled. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Bush : The Fourth Reich of America ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 20:41 ` Richard B. Johnson @ 2002-11-06 15:09 ` Christer Weinigel 2002-11-06 15:49 ` Richard B. Johnson 0 siblings, 1 reply; 30+ messages in thread From: Christer Weinigel @ 2002-11-06 15:09 UTC (permalink / raw) To: root; +Cc: Alan Cox, Willy Tarreau, Jim Paris, Linux Kernel Mailing List "Richard B. Johnson" <root@chaos.analogic.com> writes: > > On Tue, 2002-11-05 at 20:23, Richard B. Johnson wrote: > > > Hey, look. I can only warn. You do what you want. As far as I'm > > > concerned support stopped at Linux 2.4.19 when poll got trashed. > > > Nobody can use 2.4.19 or probably anything later unless they have > > > powerful CPUs that can spin with 1000 SIGPOLL signals per second. > > Enable SIGPOLL on STDIN_FILENO and connect from the network as previously > shown. You will get 100% CPU time on the task with this enabled. Can you please stop harping about this? You have written essentially the same mail over and over again without adding any new information. First of all, your problem has nothing to do with networking, the same thing will happen if you run your program from within screen, so it seems to have more to do with ptys and how both screen and telnetd sets them up. Second, it seems that your program does exactly what is expected. Try applying this patch to your example program and see what happens: diff -u ./polltest.c.orig ./polltest.c --- ./polltest.c.orig Wed Nov 6 15:36:09 2002 +++ ./polltest.c Wed Nov 6 15:35:54 2002 @@ -47,7 +47,7 @@ { struct pollfd pf; pf.fd = STDIN_FILENO; - pf.events = POLLIN; + pf.events = POLLIN | POLLOUT | POLLERR; pf.revents = 0; (void)poll(&pf, 1, 0); if(pf.revents & POLLIN) Each time you print the "POLLOUT = 3 POLLERR = 0" message, you fill up the write buffer, and each time the write queue empties you will get another SIGIO. Now, this may not be what you want, but it seems to be entirely within specs. If you remove the fprintf(stderr, "POLLOUT... from your signal handler you will not see the same effect as you are complaining about. Your measurements are interfering with what you are trying to measure. /Christer -- "Just how much can I get away with and still go to heaven?" Freelance consultant specializing in device driver programming for Linux Christer Weinigel <christer@weinigel.se> http://www.weinigel.se ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-06 15:09 ` Christer Weinigel @ 2002-11-06 15:49 ` Richard B. Johnson 0 siblings, 0 replies; 30+ messages in thread From: Richard B. Johnson @ 2002-11-06 15:49 UTC (permalink / raw) To: Christer Weinigel Cc: Alan Cox, Willy Tarreau, Jim Paris, Linux Kernel Mailing List On 6 Nov 2002, Christer Weinigel wrote: > "Richard B. Johnson" <root@chaos.analogic.com> writes: > > > > On Tue, 2002-11-05 at 20:23, Richard B. Johnson wrote: > > > > Hey, look. I can only warn. You do what you want. As far as I'm > > > > concerned support stopped at Linux 2.4.19 when poll got trashed. > > > > Nobody can use 2.4.19 or probably anything later unless they have > > > > powerful CPUs that can spin with 1000 SIGPOLL signals per second. > > > > Enable SIGPOLL on STDIN_FILENO and connect from the network as previously > > shown. You will get 100% CPU time on the task with this enabled. > > Can you please stop harping about this? You have written essentially > the same mail over and over again without adding any new information. > > First of all, your problem has nothing to do with networking, the same > thing will happen if you run your program from within screen, so it > seems to have more to do with ptys and how both screen and telnetd > sets them up. > > Second, it seems that your program does exactly what is expected. Try > applying this patch to your example program and see what happens: > > diff -u ./polltest.c.orig ./polltest.c > --- ./polltest.c.orig Wed Nov 6 15:36:09 2002 > +++ ./polltest.c Wed Nov 6 15:35:54 2002 > @@ -47,7 +47,7 @@ > { > struct pollfd pf; > pf.fd = STDIN_FILENO; > - pf.events = POLLIN; > + pf.events = POLLIN | POLLOUT | POLLERR; > pf.revents = 0; > (void)poll(&pf, 1, 0); > if(pf.revents & POLLIN) > > Each time you print the "POLLOUT = 3 POLLERR = 0" message, you fill up > the write buffer, and each time the write queue empties you will get > another SIGIO. Now, this may not be what you want, but it seems to be > entirely within specs. > > If you remove the fprintf(stderr, "POLLOUT... from your signal handler > you will not see the same effect as you are complaining about. Your > measurements are interfering with what you are trying to measure. > No they are not. You can no longer `strace` because of the error reported. `strace` will continually write the signal information to the screen (forever) because every time it writes, another signal is generated. All you did was get rid of the method used to show the error. User mode code does not care, and should not care about kernel internals. Whether or not some "write buffer" emptied is (and must not be) any concern to the user. The user needs to know if a read or write to or from a device will block. There are some other things a user might want to know such as "out-of-band data", "hangups", and other protocol-specific things that are signaled in the error bit. A signal must be raised when conditions change. Currently, a signal is raised for every byte transmitted. This is a gross error and cannot be interpreted as "entirely within specs". If a task has attached the SIGIO signal, but the handler does nothing, the handler still gets executed. It gets executed on every byte sent. This makes output "cost" 100 percent of the CPU as seen on `top`. This is simply wrong. There is no possible explaination that will make it right. Everybody, so far, who has answered this thread assumes that I don't know what I am doing. In avery case, they modify the code to remove the forcing function that I have used to make it so they can duplicate the problem on their system. This is not productive and will not fix the problem. The problem is real and needs to be fixed. Cheers, Dick Johnson Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips). Bush : The Fourth Reich of America ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 18:57 ` Richard B. Johnson 2002-11-05 19:08 ` Willy Tarreau @ 2002-11-05 20:06 ` Alan Cox 1 sibling, 0 replies; 30+ messages in thread From: Alan Cox @ 2002-11-05 20:06 UTC (permalink / raw) To: root; +Cc: Jim Paris, Willy Tarreau, Linux Kernel Mailing List On Tue, 2002-11-05 at 18:57, Richard B. Johnson wrote: > No! You will break many machines. You cannot use out_p() when > writing the latch it __must__ be out(). the "_p" puts a write > to another port between the two writes. That will screw up > the internal state-machine of most PITs including AMD-SC520. The delay is required for the PIT. Your AMD-SC520 is simply a bit wacko. The right way to fix this is actually to switch inb_p to use udelay(8) and to load a conservative bogomip number at boot time (we inb_p before we compute udelay values in a few spots) BTW if your SC520 is broken that way I just broke 2.5.4x support for it fixing some more standards compliant platforms 8) ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] Re: time() glitch on 2.4.18: solved 2002-11-05 16:30 ` [PATCH] " Jim Paris ` (2 preceding siblings ...) 2002-11-05 17:14 ` Willy Tarreau @ 2002-11-05 17:32 ` Alan Cox 3 siblings, 0 replies; 30+ messages in thread From: Alan Cox @ 2002-11-05 17:32 UTC (permalink / raw) To: Jim Paris; +Cc: Linux Kernel Mailing List On Tue, 2002-11-05 at 16:30, Jim Paris wrote: > This works well. > > -jim > > diff -urN linux-2.4.18/arch/i386/kernel/time.c linux-2.4.18-jim/arch/i386/kernel/time.c > --- linux-2.4.18/arch/i386/kernel/time.c Fri Mar 15 18:28:53 2002 > +++ linux-2.4.18-jim/arch/i386/kernel/time.c Tue Nov 5 11:22:02 2002 > @@ -501,6 +501,16 @@ > > count = inb_p(0x40); /* read the latched count */ > count |= inb(0x40) << 8; > + > + /* Any unpaired read will cause the above to swap MSB/LSB > + forever. Try to detect this and reset the counter. */ > + if (count > LATCH) { > + outb_p(0x34, 0x43); > + outb_p(LATCH & 0xff, 0x40); > + outb(LATCH >> 8, 0x40); > + count = LATCH - 1; > + } > + > spin_unlock(&i8253_lock); > > count = ((LATCH-1) - count) * TICK_SIZE; Add a printk to address the fact we want to know about this and I think thats enough to meet George's objection ? ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2002-11-06 23:15 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-11-02 6:37 time() glitch on 2.4.18 at 177 days uptime? Jim Paris 2002-11-03 19:32 ` time() glitch on 2.4.18: solved Jim Paris 2002-11-03 20:30 ` Alan Cox 2002-11-05 16:30 ` [PATCH] " Jim Paris 2002-11-05 16:54 ` george anzinger 2002-11-05 17:10 ` Willy Tarreau 2002-11-05 18:16 ` Alan Cox 2002-11-05 18:02 ` Jim Paris 2002-11-05 18:37 ` Alan Cox 2002-11-05 18:20 ` Willy Tarreau 2002-11-05 18:50 ` Jim Paris 2002-11-05 17:14 ` Willy Tarreau 2002-11-05 17:42 ` Jim Paris 2002-11-05 18:57 ` Richard B. Johnson 2002-11-05 19:08 ` Willy Tarreau 2002-11-05 19:29 ` Richard B. Johnson 2002-11-05 20:07 ` Alan Cox 2002-11-05 20:14 ` Willy Tarreau 2002-11-05 20:38 ` Richard B. Johnson 2002-11-05 23:15 ` Willy Tarreau 2002-11-06 12:47 ` Richard B. Johnson 2002-11-06 13:25 ` Alan Cox 2002-11-06 23:21 ` Jim Paris 2002-11-05 20:23 ` Richard B. Johnson 2002-11-05 20:57 ` Alan Cox 2002-11-05 20:41 ` Richard B. Johnson 2002-11-06 15:09 ` Christer Weinigel 2002-11-06 15:49 ` Richard B. Johnson 2002-11-05 20:06 ` Alan Cox 2002-11-05 17:32 ` Alan Cox
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.