public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* gettimeofday clock jump bug
@ 2002-06-07 20:21 ashieh
  2002-06-07 20:30 ` Karim Yaghmour
  2002-06-07 21:11 ` george anzinger
  0 siblings, 2 replies; 7+ messages in thread
From: ashieh @ 2002-06-07 20:21 UTC (permalink / raw)
  To: linux-kernel

Hi,

time() occasionally returns a bogus value (>1 hour jump forward, and a few microseconds later jumps back to the right time) on my box (Thunderbird 750, Asus K7V (KX133) kernel 2.4.17). This behavior sets in after the box is up for some period of time. I don't think this is related to the 686a configuration reset bug.

I suspect that somehow the either do_gettimeoffset() or xtime.tv_usec in do_gettimeofday is returning a ridiculously large value. I would like to get to the bottom of this problem, but am not familiar with this part of the timing infrastructure. Has anyone seen this bug before? Would using a different locking mode (SMP  vs none SMP, or wahtever) possibly help this problem? Is there a document online describing how this works in Linux?  

In the meantime, I want to hack up a patch to fix this on my box. I'm thinking I could give up a few seconds of clock precision in exchange for monotonic clock behavior, and so I want to comment out the adjustments to usec. What are the possible ramifications of this hack?

Alan

Original do_gettimeofday:

void do_gettimeofday(struct timeval *tv)
{
        unsigned long flags;
        unsigned long usec, sec;

        read_lock_irqsave(&xtime_lock, flags);
        usec = do_gettimeoffset();
        {
                unsigned long lost = jiffies - wall_jiffies;
                if (lost)
                        usec += lost * (1000000 / HZ);
        }
        sec = xtime.tv_sec;
        usec += xtime.tv_usec;
        read_unlock_irqrestore(&xtime_lock, flags);

        while (usec >= 1000000) {
                usec -= 1000000;
                sec++;
        }

        tv->tv_sec = sec;
        tv->tv_usec = usec;
}

My proposed hack (for my system):

void do_gettimeofday(struct timeval *tv)
{
        unsigned long flags;
        unsigned long usec, sec;

        read_lock_irqsave(&xtime_lock, flags);
        usec = do_gettimeoffset();
        /* {
                unsigned long lost = jiffies - wall_jiffies;
                if (lost)
                        usec += lost * (1000000 / HZ);
        } */
        sec = xtime.tv_sec;
        usec = xtime.tv_usec;
        read_unlock_irqrestore(&xtime_lock, flags);

        while (usec >= 1000000) {
                usec -= 1000000;
                sec++;
        }

        tv->tv_sec = sec;
        tv->tv_usec = usec;
}

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: gettimeofday clock jump bug
  2002-06-07 20:21 gettimeofday clock jump bug ashieh
@ 2002-06-07 20:30 ` Karim Yaghmour
  2002-06-07 21:11 ` george anzinger
  1 sibling, 0 replies; 7+ messages in thread
From: Karim Yaghmour @ 2002-06-07 20:30 UTC (permalink / raw)
  To: ashieh; +Cc: linux-kernel


Hello Alan,

While developing the Linux Trace Toolkit, I've seen this bug occur on
at least the i386 and the PPC. I had written down a description of
how time-keeping is done in the kernel in order to get to the bottom of
the problem. I'm not sure if this is still acurate, but it's a starting
point nevetheless:
http://www.embeddedlinuxworks.com/articles/kernel_time.html

Cheers,

Karim

ashieh@OCF.Berkeley.EDU wrote:
> 
> Hi,
> 
> time() occasionally returns a bogus value (>1 hour jump forward, and a few microseconds later jumps back to the right time) on my box (Thunderbird 750, Asus K7V (KX133) kernel 2.4.17). This behavior sets in after the box is up for some period of time. I don't think this is related to the 686a configuration reset bug.
> 
> I suspect that somehow the either do_gettimeoffset() or xtime.tv_usec in do_gettimeofday is returning a ridiculously large value. I would like to get to the bottom of this problem, but am not familiar with this part of the timing infrastructure. Has anyone seen this bug before? Would using a different locking mode (SMP  vs none SMP, or wahtever) possibly help this problem? Is there a document online describing how this works in Linux?
> 
> In the meantime, I want to hack up a patch to fix this on my box. I'm thinking I could give up a few seconds of clock precision in exchange for monotonic clock behavior, and so I want to comment out the adjustments to usec. What are the possible ramifications of this hack?
> 
> Alan
> 
> Original do_gettimeofday:
> 
> void do_gettimeofday(struct timeval *tv)
> {
>         unsigned long flags;
>         unsigned long usec, sec;
> 
>         read_lock_irqsave(&xtime_lock, flags);
>         usec = do_gettimeoffset();
>         {
>                 unsigned long lost = jiffies - wall_jiffies;
>                 if (lost)
>                         usec += lost * (1000000 / HZ);
>         }
>         sec = xtime.tv_sec;
>         usec += xtime.tv_usec;
>         read_unlock_irqrestore(&xtime_lock, flags);
> 
>         while (usec >= 1000000) {
>                 usec -= 1000000;
>                 sec++;
>         }
> 
>         tv->tv_sec = sec;
>         tv->tv_usec = usec;
> }
> 
> My proposed hack (for my system):
> 
> void do_gettimeofday(struct timeval *tv)
> {
>         unsigned long flags;
>         unsigned long usec, sec;
> 
>         read_lock_irqsave(&xtime_lock, flags);
>         usec = do_gettimeoffset();
>         /* {
>                 unsigned long lost = jiffies - wall_jiffies;
>                 if (lost)
>                         usec += lost * (1000000 / HZ);
>         } */
>         sec = xtime.tv_sec;
>         usec = xtime.tv_usec;
>         read_unlock_irqrestore(&xtime_lock, flags);
> 
>         while (usec >= 1000000) {
>                 usec -= 1000000;
>                 sec++;
>         }
> 
>         tv->tv_sec = sec;
>         tv->tv_usec = usec;
> }
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
===================================================
                 Karim Yaghmour
               karim@opersys.com
      Embedded and Real-Time Linux Expert
===================================================

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: gettimeofday clock jump bug
  2002-06-07 20:21 gettimeofday clock jump bug ashieh
  2002-06-07 20:30 ` Karim Yaghmour
@ 2002-06-07 21:11 ` george anzinger
  2002-06-07 23:51   ` Ruth Ivimey-Cook
  1 sibling, 1 reply; 7+ messages in thread
From: george anzinger @ 2002-06-07 21:11 UTC (permalink / raw)
  To: ashieh; +Cc: linux-kernel

ashieh@OCF.Berkeley.EDU wrote:
> 
> Hi,
> 
> time() occasionally returns a bogus value (>1 hour jump forward, and a few microseconds later jumps back to the right time) on my box (Thunderbird 750, Asus K7V (KX133) kernel 2.4.17). This behavior sets in after the box is up for some period of time. I don't think this is related to the 686a configuration reset bug.

I suspect that do_gettimeoffset() may be, on occasion,
returning a negative number.  The normalizing code then
works with this (unsigned) value until it is < 1,000,000. 
If it came back as -1, this would generate an error of about
1.19 hours.  I suspect the best fix would be to test the
result from do_gettimeoffset() for something greater than
say 20ms and if so set it to 0.

-g
> 
> I suspect that somehow the either do_gettimeoffset() or xtime.tv_usec in do_gettimeofday is returning a ridiculously large value. I would like to get to the bottom of this problem, but am not familiar with this part of the timing infrastructure. Has anyone seen this bug before? Would using a different locking mode (SMP  vs none SMP, or wahtever) possibly help this problem? Is there a document online describing how this works in Linux?
> 
> In the meantime, I want to hack up a patch to fix this on my box. I'm thinking I could give up a few seconds of clock precision in exchange for monotonic clock behavior, and so I want to comment out the adjustments to usec. What are the possible ramifications of this hack?
> 
> Alan
> 
> Original do_gettimeofday:
> 
> void do_gettimeofday(struct timeval *tv)
> {
>         unsigned long flags;
>         unsigned long usec, sec;
> 
>         read_lock_irqsave(&xtime_lock, flags);
>         usec = do_gettimeoffset();
>         {
>                 unsigned long lost = jiffies - wall_jiffies;
>                 if (lost)
>                         usec += lost * (1000000 / HZ);
>         }
>         sec = xtime.tv_sec;
>         usec += xtime.tv_usec;
>         read_unlock_irqrestore(&xtime_lock, flags);
> 
>         while (usec >= 1000000) {
>                 usec -= 1000000;
>                 sec++;
>         }
> 
>         tv->tv_sec = sec;
>         tv->tv_usec = usec;
> }
> 
> My proposed hack (for my system):
> 
> void do_gettimeofday(struct timeval *tv)
> {
>         unsigned long flags;
>         unsigned long usec, sec;
> 
>         read_lock_irqsave(&xtime_lock, flags);
>         usec = do_gettimeoffset();
>         /* {
>                 unsigned long lost = jiffies - wall_jiffies;
>                 if (lost)
>                         usec += lost * (1000000 / HZ);
>         } */
>         sec = xtime.tv_sec;
>         usec = xtime.tv_usec;
>         read_unlock_irqrestore(&xtime_lock, flags);
> 
>         while (usec >= 1000000) {
>                 usec -= 1000000;
>                 sec++;
>         }
> 
>         tv->tv_sec = sec;
>         tv->tv_usec = usec;
> }
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: gettimeofday clock jump bug
  2002-06-07 21:11 ` george anzinger
@ 2002-06-07 23:51   ` Ruth Ivimey-Cook
  2002-06-08  3:37     ` george anzinger
  0 siblings, 1 reply; 7+ messages in thread
From: Ruth Ivimey-Cook @ 2002-06-07 23:51 UTC (permalink / raw)
  To: george anzinger; +Cc: ashieh, linux-kernel


ashieh@OCF.Berkeley.EDU wrote:
> time() occasionally returns a
>bogus value (>1 hour jump forward, and a few microseconds later jumps back to
>the right time) on my box (Thunderbird 750, Asus K7V (KX133) kernel 2.4.17).
>This behavior sets in after the box is up for some period of time. I don't
>think this is related to the 686a configuration reset bug.
>
On Fri, 7 Jun 2002, george anzinger wrote:
>I suspect that do_gettimeoffset() may be, on occasion,
>returning a negative number.  The normalizing code then
>works with this (unsigned) value until it is < 1,000,000. 
>If it came back as -1, this would generate an error of about
>1.19 hours.  I suspect the best fix would be to test the
>result from do_gettimeoffset() for something greater than
>say 20ms and if so set it to 0.

I've just looked at the i386 time.c source and can see no obvious way for -1
to be returned by do_gettimeoffset(). I note that this error is fixed in the
next time() call, so I would instead expect the error to be one involving the 
conversion of tv_secs/tv_usecs into the seconds return from time().

One possible way to check this out would be to change the test program from
using the time() call to using gettimeofday(), and to ignore tv_usecs.

Hope this helps,

Ruth

-- 
Ruth Ivimey-Cook
Software engineer and technical writer.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: gettimeofday clock jump bug
@ 2002-06-08  0:35 Tomas Vanderka
  0 siblings, 0 replies; 7+ messages in thread
From: Tomas Vanderka @ 2002-06-08  0:35 UTC (permalink / raw)
  To: linux-kernel

Hi,
I've got Abit KT7 and AMD Athlon system, and i have seen this happen while 
running 2.4.18 (with 2.4.16 and later 2.4.19-pre8/9 it didn't happen)
There has been some discussion about it here
http://marc.theaimsgroup.com/?t=92577204900001&r=1&w=2

VanTo


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: gettimeofday clock jump bug
  2002-06-07 23:51   ` Ruth Ivimey-Cook
@ 2002-06-08  3:37     ` george anzinger
  0 siblings, 0 replies; 7+ messages in thread
From: george anzinger @ 2002-06-08  3:37 UTC (permalink / raw)
  To: Ruth Ivimey-Cook; +Cc: ashieh, linux-kernel

Ruth Ivimey-Cook wrote:
> 
> ashieh@OCF.Berkeley.EDU wrote:
> > time() occasionally returns a
> >bogus value (>1 hour jump forward, and a few microseconds later jumps back to
> >the right time) on my box (Thunderbird 750, Asus K7V (KX133) kernel 2.4.17).
> >This behavior sets in after the box is up for some period of time. I don't
> >think this is related to the 686a configuration reset bug.
> >
> On Fri, 7 Jun 2002, george anzinger wrote:
> >I suspect that do_gettimeoffset() may be, on occasion,
> >returning a negative number.  The normalizing code then
> >works with this (unsigned) value until it is < 1,000,000.
> >If it came back as -1, this would generate an error of about
> >1.19 hours.  I suspect the best fix would be to test the
> >result from do_gettimeoffset() for something greater than
> >say 20ms and if so set it to 0.
> 
> I've just looked at the i386 time.c source and can see no obvious way for -1
> to be returned by do_gettimeoffset(). 

It can happen two ways (and I am not saying it returns -1,
but some large number ~ 1hr worth of usecs).  The first
possibility is an overflow in the conversion of tsc ticks to
usec.  This is more likely a problem if the processor is
running at high speeds, AND interrupts have been held off
for a while.  The second possibility is that either the
latch read failed to latch or the PIT is actually not
resetting at count = 0.  This is, I think, the VIA chip
bug.  If there is no "correction" code for this bug, the net
result will be a slow and erratic clock.  If there is
correction software in place, it could result in the
observed time jump by providing an invalid count which is
then used by do_gettimeoffset().  The reason the fault
clears is a.) on the next tick a valid count will be
obtained, and b) the value from do_gettimeoffset() is never
rolled into the wall clock.

> I note that this error is fixed in the
> next time() call, so I would instead expect the error to be one involving the
> conversion of tv_secs/tv_usecs into the seconds return from time().
> 
> One possible way to check this out would be to change the test program from
> using the time() call to using gettimeofday(), and to ignore tv_usecs.

And just where does time() get the time of day if not from
gettimeofday()?
> 
> Hope this helps,
> 
> Ruth
> 
> --
> Ruth Ivimey-Cook
> Software engineer and technical writer.

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: gettimeofday clock jump bug
@ 2002-08-02  7:24 Hartwig. Thomas
  0 siblings, 0 replies; 7+ messages in thread
From: Hartwig. Thomas @ 2002-08-02  7:24 UTC (permalink / raw)
  To: linux-kernel

I was referenced to a gettimeofday problem spoken of in the kernel 
discussion summary:
http://kt.zork.net/kernel-traffic/kt20020708_174.html#1

There is spoken of .01% reproducibility and less of this problem.

I got to this problem here running a version of GNU wget (1.8.2) and 
following kernel: Linux version 2.4.18 (gcc version 2.96 20000731 (Red 
Hat Linux 7.3 2.96-110)) #4 Sun Jul 28 09:01:06 CEST 2002

In this configuration I get the error even more times. In 2300 calls I 
get 319 failures, this is something about 0.07% not too much, but 
significant.

However it is not my skill to analyze nor get to deep in wget and the 
kernel. It's just a note and a offer of some further tests if you need.

Greetings
Thomas

PS: I'm sorry this message is out of the thread index. I got to late on 
the list.



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2002-08-02  7:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-06-07 20:21 gettimeofday clock jump bug ashieh
2002-06-07 20:30 ` Karim Yaghmour
2002-06-07 21:11 ` george anzinger
2002-06-07 23:51   ` Ruth Ivimey-Cook
2002-06-08  3:37     ` george anzinger
  -- strict thread matches above, loose matches on Subject: below --
2002-06-08  0:35 Tomas Vanderka
2002-08-02  7:24 Hartwig. Thomas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox