* /proc or ps tools bug? 2.6.3, time is off
@ 2004-02-25  1:58 David Ford
  2004-02-25  1:54 ` Albert Cahalan
  2004-02-25  9:14 ` Petri Kaukasoina
  0 siblings, 2 replies; 36+ messages in thread
From: David Ford @ 2004-02-25  1:58 UTC
  To: linux-kernel mailing list, albert

Kernel 2.6.3, procps 3.2.0

# while [ 1 ]; do (ps aux|grep "grep ps aux" && date) ; sleep 1; done
root     20043  0.0  0.0   1504   456 pts/0    R    20:45   0:00 grep grep ps aux
Tue Feb 24 20:45:25 EST 2004
root     20062  0.0  0.0   1504   460 pts/0    S    20:45   0:00 grep grep ps aux
Tue Feb 24 20:45:26 EST 2004
root     20081  0.0  0.0   1504   460 pts/0    S    20:46   0:00 grep grep ps aux
Tue Feb 24 20:45:27 EST 2004

Note the change in the timestamp as reported by 'ps' vs. the time
reported by 'date'.

Repeatable every time at 26 seconds after the minute +/- a portion of a
second.

David
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-02-25 1:58 /proc or ps tools bug? 2.6.3, time is off David Ford @ 2004-02-25 1:54 ` Albert Cahalan 2004-02-25 5:10 ` David Ford 2004-02-25 9:14 ` Petri Kaukasoina 1 sibling, 1 reply; 36+ messages in thread From: Albert Cahalan @ 2004-02-25 1:54 UTC (permalink / raw) To: David Ford; +Cc: linux-kernel mailing list, albert On Tue, 2004-02-24 at 20:58, David Ford wrote: > Kernel 2.6.3, procps 3.2.0 > > # while [ 1 ]; do (ps aux|grep "grep ps aux" && date) ; sleep 1; done > root 20043 0.0 0.0 1504 456 pts/0 R 20:45 0:00 grep grep ps aux > Tue Feb 24 20:45:25 EST 2004 > root 20062 0.0 0.0 1504 460 pts/0 S 20:45 0:00 grep grep ps aux > Tue Feb 24 20:45:26 EST 2004 > root 20081 0.0 0.0 1504 460 pts/0 S 20:46 0:00 grep grep ps aux > Tue Feb 24 20:45:27 EST 2004 > > Note the change in the timestamp as reported by 'ps' v.s. the time > reported by 'date'. > > Repeatable every time at 26 seconds after the minute +/- a portion of a > second. I'm not seeing it, with: procps both 3.1.8 and procps 3.2.0+ kernel 2.6.0-test11 library glibc 2.3 hardware uniprocessor G4 Mac ntp none (and you can tell by my email!) Run "ps --info" to gather much of this data. Note that time is a very awkward thing. You boot up, with some incorrect clock. Then you adjust the time. Later, you may discover that your clock has been running too slow. So you adjust the frequency, but what about the time that has already passed? Should you change the boot time to represent what is now known about your clock? What if, by doing so, you cause some processes to have started before boot? Then again, perhaps due to temperature change, you discover that your clock frequency is wrong... This is without even getting into the concept of leap seconds, which are determined a few months in advance. Two guesses: 1. leap seconds 2. SMP, with cycle counters out of sync ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-02-25 1:54 ` Albert Cahalan @ 2004-02-25 5:10 ` David Ford 2004-02-25 3:27 ` Albert Cahalan 0 siblings, 1 reply; 36+ messages in thread From: David Ford @ 2004-02-25 5:10 UTC (permalink / raw) To: Albert Cahalan; +Cc: linux-kernel mailing list Albert Cahalan wrote: >On Tue, 2004-02-24 at 20:58, David Ford wrote: > > >>Kernel 2.6.3, procps 3.2.0 >> >># while [ 1 ]; do (ps aux|grep "grep ps aux" && date) ; sleep 1; done >>root 20043 0.0 0.0 1504 456 pts/0 R 20:45 0:00 grep grep ps aux >>Tue Feb 24 20:45:25 EST 2004 >>root 20062 0.0 0.0 1504 460 pts/0 S 20:45 0:00 grep grep ps aux >>Tue Feb 24 20:45:26 EST 2004 >>root 20081 0.0 0.0 1504 460 pts/0 S 20:46 0:00 grep grep ps aux >>Tue Feb 24 20:45:27 EST 2004 >> >>Note the change in the timestamp as reported by 'ps' v.s. the time >>reported by 'date'. >> >>Repeatable every time at 26 seconds after the minute +/- a portion of a >>second. >> >> > >I'm not seeing it, with: > >procps both 3.1.8 and procps 3.2.0+ >kernel 2.6.0-test11 >library glibc 2.3 >hardware uniprocessor G4 Mac >ntp none (and you can tell by my email!) > >Run "ps --info" to gather much of this data. > >Note that time is a very awkward thing. You boot up, >with some incorrect clock. Then you adjust the time. >Later, you may discover that your clock has been >running too slow. So you adjust the frequency, but >what about the time that has already passed? Should >you change the boot time to represent what is now >known about your clock? What if, by doing so, you >cause some processes to have started before boot? >Then again, perhaps due to temperature change, you >discover that your clock frequency is wrong... This >is without even getting into the concept of leap >seconds, which are determined a few months in advance. > >Two guesses: > >1. leap seconds >2. SMP, with cycle counters out of sync > > I'm seeing it on two machines now, I'm going to test on more machines as I get access. 
The second machine is my notebook with procps 3.1.15 on it, and it does it at the 46 second mark, also 2.6.3. I can see if a process long in the past would have a different time set on it, but shouldn't the entry in /proc coincide with the system clock that date is accessing? Or how many different "clocks" does the kernel have going? powerix conf.d # ps --info BSD j OL_j BSD l OL_l BSD s OL_s BSD u OL_u BSD v OL_v SysV -f (none) SysV -fl (none) SysV -j (none) SysV -l (none) procps version 3.1.15 Linux version 2.6.3 Compiled with: glibc 2.3, gcc 3.3 header_gap=-1 lines_to_next_header=1 screen_cols=91 screen_rows=29 personality=0x00000000 (from "unknown") EUID=0 TTY=136,3 Hertz=100 PAGE_SIZE=4096 page_size=4096 sizeof(proc_t)=492 sizeof(long)=4 sizeof(KLONG)=4 archdefs: i386 namelist_file="<no System.map file>" Actually, it seems that there is a -significant- time difference in this phantom clock now, I suspended my notebook to bring it home from the station, and now this time difference is greater than 9 minutes. I suspect it's roughly 46 seconds plus the amount of time that my notebook was suspended. Yes, I'm running ntpd. root 16894 0.0 0.0 1544 484 pts/3 S Feb24 0:00 grep grep ps Wed Feb 25 00:09:09 EST 2004 David ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off
  2004-02-25  5:10 ` David Ford
@ 2004-02-25  3:27 ` Albert Cahalan
  2004-02-25 16:28 ` George Anzinger
  0 siblings, 1 reply; 36+ messages in thread
From: Albert Cahalan @ 2004-02-25  3:27 UTC
  To: David Ford; +Cc: Albert Cahalan, linux-kernel mailing list

On Wed, 2004-02-25 at 00:10, David Ford wrote:

> I can see if a process long in the past would have a different time set
> on it, but shouldn't the entry in /proc coincide with the system clock
> that date is accessing? Or how many different "clocks" does the kernel
> have going?

There are way too many clocks, none of which are good.

> Actually, it seems that there is a -significant- time difference in this
> phantom clock now, I suspended my notebook to bring it home from the
> station, and now this time difference is greater than 9 minutes. I
> suspect it's roughly 46 seconds plus the amount of time that my notebook
> was suspended. Yes, I'm running ntpd.
>
> root     16894  0.0  0.0   1544   484 pts/3    S    Feb24   0:00 grep grep ps
> Wed Feb 25 00:09:09 EST 2004

OK, this is pointing right at the problem.

Linux does not record process start times at all.
Instead, it records the number of clock ticks
from boot until the process starts.

Either the boot time or current time is real.
The other may be computed from the uptime, which
may be measured in clock ticks.

The clock doesn't tick when your laptop sleeps.

I seem to recall recent changes to the way the
boot time in /proc/stat gets reported. In any
case, a sleeping laptop suggests some interesting
questions about process run times.

Here's another one to make you scream: Linux does
not supply real %CPU data.
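The arithmetic described above (a start offset in ticks, plus a boot time that is either known or derived from uptime) can be sketched in a few lines of C. This is only an illustration, not procps code: the function names and sample numbers are made up for the example, and it assumes the tick rate passed in is the one the tick counts were actually measured in.

```c
#include <stdint.h>

/* Hypothetical helper: wall-clock start time (seconds since the epoch) of a
 * process, given the boot time and the start offset in clock ticks. */
int64_t start_time_epoch(int64_t boot_epoch, uint64_t start_ticks,
                         unsigned ticks_per_sec)
{
    return boot_epoch + (int64_t)(start_ticks / ticks_per_sec);
}

/* Hypothetical helper: derive the boot time from the current time and the
 * uptime in ticks, for the case where the current time is the "real" one. */
int64_t boot_from_uptime(int64_t now_epoch, uint64_t uptime_ticks,
                         unsigned ticks_per_sec)
{
    return now_epoch - (int64_t)(uptime_ticks / ticks_per_sec);
}
```

Both results are only as good as the tick count: any period in which ticks are not counted, or any mismatch between the assumed and the real tick length, shows up directly in the computed start time.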
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-02-25 3:27 ` Albert Cahalan @ 2004-02-25 16:28 ` George Anzinger 2004-02-25 16:04 ` Albert Cahalan 0 siblings, 1 reply; 36+ messages in thread From: George Anzinger @ 2004-02-25 16:28 UTC (permalink / raw) To: Albert Cahalan; +Cc: David Ford, linux-kernel mailing list Albert Cahalan wrote: > On Wed, 2004-02-25 at 00:10, David Ford wrote: > > >>I can see if a process long in the past would have a different time set >>on it, but shouldn't the entry in /proc coincide with the system clock >>that date is accessing? Or how many different "clocks" does the kernel >>have going? > > > There are way too many clocks, none of which are good. > > >>Actually, it seems that there is a -significant- time difference in this >>phantom clock now, I suspended my notebook to bring it home from the >>station, and now this time difference is greater than 9 minutes. I >>suspect it's roughly 46 seconds plus the amount of time that my notebook >>was suspended. Yes, I'm running ntpd. >> >>root 16894 0.0 0.0 1544 484 pts/3 S Feb24 0:00 grep grep ps >>Wed Feb 25 00:09:09 EST 2004 > > > OK, this is pointing right at the problem. > > Linux does not record process start times at all. > Instead, it records the number of clock ticks > from boot until the process starts. > > Either the boot time or current time is real. > The other may be computed from the uptime, which > may be measured in clock ticks. In 2.6.* boot time is captured at boot. This is then adjusted when ever the clock is set. Up time is the difference between the saved boot time and the current wall clock time. > > The clock doesn't tick when your laptop sleeps. I would guess that the clock adjustment made when the sleep ends is not adjusting the boot time as it should. That code should set the clock by calling do_settimeofday() which will do the right thing. 
As to small drifts of ~170 PPM, they are caused by code (ps I would guess) that
assumes that a jiffy is exactly 1/HZ whereas it is NOT in the 2.6.* kernel. The
size of the jiffy that the kernel uses is returned by:

struct timespec tv;
:
:
clock_getres(CLOCK_REALTIME, &tv);

This will be in nanoseconds (and must be as that is what the wall clock is in).

George

> I seem to recall recent changes to the way the
> boot time in /proc/stat gets reported. In any
> case, a sleeping laptop suggests some interesting
> questions about process run times.
>
> Here's another one to make you scream: Linux does
> not supply real %CPU data.

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-02-25 16:28 ` George Anzinger @ 2004-02-25 16:04 ` Albert Cahalan 2004-02-25 20:45 ` George Anzinger 2004-02-25 21:10 ` George Anzinger 0 siblings, 2 replies; 36+ messages in thread From: Albert Cahalan @ 2004-02-25 16:04 UTC (permalink / raw) To: George Anzinger; +Cc: Albert Cahalan, David Ford, linux-kernel mailing list On Wed, 2004-02-25 at 11:28, George Anzinger wrote: > Albert Cahalan wrote: >> On Wed, 2004-02-25 at 00:10, David Ford wrote: >> Actually, it seems that there is a -significant- time difference in this >>> phantom clock now, I suspended my notebook to bring it home from the >>> station, and now this time difference is greater than 9 minutes. I >>> suspect it's roughly 46 seconds plus the amount of time that my notebook >>> was suspended. Yes, I'm running ntpd. >>> >>> root 16894 0.0 0.0 1544 484 pts/3 S Feb24 0:00 grep grep ps >>> Wed Feb 25 00:09:09 EST 2004 >> >> OK, this is pointing right at the problem. >> >> Linux does not record process start times at all. >> Instead, it records the number of clock ticks >> from boot until the process starts. >> >> Either the boot time or current time is real. >> The other may be computed from the uptime, which >> may be measured in clock ticks. > > In 2.6.* boot time is captured at boot. This is then adjusted when ever the > clock is set. Up time is the difference between the saved boot time and the > current wall clock time. > >> The clock doesn't tick when your laptop sleeps. > > I would guess that the clock adjustment made when the sleep ends is not > adjusting the boot time as it should. That code should set the clock by calling > do_settimeofday() which will do the right thing. I don't think so. The problem might be fixable by advancing jiffies, crediting the extra ticks to idle time. 
Consider the current situation as I know it, in jiffies:

00000 boot
10000 process 42 starts
20000 go to sleep
20000 wake (same jiffies, different time)
30000 process 51 starts
40000 ps examines the state of the system

Process 42 was started 10 seconds after boot. (10000 jiffies)
Process 51 appears to be started 30 seconds after boot. (30000 jiffies + ???)

Now we want to compute:

1. real-world date and time for process start
2. length of process lifetime (real-world or not?)

What works for process 42 won't work for process 51,
because they are on different sides of a hidden gap.

Another way to fix the problem is to move the boot time.
It's kind of sick, but so are the alternatives.

> As to small drifts of ~170 PPM, they are caused by code (ps I would guess) that
> assumes that jiffies is exactly 1/HZ whereas it is NOT in the 2.6.* kernel. The
> size of the jiffie that the kernel uses is returned by:
>
> struct timespec tv;
> :
> :
> clock_getres(CLOCK_REALTIME, &tv);
>
> This will be in nanoseconds (and must be as that is what the wall clock is in).

This is NOT sane. Remember that procps doesn't get to see HZ.
Only USER_HZ is available, as the AT_CLKTCK ELF note.

I think the way to fix this is to skip or add a tick
every now and then, so that the long-term HZ is exact.

Another way is to simply choose between pure old-style
tick-based timekeeping and pure new-style cycle-based
(TSC or ACPI) timekeeping. Systems with uncooperative
hardware have to use the old-style time keeping. This
should simplify the code greatly.
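The jiffies timeline in this message (process 42 before the sleep, process 51 after it) can be modelled with a small sketch. Everything here is illustrative: the function names are invented, SKETCH_HZ = 1000 is assumed because the timeline equates 10000 jiffies with 10 seconds, and the 540 second suspend is a hypothetical figure loosely based on the "greater than 9 minutes" report.

```c
#include <stdint.h>

#define SKETCH_HZ 1000   /* assumed: 10000 jiffies == 10 seconds */

/* Start time as computed from one boot time plus a jiffies offset. */
int64_t apparent_start(int64_t boot, uint64_t start_jiffies)
{
    return boot + (int64_t)(start_jiffies / SKETCH_HZ);
}

/* True start time when jiffies stood still for gap_secs seconds while the
 * counter was at gap_at (the suspend). */
int64_t true_start(int64_t boot, uint64_t start_jiffies,
                   uint64_t gap_at, int64_t gap_secs)
{
    int64_t t = boot + (int64_t)(start_jiffies / SKETCH_HZ);
    if (start_jiffies >= gap_at)
        t += gap_secs;   /* the process started after the suspend */
    return t;
}
```

With boot = 0 and a 540 second suspend at jiffy 20000, process 42 is correct at 10, while process 51 is reported at 30 although it really started at 570. Moving the boot time forward by 540 makes process 51 come out at 570 but pushes process 42 to 550, which is the "different sides of a hidden gap" problem.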
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-02-25 16:04 ` Albert Cahalan @ 2004-02-25 20:45 ` George Anzinger 2004-02-25 19:16 ` Albert Cahalan 2004-02-25 21:10 ` George Anzinger 1 sibling, 1 reply; 36+ messages in thread From: George Anzinger @ 2004-02-25 20:45 UTC (permalink / raw) To: Albert Cahalan; +Cc: David Ford, linux-kernel mailing list Albert Cahalan wrote: > On Wed, 2004-02-25 at 11:28, George Anzinger wrote: > >>Albert Cahalan wrote: >> >>>On Wed, 2004-02-25 at 00:10, David Ford wrote: > > >>>Actually, it seems that there is a -significant- time difference in this >>> >>>>phantom clock now, I suspended my notebook to bring it home from the >>>>station, and now this time difference is greater than 9 minutes. I >>>>suspect it's roughly 46 seconds plus the amount of time that my notebook >>>>was suspended. Yes, I'm running ntpd. >>>> >>>>root 16894 0.0 0.0 1544 484 pts/3 S Feb24 0:00 grep grep ps >>>>Wed Feb 25 00:09:09 EST 2004 >>> >>>OK, this is pointing right at the problem. >>> >>>Linux does not record process start times at all. >>>Instead, it records the number of clock ticks >>>from boot until the process starts. >>> >>>Either the boot time or current time is real. >>>The other may be computed from the uptime, which >>>may be measured in clock ticks. >> >>In 2.6.* boot time is captured at boot. This is then adjusted when ever the >>clock is set. Up time is the difference between the saved boot time and the >>current wall clock time. >> >> >>>The clock doesn't tick when your laptop sleeps. >> >>I would guess that the clock adjustment made when the sleep ends is not >>adjusting the boot time as it should. That code should set the clock by calling >>do_settimeofday() which will do the right thing. > > > I don't think so. The problem might be fixable by advancing > jiffies, crediting the extra ticks to idle time. 
> Consider the current situation as I know it, in jiffies: > > 00000 boot > 10000 process 42 starts > 20000 go to sleep > 20000 wake (same jiffies, different time) > 30000 process 51 starts > 40000 ps examines the state of the system > > Process 42 was started 10 seconds after boot. (10000 jiffies) > Process 51 appears to be started 30 seconds after boot. (30000 jiffies + ???) > > Now we want to compute: > > 1. real-world date and time for process start > 2. length of process lifetime (real-world or not?) > > What works for process 42 won't work for process 51, > because they are on different sides of a hidden gap. > > Another way to fix the problem is to move the boot time. > It's kind of sick, but so are the alternatives. > > >>As to small drifts of ~170 PPM, they are caused by code (ps I would guess) that >>assumes that jiffies is exactly 1/HZ whereas it is NOT in the 2.6.* kernel. The >>size of the jiffie that the kernel uses is returned by: >> >>struct timespec tv; >>: >>: >>clock_res(CLOCK_REALTIME, &tv); >> >>This will be in nanoseconds (and must be as that is what the wall clock is in). > > > This is NOT sane. Remeber that procps doesn't get to see HZ. > Only USER_HZ is available, as the AT_CLKTCK ELF note. May be, I did not do this, but only cleaned up the internal notion of jiffy so timers would work more correctly. If you go back to HZ=100, every thing works better in this regard. On the other hand, what practical difference does it make? Almost no user code even looks at USER_HZ. Its just things like ps and friends as far as I can tell... Possibly we should just fix the utilities to use the above call to get the jiffie size... I don't know the full history, but was USER_HZ invented by the 2.5 changes? > > I think the way to fix this is to skip or add a tick > every now and then, so that the long-term HZ is exact. This is REAL problem for any code that wants to use more exact time/ timers than the 1/HZ. See, for example, the high res patch (signature). 
You can not just throw in an extra tick every so often.

> Another way is to simply choose between pure old-style
> tick-based timekeeping and pure new-style cycle-based
> (TSC or ACPI) timekeeping. Systems with uncooperative
> hardware have to use the old-style time keeping. This
> should simplify the code greatly.

Hm, the reason 1/HZ is not used is that the x86 hardware (PIT, to be exact) can
not give a good 1/1000 value...

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
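Why the PIT cannot produce an exact 1/1000 second tick can be shown with integer arithmetic. The sketch below assumes a PIT input clock of 1193182 Hz (the usual PC value, not stated in the thread) and uses its own round-to-nearest rule; the kernel's macros may round differently, so the exact nanosecond figures should be read as approximate.

```c
#include <stdint.h>

/* Assumed PIT input clock in Hz; not taken from the thread. */
#define SKETCH_PIT_HZ 1193182ULL

/* Integer divisor programmed into the PIT for a requested tick rate. */
uint64_t pit_latch(uint64_t pit_hz, uint64_t hz)
{
    return (pit_hz + hz / 2) / hz;
}

/* Tick length in nanoseconds that this integer divisor actually yields. */
uint64_t pit_tick_nsec(uint64_t pit_hz, uint64_t hz)
{
    uint64_t latch = pit_latch(pit_hz, hz);
    return (1000000000ULL * latch + pit_hz / 2) / pit_hz;
}
```

Under these assumptions HZ=1000 gives a divisor of 1193 and a tick of 999847 ns, roughly 153 ppm shorter than 1 ms, while HZ=100 gives 10000151 ns, roughly 15 ppm off. That is the same order of magnitude as the drift discussed above, and it is consistent with the remark that HZ=100 behaves better.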
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-02-25 20:45 ` George Anzinger @ 2004-02-25 19:16 ` Albert Cahalan 0 siblings, 0 replies; 36+ messages in thread From: Albert Cahalan @ 2004-02-25 19:16 UTC (permalink / raw) To: George Anzinger; +Cc: Albert Cahalan, David Ford, linux-kernel mailing list George Anzinger writes: > Albert Cahalan wrote: >> On Wed, 2004-02-25 at 11:28, George Anzinger wrote: >>> As to small drifts of ~170 PPM, they are caused by code >>> (ps I would guess) that assumes that jiffies is exactly >>> 1/HZ whereas it is NOT in the 2.6.* kernel. The size of >>> the jiffie that the kernel uses is returned by: >>> >>> struct timespec tv; >>> : >>> : >>> clock_res(CLOCK_REALTIME, &tv); >>> >>> This will be in nanoseconds (and must be as that is what the wall clock is in). >> >> This is NOT sane. Remeber that procps doesn't get to see HZ. >> Only USER_HZ is available, as the AT_CLKTCK ELF note. > > May be, I did not do this, but only cleaned up the internal > notion of jiffy so timers would work more correctly. If you > go back to HZ=100, every thing works better in this regard. > > On the other hand, what practical difference does it make? > Almost no user code even looks at USER_HZ. Its just things > like ps and friends as far as I can tell... Possibly we > should just fix the utilities to use the above call to get > the jiffie size... I don't know the full history, but was > USER_HZ invented by the 2.5 changes? USER_HZ was invented by the 2.5 changes. Linus has decreed that USER_HZ is part of the ABI. For some reason (ARM port or stubborn glibc hacker?) he has allowed USER_HZ to be exposed via an ELF note. Prior to that, he'd refused all attempts to get HZ exported through /proc/sys and similar. I'm OK with any integer value as long as I can get it. On older kernels procps will guess HZ from the uptime and clock ticks, since there is a long history of people running with non-standard HZ values. 
Since the ABI is that USER_HZ==100, the kernel is currently
in violation. Perhaps the HZ-to-USER_HZ conversion needs to
be redone. USER_HZ appears in SCSI ioctls, network stats,
an old clock ("clocks"? "times"?) syscall...

>> I think the way to fix this is to skip or add a tick
>> every now and then, so that the long-term HZ is exact.
>
> This is REAL problem for any code that wants to use
> more exact time/ timers than the 1/HZ. See, for example,
> the high res patch (signature). You can not just throw
> in an extra tick every so often.

You're just considering short-term time scales. The extra
ticks, over a period of many days, lead you to the exact
time.

I suppose it would be possible to have things both ways.
Raw jiffies is as it is today. Then we have a correction
factor that gets adjusted as needed to ensure that we can
get long-term-exact 1/HZ ticks as: jiffies + correction

>> Another way is to simply choose between pure old-style
>> tick-based timekeeping and pure new-style cycle-based
>> (TSC or ACPI) timekeeping. Systems with uncooperative
>> hardware have to use the old-style time keeping. This
>> should simplify the code greatly.
>
> Hm, the reason 1/HZ is not used is that the x86 hardware
> (PIT, to be exact) can not give a good 1/1000 value...

If using the PIT:

a. no broken attempt at high-res timekeeping
b. add or skip whole ticks as needed for timekeeping

Common timekeeping tasks don't fit neatly on jiffy ticks
anyway. You need 16.683 ticks per NTSC field, etc.

When you fully push timekeeping onto the TSC, ACPI timer,
or PowerPC bus counter, then you can have a relatively
crud-free high-res implementation.
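The "jiffies + correction" idea can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: the struct and function names are invented, real elapsed time is simply accumulated in nanoseconds, and the 999847 ns tick used in the usage note is an assumed value for a PIT-driven HZ=1000 system.

```c
#include <stdint.h>

struct tick_clock {
    uint64_t raw_jiffies;   /* incremented once per hardware tick */
    int64_t  correction;    /* chosen so raw + correction tracks real time */
    uint64_t real_ns;       /* real time elapsed since start, in ns */
};

/* Advance the clock by n hardware ticks of tick_ns each, keeping the
 * correction such that raw_jiffies + correction counts nominal_ns units. */
void run_ticks(struct tick_clock *c, uint64_t n,
               uint64_t tick_ns, uint64_t nominal_ns)
{
    for (uint64_t i = 0; i < n; i++) {
        c->raw_jiffies++;
        c->real_ns += tick_ns;
        c->correction = (int64_t)(c->real_ns / nominal_ns)
                      - (int64_t)c->raw_jiffies;
    }
}

/* Long-term-exact tick count: raw jiffies plus the running correction. */
uint64_t exact_jiffies(const struct tick_clock *c)
{
    return (uint64_t)((int64_t)c->raw_jiffies + c->correction);
}
```

After one million raw ticks of 999847 ns against a nominal 1000000 ns tick, the correction in this model is -153, so the corrected count is 999847 nominal ticks. Raw jiffies stay untouched for short-term timers, and only consumers that need long-term accuracy add the correction.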
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-02-25 16:04 ` Albert Cahalan 2004-02-25 20:45 ` George Anzinger @ 2004-02-25 21:10 ` George Anzinger 2004-02-26 1:52 ` john stultz 1 sibling, 1 reply; 36+ messages in thread From: George Anzinger @ 2004-02-25 21:10 UTC (permalink / raw) To: Albert Cahalan; +Cc: David Ford, linux-kernel mailing list Albert Cahalan wrote: > On Wed, 2004-02-25 at 11:28, George Anzinger wrote: > >>Albert Cahalan wrote: >> >>>On Wed, 2004-02-25 at 00:10, David Ford wrote: > > >>>Actually, it seems that there is a -significant- time difference in this >>> >>>>phantom clock now, I suspended my notebook to bring it home from the >>>>station, and now this time difference is greater than 9 minutes. I >>>>suspect it's roughly 46 seconds plus the amount of time that my notebook >>>>was suspended. Yes, I'm running ntpd. >>>> >>>>root 16894 0.0 0.0 1544 484 pts/3 S Feb24 0:00 grep grep ps >>>>Wed Feb 25 00:09:09 EST 2004 >>> >>>OK, this is pointing right at the problem. >>> >>>Linux does not record process start times at all. >>>Instead, it records the number of clock ticks >>>from boot until the process starts. >>> >>>Either the boot time or current time is real. >>>The other may be computed from the uptime, which >>>may be measured in clock ticks. >> >>In 2.6.* boot time is captured at boot. This is then adjusted when ever the >>clock is set. Up time is the difference between the saved boot time and the >>current wall clock time. >> >> >>>The clock doesn't tick when your laptop sleeps. >> >>I would guess that the clock adjustment made when the sleep ends is not >>adjusting the boot time as it should. That code should set the clock by calling >>do_settimeofday() which will do the right thing. > > > I don't think so. The problem might be fixable by advancing > jiffies, crediting the extra ticks to idle time. 
> Consider the current situation as I know it, in jiffies: > > 00000 boot > 10000 process 42 starts > 20000 go to sleep > 20000 wake (same jiffies, different time) > 30000 process 51 starts > 40000 ps examines the state of the system > > Process 42 was started 10 seconds after boot. (10000 jiffies) > Process 51 appears to be started 30 seconds after boot. (30000 jiffies + ???) > > Now we want to compute: > > 1. real-world date and time for process start > 2. length of process lifetime (real-world or not?) > > What works for process 42 won't work for process 51, > because they are on different sides of a hidden gap. > > Another way to fix the problem is to move the boot time. > It's kind of sick, but so are the alternatives. > > >>As to small drifts of ~170 PPM, they are caused by code (ps I would guess) that >>assumes that jiffies is exactly 1/HZ whereas it is NOT in the 2.6.* kernel. The >>size of the jiffie that the kernel uses is returned by: >> >>struct timespec tv; >>: >>: >>clock_res(CLOCK_REALTIME, &tv); >> >>This will be in nanoseconds (and must be as that is what the wall clock is in). > > > This is NOT sane. Remeber that procps doesn't get to see HZ. > Only USER_HZ is available, as the AT_CLKTCK ELF note. > > I think the way to fix this is to skip or add a tick > every now and then, so that the long-term HZ is exact. > > Another way is to simply choose between pure old-style > tick-based timekeeping and pure new-style cycle-based > (TSC or ACPI) timekeeping. Systems with uncooperative > hardware have to use the old-style time keeping. This > should simply the code greatly. On checking the code and thinking about this, I would suggest that we change start_time in the task struct to be the wall time (or monotonic time if that seems better). I only find two places this is used, in proc and in the accounting code. Both of these could easily be changed. 
Of course, even leaving it as it is, they could be changed to report more
correct values by using the correct conversions to translate the system HZ to
USER_HZ.

Hm, OK, I will work up a patch to do the latter, i.e. to correctly convert the
elapsed jiffies to USER_HZ and (for the accounting code) seconds.

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-02-25 21:10 ` George Anzinger @ 2004-02-26 1:52 ` john stultz 2004-02-26 23:06 ` George Anzinger 2004-02-26 23:14 ` George Anzinger 0 siblings, 2 replies; 36+ messages in thread From: john stultz @ 2004-02-26 1:52 UTC (permalink / raw) To: george anzinger; +Cc: Albert Cahalan, David Ford, linux-kernel mailing list On Wed, 2004-02-25 at 13:10, George Anzinger wrote: > Albert Cahalan wrote: > > This is NOT sane. Remeber that procps doesn't get to see HZ. > > Only USER_HZ is available, as the AT_CLKTCK ELF note. > > > > I think the way to fix this is to skip or add a tick > > every now and then, so that the long-term HZ is exact. > > > > Another way is to simply choose between pure old-style > > tick-based timekeeping and pure new-style cycle-based > > (TSC or ACPI) timekeeping. Systems with uncooperative > > hardware have to use the old-style time keeping. This > > should simply the code greatly. > > On checking the code and thinking about this, I would suggest that we change > start_time in the task struct to be the wall time (or monotonic time if that > seems better). I only find two places this is used, in proc and in the > accounting code. Both of these could easily be changed. Of course, even > leaving it as it is, they could be changed to report more correct values by > using the correct conversions to translate the system HZ to USER_HZ. Is this close to what your thinking of? I can't reproduce the issue on my systems, so I'll need someone else to test this. 
thanks
-john

--- 1.5/include/linux/times.h	Sun Nov  9 19:26:08 2003
+++ edited/include/linux/times.h	Wed Feb 25 17:39:11 2004
@@ -7,7 +7,7 @@
 #include <asm/param.h>
 
 #if (HZ % USER_HZ)==0
-# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ))
+# define jiffies_to_clock_t(x) (((x*TICK_NSEC*HZ)/NSEC_PER_SEC) / (HZ / USER_HZ))
 #else
 # define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x))
 #endif
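To see what the patched macro changes, the two expressions can be compared in user space. The sketch below is not the kernel header: the SKETCH_ constants are assumptions (in particular the 999847 ns tick length), the function names are invented, and 64-bit arithmetic is used throughout so that the product does not wrap.

```c
#include <stdint.h>

#define SKETCH_HZ           1000ULL
#define SKETCH_USER_HZ      100ULL
#define SKETCH_TICK_NSEC    999847ULL       /* assumed real tick length in ns */
#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Old conversion: treats every jiffy as exactly 1/HZ seconds. */
uint64_t old_jiffies_to_clock_t(uint64_t x)
{
    return x / (SKETCH_HZ / SKETCH_USER_HZ);
}

/* Patched conversion: scales by the real tick length first. */
uint64_t new_jiffies_to_clock_t(uint64_t x)
{
    return ((x * SKETCH_TICK_NSEC * SKETCH_HZ) / SKETCH_NSEC_PER_SEC)
           / (SKETCH_HZ / SKETCH_USER_HZ);
}
```

For a hypothetical 100000000 jiffies (a bit over a day at HZ=1000), the old expression yields 10000000 clock ticks and the patched one 9998470, a difference of 1530 ticks, or about 15 seconds at USER_HZ=100. In 32-bit arithmetic the product x*TICK_NSEC*HZ would wrap almost immediately, so the 64-bit types here matter.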
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-02-26 1:52 ` john stultz @ 2004-02-26 23:06 ` George Anzinger 2004-02-26 23:10 ` john stultz 2004-02-26 23:14 ` George Anzinger 1 sibling, 1 reply; 36+ messages in thread From: George Anzinger @ 2004-02-26 23:06 UTC (permalink / raw) To: john stultz; +Cc: Albert Cahalan, David Ford, linux-kernel mailing list john stultz wrote: > On Wed, 2004-02-25 at 13:10, George Anzinger wrote: > >>Albert Cahalan wrote: >> >>>This is NOT sane. Remeber that procps doesn't get to see HZ. >>>Only USER_HZ is available, as the AT_CLKTCK ELF note. >>> >>>I think the way to fix this is to skip or add a tick >>>every now and then, so that the long-term HZ is exact. >>> >>>Another way is to simply choose between pure old-style >>>tick-based timekeeping and pure new-style cycle-based >>>(TSC or ACPI) timekeeping. Systems with uncooperative >>>hardware have to use the old-style time keeping. This >>>should simply the code greatly. >> >>On checking the code and thinking about this, I would suggest that we change >>start_time in the task struct to be the wall time (or monotonic time if that >>seems better). I only find two places this is used, in proc and in the >>accounting code. Both of these could easily be changed. Of course, even >>leaving it as it is, they could be changed to report more correct values by >>using the correct conversions to translate the system HZ to USER_HZ. > > > Is this close to what your thinking of? > I can't reproduce the issue on my systems, so I'll need someone else to > test this. More or less. I wonder if: static inline long jiffies_to_clock_t(long x) { u64 tmp = (u64)x * TICK_NSEC; div64(tmp, (NSEC_PER_SEC / USER_HZ)); return (long)x; } might be better as it addresses the overflow issue. Should be able to toss the #if (HZ % USER_HZ)==0 test too. We could get carried away and do scaled math to eliminate the div64 but I don't think this path is used enough to justify the clarity ;) that would make. 
-g

> thanks
> -john
>
> --- 1.5/include/linux/times.h	Sun Nov  9 19:26:08 2003
> +++ edited/include/linux/times.h	Wed Feb 25 17:39:11 2004
> @@ -7,7 +7,7 @@
>  #include <asm/param.h>
>
>  #if (HZ % USER_HZ)==0
> -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ))
> +# define jiffies_to_clock_t(x) (((x*TICK_NSEC*HZ)/NSEC_PER_SEC) / (HZ / USER_HZ))
>  #else
>  # define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x))
>  #endif

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
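A user-space sketch of the 64-bit form of the conversion may help when testing the idea outside the kernel. It is an illustration only: plain 64-bit division stands in for the kernel's 64-bit divide helper, the SKETCH_ constants (including the 999847 ns tick) are assumptions, and the function name is invented. Note that the value returned is the quotient held in tmp.

```c
#include <stdint.h>

#define SKETCH_TICK_NSEC    999847ULL       /* assumed real tick length in ns */
#define SKETCH_NSEC_PER_SEC 1000000000ULL
#define SKETCH_USER_HZ      100ULL

/* Convert a jiffies count to USER_HZ clock ticks using one 64-bit multiply
 * and one 64-bit divide, with no HZ % USER_HZ special case. */
long jiffies_to_clock_t_sketch(unsigned long x)
{
    uint64_t tmp = (uint64_t)x * SKETCH_TICK_NSEC;

    tmp /= (SKETCH_NSEC_PER_SEC / SKETCH_USER_HZ);
    return (long)tmp;
}
```

Because the jiffies count is widened before the multiply, the intermediate product stays in range even for counts near the top of a 32-bit unsigned long; for example 4000000000 jiffies converts to 399938800 clock ticks under these assumptions.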
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-02-26 23:06 ` George Anzinger @ 2004-02-26 23:10 ` john stultz 2004-02-27 0:20 ` George Anzinger 0 siblings, 1 reply; 36+ messages in thread From: john stultz @ 2004-02-26 23:10 UTC (permalink / raw) To: george anzinger; +Cc: Albert Cahalan, David Ford, linux-kernel mailing list On Thu, 2004-02-26 at 15:06, George Anzinger wrote: > john stultz wrote: > > On Wed, 2004-02-25 at 13:10, George Anzinger wrote: > > > >>Albert Cahalan wrote: > >> > >>>This is NOT sane. Remeber that procps doesn't get to see HZ. > >>>Only USER_HZ is available, as the AT_CLKTCK ELF note. > >>> > >>>I think the way to fix this is to skip or add a tick > >>>every now and then, so that the long-term HZ is exact. > >>> > >>>Another way is to simply choose between pure old-style > >>>tick-based timekeeping and pure new-style cycle-based > >>>(TSC or ACPI) timekeeping. Systems with uncooperative > >>>hardware have to use the old-style time keeping. This > >>>should simply the code greatly. > >> > >>On checking the code and thinking about this, I would suggest that we change > >>start_time in the task struct to be the wall time (or monotonic time if that > >>seems better). I only find two places this is used, in proc and in the > >>accounting code. Both of these could easily be changed. Of course, even > >>leaving it as it is, they could be changed to report more correct values by > >>using the correct conversions to translate the system HZ to USER_HZ. > > > > > > Is this close to what your thinking of? > > I can't reproduce the issue on my systems, so I'll need someone else to > > test this. > > More or less. I wonder if: > static inline long jiffies_to_clock_t(long x) > { > u64 tmp = (u64)x * TICK_NSEC; > div64(tmp, (NSEC_PER_SEC / USER_HZ)); > return (long)x; > } > might be better as it addresses the overflow issue. Should be able to toss the > #if (HZ % USER_HZ)==0 test too. 
We could get carried away and do scaled math to > eliminate the div64 but I don't think this path is used enough to justify the > clarity ;) that would make. Sounds good to me. Would you mind sending the diff so Petri and David could test it? thanks -john ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-02-26 23:10 ` john stultz @ 2004-02-27 0:20 ` George Anzinger 2004-04-13 22:38 ` john stultz 0 siblings, 1 reply; 36+ messages in thread From: George Anzinger @ 2004-02-27 0:20 UTC (permalink / raw) To: john stultz; +Cc: Albert Cahalan, David Ford, linux-kernel mailing list john stultz wrote: > On Thu, 2004-02-26 at 15:06, George Anzinger wrote: > >>john stultz wrote: >> >>>On Wed, 2004-02-25 at 13:10, George Anzinger wrote: >>> >>> >>>>Albert Cahalan wrote: >>>> >>>> >>>>>This is NOT sane. Remeber that procps doesn't get to see HZ. >>>>>Only USER_HZ is available, as the AT_CLKTCK ELF note. >>>>> >>>>>I think the way to fix this is to skip or add a tick >>>>>every now and then, so that the long-term HZ is exact. >>>>> >>>>>Another way is to simply choose between pure old-style >>>>>tick-based timekeeping and pure new-style cycle-based >>>>>(TSC or ACPI) timekeeping. Systems with uncooperative >>>>>hardware have to use the old-style time keeping. This >>>>>should simply the code greatly. >>>> >>>>On checking the code and thinking about this, I would suggest that we change >>>>start_time in the task struct to be the wall time (or monotonic time if that >>>>seems better). I only find two places this is used, in proc and in the >>>>accounting code. Both of these could easily be changed. Of course, even >>>>leaving it as it is, they could be changed to report more correct values by >>>>using the correct conversions to translate the system HZ to USER_HZ. >>> >>> >>>Is this close to what your thinking of? >>>I can't reproduce the issue on my systems, so I'll need someone else to >>>test this. >> >>More or less. I wonder if: > > >>static inline long jiffies_to_clock_t(long x) >>{ >> u64 tmp = (u64)x * TICK_NSEC; >> div64(tmp, (NSEC_PER_SEC / USER_HZ)); >> return (long)x; >>} >>might be better as it addresses the overflow issue. Should be able to toss the >>#if (HZ % USER_HZ)==0 test too. 
We could get carried away and do scaled math to >>eliminate the div64 but I don't think this path is used enough to justify the >>clarity ;) that would make. > > > Sounds good to me. Would you mind sending the diff so Petri and David > could test it? Oops, I have been caught :) The above was composed in the email window. I don't have a 2.6.x kernel up at the moment and I don't have any free cycles... Late next week?? > -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-02-27 0:20 ` George Anzinger @ 2004-04-13 22:38 ` john stultz 2004-04-13 22:59 ` George Anzinger 2004-04-14 12:10 ` Tim Schmielau 0 siblings, 2 replies; 36+ messages in thread From: john stultz @ 2004-04-13 22:38 UTC (permalink / raw) To: george anzinger; +Cc: Albert Cahalan, David Ford, linux-kernel mailing list On Thu, 2004-02-26 at 16:20, George Anzinger wrote: > john stultz wrote: > > On Thu, 2004-02-26 at 15:06, George Anzinger wrote: > >>john stultz wrote: > >>>On Wed, 2004-02-25 at 13:10, George Anzinger wrote: > >>>>Albert Cahalan wrote: > >>>> > >>>>>This is NOT sane. Remeber that procps doesn't get to see HZ. > >>>>>Only USER_HZ is available, as the AT_CLKTCK ELF note. > >>>>> > >>>>>I think the way to fix this is to skip or add a tick > >>>>>every now and then, so that the long-term HZ is exact. > >>>>> > >>>>>Another way is to simply choose between pure old-style > >>>>>tick-based timekeeping and pure new-style cycle-based > >>>>>(TSC or ACPI) timekeeping. Systems with uncooperative > >>>>>hardware have to use the old-style time keeping. This > >>>>>should simply the code greatly. > >>>> > >>>>On checking the code and thinking about this, I would suggest that we change > >>>>start_time in the task struct to be the wall time (or monotonic time if that > >>>>seems better). I only find two places this is used, in proc and in the > >>>>accounting code. Both of these could easily be changed. Of course, even > >>>>leaving it as it is, they could be changed to report more correct values by > >>>>using the correct conversions to translate the system HZ to USER_HZ. > >>> > >>> > >>>Is this close to what your thinking of? > >>>I can't reproduce the issue on my systems, so I'll need someone else to > >>>test this. > >> > >>More or less. 
I wonder if: > > > >>static inline long jiffies_to_clock_t(long x) > >>{ > >> u64 tmp = (u64)x * TICK_NSEC; > >> div64(tmp, (NSEC_PER_SEC / USER_HZ)); > >> return (long)x; > >>} > >>might be better as it addresses the overflow issue. Should be able to toss the > >>#if (HZ % USER_HZ)==0 test too. We could get carried away and do scaled math to > >>eliminate the div64 but I don't think this path is used enough to justify the > >>clarity ;) that would make. > > > > Sounds good to me. Would you mind sending the diff so Petri and David > > could test it? > > Oops, I have been caught :) The above was composed in the email window. I > don't have a 2.6.x kernel up at the moment and I don't have any free cycles... > Late next week?? Finally got a chance to go through my work queue and yikes! This is seriously stale! As neither George or I have come to bat with a patch, I'll attempt a swing. Albert/David: Would you mind testing the following to see if it resolves the issue for you? George: Mind skimming this to make sure its close enough to what you intended? thanks -john diff -Nru a/include/linux/times.h b/include/linux/times.h --- a/include/linux/times.h Tue Apr 13 15:00:25 2004 +++ b/include/linux/times.h Tue Apr 13 15:00:25 2004 @@ -7,7 +7,12 @@ #include <asm/param.h> #if (HZ % USER_HZ)==0 -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ)) +static inline long jiffies_to_clock_t(long x) +{ + u64 tmp = (u64)x * TICK_NSEC; + x = do_div(tmp, (NSEC_PER_SEC / USER_HZ)); + return (long)tmp; +} #else # define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x)) #endif ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-04-13 22:38 ` john stultz @ 2004-04-13 22:59 ` George Anzinger 2004-04-14 12:10 ` Tim Schmielau 1 sibling, 0 replies; 36+ messages in thread From: George Anzinger @ 2004-04-13 22:59 UTC (permalink / raw) To: john stultz; +Cc: Albert Cahalan, David Ford, linux-kernel mailing list john stultz wrote: > On Thu, 2004-02-26 at 16:20, George Anzinger wrote: > >>john stultz wrote: >> >>>On Thu, 2004-02-26 at 15:06, George Anzinger wrote: >>> >>>>john stultz wrote: >>>> >>>>>On Wed, 2004-02-25 at 13:10, George Anzinger wrote: >>>>> >>>>>>Albert Cahalan wrote: >>>>>> >>>>>> >>>>>>>This is NOT sane. Remeber that procps doesn't get to see HZ. >>>>>>>Only USER_HZ is available, as the AT_CLKTCK ELF note. >>>>>>> >>>>>>>I think the way to fix this is to skip or add a tick >>>>>>>every now and then, so that the long-term HZ is exact. >>>>>>> >>>>>>>Another way is to simply choose between pure old-style >>>>>>>tick-based timekeeping and pure new-style cycle-based >>>>>>>(TSC or ACPI) timekeeping. Systems with uncooperative >>>>>>>hardware have to use the old-style time keeping. This >>>>>>>should simply the code greatly. >>>>>> >>>>>>On checking the code and thinking about this, I would suggest that we change >>>>>>start_time in the task struct to be the wall time (or monotonic time if that >>>>>>seems better). I only find two places this is used, in proc and in the >>>>>>accounting code. Both of these could easily be changed. Of course, even >>>>>>leaving it as it is, they could be changed to report more correct values by >>>>>>using the correct conversions to translate the system HZ to USER_HZ. >>>>> >>>>> >>>>>Is this close to what your thinking of? >>>>>I can't reproduce the issue on my systems, so I'll need someone else to >>>>>test this. >>>> >>>>More or less. 
I wonder if: >>> >>>>static inline long jiffies_to_clock_t(long x) >>>>{ >>>> u64 tmp = (u64)x * TICK_NSEC; >>>> div64(tmp, (NSEC_PER_SEC / USER_HZ)); >>>> return (long)x; >>>>} >>>>might be better as it addresses the overflow issue. Should be able to toss the >>>>#if (HZ % USER_HZ)==0 test too. We could get carried away and do scaled math to >>>>eliminate the div64 but I don't think this path is used enough to justify the >>>>clarity ;) that would make. >>> >>>Sounds good to me. Would you mind sending the diff so Petri and David >>>could test it? >> >>Oops, I have been caught :) The above was composed in the email window. I >>don't have a 2.6.x kernel up at the moment and I don't have any free cycles... >>Late next week?? > > > Finally got a chance to go through my work queue and yikes! This is > seriously stale! As neither George or I have come to bat with a patch, > I'll attempt a swing. > > Albert/David: Would you mind testing the following to see if it resolves > the issue for you? > > George: Mind skimming this to make sure its close enough to what you > intended? Looks rather like exactly what I intended. -g > > thanks > -john > > > diff -Nru a/include/linux/times.h b/include/linux/times.h > --- a/include/linux/times.h Tue Apr 13 15:00:25 2004 > +++ b/include/linux/times.h Tue Apr 13 15:00:25 2004 > @@ -7,7 +7,12 @@ > #include <asm/param.h> > > #if (HZ % USER_HZ)==0 > -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ)) > +static inline long jiffies_to_clock_t(long x) > +{ > + u64 tmp = (u64)x * TICK_NSEC; > + x = do_div(tmp, (NSEC_PER_SEC / USER_HZ)); > + return (long)tmp; > +} > #else > # define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x)) > #endif > > > > > -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-04-13 22:38 ` john stultz 2004-04-13 22:59 ` George Anzinger @ 2004-04-14 12:10 ` Tim Schmielau 2004-04-14 17:03 ` George Anzinger 2004-04-14 18:28 ` john stultz 1 sibling, 2 replies; 36+ messages in thread From: Tim Schmielau @ 2004-04-14 12:10 UTC (permalink / raw) To: john stultz Cc: george anzinger, Albert Cahalan, David Ford, linux-kernel mailing list > diff -Nru a/include/linux/times.h b/include/linux/times.h > --- a/include/linux/times.h Tue Apr 13 15:00:25 2004 > +++ b/include/linux/times.h Tue Apr 13 15:00:25 2004 > @@ -7,7 +7,12 @@ > #include <asm/param.h> > > #if (HZ % USER_HZ)==0 > -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ)) > +static inline long jiffies_to_clock_t(long x) > +{ > + u64 tmp = (u64)x * TICK_NSEC; > + x = do_div(tmp, (NSEC_PER_SEC / USER_HZ)); > + return (long)tmp; > +} > #else > # define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x)) > #endif Excuse me for barging in lately and innocently, but I find this patch hard to comprehend: - shouldn't a foo_to_clock_t() function return a clock? - the x = seems superfluous - the #if is not a shortcut anymore, so why keep it? Shouldn't this patch be more like the following (completely untested)? 
Tim diff -urp --exclude-from dontdiff linux-2.6.5/include/linux/times.h linux-2.6.5-jfix1/include/linux/times.h --- linux-2.6.5/include/linux/times.h 2004-02-04 04:43:09.000000000 +0100 +++ linux-2.6.5-jfix1/include/linux/times.h 2004-04-14 13:48:57.000000000 +0200 @@ -6,11 +6,16 @@ #include <asm/types.h> #include <asm/param.h> -#if (HZ % USER_HZ)==0 -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ)) -#else -# define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x)) -#endif +static inline clock_t jiffies_to_clock_t(long x) +{ +#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0 + return x / (HZ / USER_HZ); +#else + u64 tmp = (u64)x * TICK_NSEC; + do_div(tmp, (NSEC_PER_SEC / USER_HZ)); + return (long)tmp; +#endif +} static inline unsigned long clock_t_to_jiffies(unsigned long x) { ^ permalink raw reply [flat|nested] 36+ messages in thread
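Tim's remark that "the x = seems superfluous" turns on how do_div() behaves: in the kernel it divides its first argument in place and evaluates to the remainder, so assigning the result to x only stores a remainder that nobody reads. The following is a minimal user-space sketch of that contract, not the kernel macro itself. do_div_emul() and jiffies_to_clock_t_sketch() are hypothetical names made up for illustration, and the constants are example values (TICK_NSEC = 999848 is assumed here as the figure for HZ=1000 on i386).

```c
#include <assert.h>
#include <stdint.h>

/* Example values only; in the kernel these come from its own headers. */
#define USER_HZ      100u
#define NSEC_PER_SEC 1000000000u
#define TICK_NSEC    999848u   /* assumed: HZ=1000 on i386 */

/*
 * Hypothetical user-space stand-in for the kernel's do_div():
 * the quotient is left in *n, the remainder is the return value.
 */
static uint32_t do_div_emul(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}

/* Same shape as the proposed helper: the quotient in tmp is what is returned. */
static long jiffies_to_clock_t_sketch(long x)
{
    uint64_t tmp;

    assert(x >= 0);   /* this sketch only handles non-negative tick counts */
    tmp = (uint64_t)x * TICK_NSEC;
    (void)do_div_emul(&tmp, NSEC_PER_SEC / USER_HZ);   /* remainder is discarded */
    return (long)tmp;
}
```

With these example values, 1000 jiffies convert to 99 clock ticks rather than 100, because each tick is assumed to be slightly shorter than a millisecond.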
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-04-14 12:10 ` Tim Schmielau @ 2004-04-14 17:03 ` George Anzinger 2004-04-14 18:28 ` john stultz 1 sibling, 0 replies; 36+ messages in thread From: George Anzinger @ 2004-04-14 17:03 UTC (permalink / raw) To: Tim Schmielau Cc: john stultz, Albert Cahalan, David Ford, linux-kernel mailing list Tim Schmielau wrote: >>diff -Nru a/include/linux/times.h b/include/linux/times.h >>--- a/include/linux/times.h Tue Apr 13 15:00:25 2004 >>+++ b/include/linux/times.h Tue Apr 13 15:00:25 2004 >>@@ -7,7 +7,12 @@ >> #include <asm/param.h> >> >> #if (HZ % USER_HZ)==0 >>-# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ)) >>+static inline long jiffies_to_clock_t(long x) >>+{ >>+ u64 tmp = (u64)x * TICK_NSEC; >>+ x = do_div(tmp, (NSEC_PER_SEC / USER_HZ)); >>+ return (long)tmp; >>+} >> #else >> # define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x)) >> #endif > > > Excuse me for barging in lately and innocently, but I find this patch > hard to comprehend: > - shouldn't a foo_to_clock_t() function return a clock? > - the x = seems superfluous > - the #if is not a shortcut anymore, so why keep it? > Shouldn't this patch be more like the following > (completely untested)? 
> > Tim > > > diff -urp --exclude-from dontdiff linux-2.6.5/include/linux/times.h linux-2.6.5-jfix1/include/linux/times.h > --- linux-2.6.5/include/linux/times.h 2004-02-04 04:43:09.000000000 +0100 > +++ linux-2.6.5-jfix1/include/linux/times.h 2004-04-14 13:48:57.000000000 +0200 > @@ -6,11 +6,16 @@ > #include <asm/types.h> > #include <asm/param.h> > > -#if (HZ % USER_HZ)==0 > -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ)) > -#else > -# define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x)) > -#endif > +static inline clock_t jiffies_to_clock_t(long x) > +{ > +#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0 > + return x / (HZ / USER_HZ); > +#else > + u64 tmp = (u64)x * TICK_NSEC; > + do_div(tmp, (NSEC_PER_SEC / USER_HZ)); > + return (long)tmp; > +#endif > +} > > static inline unsigned long clock_t_to_jiffies(unsigned long x) > { > It does look a bit better. Takes into account the issue of TICK_NSEC being what it is. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-04-14 12:10 ` Tim Schmielau 2004-04-14 17:03 ` George Anzinger @ 2004-04-14 18:28 ` john stultz 2004-04-15 10:37 ` Petri Kaukasoina 1 sibling, 1 reply; 36+ messages in thread From: john stultz @ 2004-04-14 18:28 UTC (permalink / raw) To: Tim Schmielau Cc: george anzinger, Albert Cahalan, David Ford, linux-kernel mailing list On Wed, 2004-04-14 at 05:10, Tim Schmielau wrote: > Excuse me for barging in lately and innocently, but I find this patch > hard to comprehend: > - shouldn't a foo_to_clock_t() function return a clock? > - the x = seems superfluous > - the #if is not a shortcut anymore, so why keep it? > Shouldn't this patch be more like the following > (completely untested)? Yes, your cleanups look much better! Although we still have yet to hear if it resolves the problem. thanks -john ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-04-14 18:28 ` john stultz @ 2004-04-15 10:37 ` Petri Kaukasoina 2004-04-15 11:05 ` Tim Schmielau 0 siblings, 1 reply; 36+ messages in thread From: Petri Kaukasoina @ 2004-04-15 10:37 UTC (permalink / raw) To: john stultz Cc: Tim Schmielau, george anzinger, Albert Cahalan, David Ford, linux-kernel mailing list On Wed, Apr 14, 2004 at 11:28:15AM -0700, john stultz wrote: > On Wed, 2004-04-14 at 05:10, Tim Schmielau wrote: > > Excuse me for barging in lately and innocently, but I find this patch > > hard to comprehend: > > - shouldn't a foo_to_clock_t() function return a clock? > > - the x = seems superfluous > > - the #if is not a shortcut anymore, so why keep it? > > Shouldn't this patch be more like the following > > (completely untested)? > > Yes, you're cleanups look much better! Although we still have yet to > hear if it resolves the problem. Hi, If we are still talking about the problem with ps showing process start times in the future, I'm sorry, neither of the patches helped. The error grows here at a rate of 15 seconds in 24 hours as before. -Petri ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-04-15 10:37 ` Petri Kaukasoina @ 2004-04-15 11:05 ` Tim Schmielau 2004-04-15 16:14 ` Petri Kaukasoina 0 siblings, 1 reply; 36+ messages in thread From: Tim Schmielau @ 2004-04-15 11:05 UTC (permalink / raw) To: Petri Kaukasoina Cc: john stultz, george anzinger, Albert Cahalan, David Ford, linux-kernel mailing list On Thu, 15 Apr 2004, Petri Kaukasoina wrote: > If we are still talking about the problem with ps showing process start > times in future, I'm sorry neither of the patches helped. The error grows > here at a rate of 15 seconds in 24 hours as before. Oops... sure, it cannot. Maybe this one is better... --- linux-2.6.5/include/linux/times.h 2004-02-04 04:43:09.000000000 +0100 +++ linux-2.6.5-jfix1/include/linux/times.h 2004-04-15 12:59:05.000000000 +0200 @@ -6,11 +6,16 @@ #include <asm/types.h> #include <asm/param.h> -#if (HZ % USER_HZ)==0 -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ)) -#else -# define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x)) -#endif +static inline clock_t jiffies_to_clock_t(long x) +{ +#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0 + return x / (HZ / USER_HZ); +#else + u64 tmp = (u64)x * TICK_NSEC; + do_div(tmp, (NSEC_PER_SEC / USER_HZ)); + return (long)tmp; +#endif +} static inline unsigned long clock_t_to_jiffies(unsigned long x) { @@ -34,7 +39,7 @@ static inline unsigned long clock_t_to_j static inline u64 jiffies_64_to_clock_t(u64 x) { -#if (HZ % USER_HZ)==0 +#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0 do_div(x, HZ / USER_HZ); #else /* @@ -42,8 +47,8 @@ static inline u64 jiffies_64_to_clock_t( * but even this doesn't overflow in hundreds of years * in 64 bits, so.. */ - x *= USER_HZ; - do_div(x, HZ); + x *= TICK_NSEC; + do_div(x, (NSEC_PER_SEC / USER_HZ)); #endif return x; } ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-04-15 11:05 ` Tim Schmielau @ 2004-04-15 16:14 ` Petri Kaukasoina 2004-05-01 13:51 ` Tim Schmielau 0 siblings, 1 reply; 36+ messages in thread From: Petri Kaukasoina @ 2004-04-15 16:14 UTC (permalink / raw) To: Tim Schmielau Cc: john stultz, george anzinger, Albert Cahalan, David Ford, linux-kernel mailing list On Thu, Apr 15, 2004 at 01:05:17PM +0200, Tim Schmielau wrote: > On Thu, 15 Apr 2004, Petri Kaukasoina wrote: > > > If we are still talking about the problem with ps showing process start > > times in future, I'm sorry neither of the patches helped. The error grows > > here at a rate of 15 seconds in 24 hours as before. > > Oops... > sure, it cannot. Maybe this one is better... > > > --- linux-2.6.5/include/linux/times.h 2004-02-04 04:43:09.000000000 +0100 > +++ linux-2.6.5-jfix1/include/linux/times.h 2004-04-15 12:59:05.000000000 +0200 Yes, it seems to have fixed it. There is a small error: ps shows a start time of a new minute about four seconds too early, but the error stays constant and does not change as a function of uptime any longer. (Actually it still does but only at the same rate as ntpd corrects time.) -Petri ^ permalink raw reply [flat|nested] 36+ messages in thread
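The size of the drift Petri reports can be checked with a little arithmetic. The sketch below compares the old conversion (divide jiffies by HZ / USER_HZ) with the TICK_NSEC-based one from the patch, over one day of real time. It is a user-space illustration under stated assumptions: the helper names are made up, and TICK_NSEC = 999848 is assumed as the tick length for HZ=1000 on i386; other configurations will have other values.

```c
#include <assert.h>
#include <stdint.h>

/* Example values; TICK_NSEC is an assumption for HZ=1000 on i386. */
#define HZ           1000ULL
#define USER_HZ      100ULL
#define NSEC_PER_SEC 1000000000ULL
#define TICK_NSEC    999848ULL

/* Old conversion: pretends every jiffy lasts exactly 1/HZ seconds. */
static uint64_t clock_t_old(uint64_t jiffies)
{
    return jiffies / (HZ / USER_HZ);
}

/* Patched conversion: scales by the assumed real tick length in nanoseconds. */
static uint64_t clock_t_new(uint64_t jiffies)
{
    assert(jiffies <= UINT64_MAX / TICK_NSEC);   /* keep the product in 64 bits */
    return jiffies * TICK_NSEC / (NSEC_PER_SEC / USER_HZ);
}

/* Number of timer ticks that elapse in the given amount of real time. */
static uint64_t jiffies_in(uint64_t real_seconds)
{
    return real_seconds * NSEC_PER_SEC / TICK_NSEC;
}

/* How far ahead (in USER_HZ ticks) the old conversion runs after real_seconds. */
static uint64_t drift_clock_ticks(uint64_t real_seconds)
{
    uint64_t j = jiffies_in(real_seconds);

    return clock_t_old(j) - clock_t_new(j);
}
```

Under these assumptions drift_clock_ticks(86400) is 1314 USER_HZ ticks, a little over 13 seconds per day. That is the same order as the 15 seconds in 24 hours reported above; the thread does not say what accounts for the remainder.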
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-04-15 16:14 ` Petri Kaukasoina @ 2004-05-01 13:51 ` Tim Schmielau 2004-05-02 1:41 ` Andrew Morton 0 siblings, 1 reply; 36+ messages in thread From: Tim Schmielau @ 2004-05-01 13:51 UTC (permalink / raw) To: john stultz, george anzinger; +Cc: Petri Kaukasoina, linux-kernel mailing list On Thu, 15 Apr 2004, Petri Kaukasoina wrote: > On Thu, Apr 15, 2004 at 01:05:17PM +0200, Tim Schmielau wrote: > > On Thu, 15 Apr 2004, Petri Kaukasoina wrote: > > > > > If we are still talking about the problem with ps showing process start > > > times in future, I'm sorry neither of the patches helped. The error grows > > > here at a rate of 15 seconds in 24 hours as before. > > > > Oops... > > sure, it cannot. Maybe this one is better... > > > > > > --- linux-2.6.5/include/linux/times.h 2004-02-04 04:43:09.000000000 +0100 > > +++ linux-2.6.5-jfix1/include/linux/times.h 2004-04-15 12:59:05.000000000 +0200 > > Yes, it seems to have fixed it. There is a small error: ps shows a start > time of a new minute about four seconds too early, but the error stays > constant and does not change as a function of uptime any longer. (Actually > it still does but only at the same rate as ntpd corrects time.) > > -Petri > John, George, can you take care of the patch so it doesn't get lost? I don't know how to handle the ntpd issue, which I guess also is the reason of the four seconds difference. 
Thanks, Tim --- linux-2.6.5/include/linux/times.h 2004-02-04 04:43:09.000000000 +0100 +++ linux-2.6.5-jfix1/include/linux/times.h 2004-04-15 12:59:05.000000000 +0200 @@ -6,11 +6,16 @@ #include <asm/types.h> #include <asm/param.h> -#if (HZ % USER_HZ)==0 -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ)) -#else -# define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x)) -#endif +static inline clock_t jiffies_to_clock_t(long x) +{ +#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0 + return x / (HZ / USER_HZ); +#else + u64 tmp = (u64)x * TICK_NSEC; + do_div(tmp, (NSEC_PER_SEC / USER_HZ)); + return (long)tmp; +#endif +} static inline unsigned long clock_t_to_jiffies(unsigned long x) { @@ -34,7 +39,7 @@ static inline unsigned long clock_t_to_j static inline u64 jiffies_64_to_clock_t(u64 x) { -#if (HZ % USER_HZ)==0 +#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0 do_div(x, HZ / USER_HZ); #else /* @@ -42,8 +47,8 @@ static inline u64 jiffies_64_to_clock_t( * but even this doesn't overflow in hundreds of years * in 64 bits, so.. */ - x *= USER_HZ; - do_div(x, HZ); + x *= TICK_NSEC; + do_div(x, (NSEC_PER_SEC / USER_HZ)); #endif return x; } ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-05-01 13:51 ` Tim Schmielau @ 2004-05-02 1:41 ` Andrew Morton 2004-05-02 1:59 ` Tim Schmielau 0 siblings, 1 reply; 36+ messages in thread From: Andrew Morton @ 2004-05-02 1:41 UTC (permalink / raw) To: Tim Schmielau; +Cc: johnstul, george, kaukasoi, linux-kernel Tim Schmielau <tim@physik3.uni-rostock.de> wrote: > > +#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0 I think this has an inclusion ordering problem. In file included from net/ipv6/route.c:30: include/linux/times.h:11:42: division by zero in #if include/linux/times.h:42:42: division by zero in #if Either NSEC_PER_SEC or USER_HZ hasn't been defined yet. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-05-02 1:41 ` Andrew Morton @ 2004-05-02 1:59 ` Tim Schmielau 2004-05-04 2:40 ` john stultz 0 siblings, 1 reply; 36+ messages in thread From: Tim Schmielau @ 2004-05-02 1:59 UTC (permalink / raw) To: Andrew Morton; +Cc: johnstul, george, kaukasoi, linux-kernel On Sat, 1 May 2004, Andrew Morton wrote: > Tim Schmielau <tim@physik3.uni-rostock.de> wrote: > > > > +#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0 > > I think this has an inclusion ordering problem. > > In file included from net/ipv6/route.c:30: > include/linux/times.h:11:42: division by zero in #if > include/linux/times.h:42:42: division by zero in #if > > either NSEC_PER_SEC or USER_HZ hasn't been defined yet. > Yep, we'd need to include timex.h for it. This gets messy. OK, I found why John's original patch didn't fix the issue, but I'd like to hand the patch off to someone with a vision of how time shall be handled in the kernel. Sorry, Tim ^ permalink raw reply [flat|nested] 36+ messages in thread
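The "division by zero in #if" diagnostic follows from a rule of the C preprocessor: inside an #if expression, any identifier that is not a defined macro is replaced by 0, so an undefined NSEC_PER_SEC or USER_HZ turns the divisor into 0. The sketch below shows one way to make that kind of ordering problem fail loudly instead. It is an illustration only: the constants are example values (TICK_NSEC = 999848 is an assumption), CLOCK_T_FAST_PATH and clock_t_fast_path() are made-up names, and this guard is not part of any patch in the thread.

```c
#include <assert.h>

/* Example values; in the kernel these would arrive via other headers. */
#define USER_HZ      100
#define NSEC_PER_SEC 1000000000L
#define TICK_NSEC    999848L   /* assumed: HZ=1000 on i386 */

/*
 * Without this guard, a missing definition would silently become 0
 * in the #if below and produce "division by zero in #if".
 */
#if !defined(TICK_NSEC) || !defined(NSEC_PER_SEC) || !defined(USER_HZ)
# error "TICK_NSEC, NSEC_PER_SEC and USER_HZ must be defined before this point"
#endif

/* Same test as in the patch: can the cheap integer division be used? */
#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
# define CLOCK_T_FAST_PATH 1
#else
# define CLOCK_T_FAST_PATH 0
#endif

/* Exposes the compile-time decision so it can be inspected at run time. */
static int clock_t_fast_path(void)
{
    return CLOCK_T_FAST_PATH;
}
```

With the example values the remainder is non-zero, so clock_t_fast_path() returns 0 and the 64-bit path would be selected.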
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-05-02 1:59 ` Tim Schmielau @ 2004-05-04 2:40 ` john stultz 2004-05-04 6:12 ` Tim Schmielau 0 siblings, 1 reply; 36+ messages in thread From: john stultz @ 2004-05-04 2:40 UTC (permalink / raw) To: Andrew Morton; +Cc: Tim Schmielau, george, kaukasoi, linux-kernel, davem On Sat, 2004-05-01 at 18:59, Tim Schmielau wrote: > On Sat, 1 May 2004, Andrew Morton wrote: > > > Tim Schmielau <tim@physik3.uni-rostock.de> wrote: > > > > > > +#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0 > > > > I think this has an inclusion ordering problem. > > Yep, we'd need to include timex.h for it. This get's messy. Well, not too messy. Including timex.h looks to resolve the issue without trouble. Let me know if I somehow stepped over an issue. DaveM: Do please note the changes to the TCP code. Having jiffies_to_clock_t be more then just a #define introduced a few cast warnings, and I believe the fixes are right, but you can't be too sure. jiffies-to-clockt-fix_A1: ------------------------- All, This patch polishes up Tim Schmielau's (tim@physik3.uni-rostock.de) fix for jiffies_to_clock_t() and jiffies_64_to_clock_t(). The issues observed was w/ /proc output not matching up to wall time due to accumulated error caused by HZ not being exactly 1000 on i386 systems. The solution is to correct that error by using the more accurate TICK_NSEC in our calculation. Additionally, this patch corrects 3 warnings in the TCP layer uncovered by this change. 
***DaveM please review!*** thanks -john diff -Nru a/include/linux/times.h b/include/linux/times.h --- a/include/linux/times.h Mon May 3 19:24:14 2004 +++ b/include/linux/times.h Mon May 3 19:24:14 2004 @@ -2,15 +2,21 @@ #define _LINUX_TIMES_H #ifdef __KERNEL__ +#include <linux/timex.h> #include <asm/div64.h> #include <asm/types.h> #include <asm/param.h> -#if (HZ % USER_HZ)==0 -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ)) +static inline clock_t jiffies_to_clock_t(long x) +{ +#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0 + return x / (HZ / USER_HZ); #else -# define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x)) + u64 tmp = (u64)x * TICK_NSEC; + do_div(tmp, (NSEC_PER_SEC / USER_HZ)); + return (long)tmp; #endif +} static inline unsigned long clock_t_to_jiffies(unsigned long x) { @@ -34,7 +40,7 @@ static inline u64 jiffies_64_to_clock_t(u64 x) { -#if (HZ % USER_HZ)==0 +#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0 do_div(x, HZ / USER_HZ); #else /* @@ -42,8 +48,8 @@ * but even this doesn't overflow in hundreds of years * in 64 bits, so.. 
*/ - x *= USER_HZ; - do_div(x, HZ); + x *= TICK_NSEC; + do_div(x, (NSEC_PER_SEC / USER_HZ)); #endif return x; } diff -Nru a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c --- a/net/ipv4/tcp_ipv4.c Mon May 3 19:24:14 2004 +++ b/net/ipv4/tcp_ipv4.c Mon May 3 19:24:14 2004 @@ -2452,7 +2452,7 @@ int ttd = req->expires - jiffies; sprintf(tmpbuf, "%4d: %08X:%04X %08X:%04X" - " %02X %08X:%08X %02X:%08X %08X %5d %8d %u %d %p", + " %02X %08X:%08X %02X:%08lX %08X %5d %8d %u %d %p", i, req->af.v4_req.loc_addr, ntohs(inet_sk(sk)->sport), @@ -2526,7 +2526,7 @@ srcp = ntohs(tw->tw_sport); sprintf(tmpbuf, "%4d: %08X:%04X %08X:%04X" - " %02X %08X:%08X %02X:%08X %08X %5d %8d %d %d %p", + " %02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %p", i, src, srcp, dest, destp, tw->tw_substate, 0, 0, 3, jiffies_to_clock_t(ttd), 0, 0, 0, 0, atomic_read(&tw->tw_refcnt), tw); diff -Nru a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c --- a/net/ipv6/tcp_ipv6.c Mon May 3 19:24:14 2004 +++ b/net/ipv6/tcp_ipv6.c Mon May 3 19:24:14 2004 @@ -1933,7 +1933,7 @@ dest = &req->af.v6_req.rmt_addr; seq_printf(seq, "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X " - "%02X %08X:%08X %02X:%08X %08X %5d %8d %d %d %p\n", + "%02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %p\n", i, src->s6_addr32[0], src->s6_addr32[1], src->s6_addr32[2], src->s6_addr32[3], @@ -2019,7 +2019,7 @@ seq_printf(seq, "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X " - "%02X %08X:%08X %02X:%08X %08X %5d %8d %d %d %p\n", + "%02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %p\n", i, src->s6_addr32[0], src->s6_addr32[1], src->s6_addr32[2], src->s6_addr32[3], srcp, ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-05-04 2:40 ` john stultz @ 2004-05-04 6:12 ` Tim Schmielau 2004-05-04 14:59 ` john stultz 0 siblings, 1 reply; 36+ messages in thread From: Tim Schmielau @ 2004-05-04 6:12 UTC (permalink / raw) To: john stultz; +Cc: Andrew Morton, george, kaukasoi, linux-kernel, davem On Mon, 3 May 2004, john stultz wrote: > On Sat, 2004-05-01 at 18:59, Tim Schmielau wrote: > > > > Yep, we'd need to include timex.h for it. This get's messy. > > Well, not too messy. Including timex.h looks to resolve the issue > without trouble. Let me know if I somehow stepped over an issue. It looks ok, but somehow defeats the whole purpose of having separate include files. Someday we may consolidate all the time-related things into just one or two header files then. > jiffies-to-clockt-fix_A1: > ------------------------- Thanks, John! > All, > This patch polishes up Tim Schmielau's (tim@physik3.uni-rostock.de) fix > for jiffies_to_clock_t() and jiffies_64_to_clock_t(). The issues > observed was w/ /proc output not matching up to wall time due to > accumulated error caused by HZ not being exactly 1000 on i386 systems. > The solution is to correct that error by using the more accurate > TICK_NSEC in our calculation. I wonder whether it's conceptually correct to use jiffies for accurate long-term measurements at all. ntpd is there for a reason. Using both corrected, accurate and freely running clocks IMHO is calling for trouble. This might be something to think about for 2.7. Thanks, Tim ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: /proc or ps tools bug? 2.6.3, time is off 2004-05-04 6:12 ` Tim Schmielau @ 2004-05-04 14:59 ` john stultz 2004-05-04 16:50 ` Tim Schmielau 2004-05-07 0:33 ` George Anzinger 0 siblings, 2 replies; 36+ messages in thread From: john stultz @ 2004-05-04 14:59 UTC (permalink / raw) To: Tim Schmielau; +Cc: Andrew Morton, george, kaukasoi, linux-kernel, davem On Mon, 2004-05-03 at 23:12, Tim Schmielau wrote: > On Mon, 3 May 2004, john stultz wrote: > > This patch polishes up Tim Schmielau's (tim@physik3.uni-rostock.de) fix > > for jiffies_to_clock_t() and jiffies_64_to_clock_t(). The issues > > observed was w/ /proc output not matching up to wall time due to > > accumulated error caused by HZ not being exactly 1000 on i386 systems. > > The solution is to correct that error by using the more accurate > > TICK_NSEC in our calculation. > > I wonder whether it's conceptually correct to use jiffies for accurate > long-time measurements at all. ntpd is there for a reason. Using both > corrected, accurate and freely running clocks IMHO is calling for trouble. > This might be something to think about for 2.7. Indeed. Moving away from jiffies as a time counter and more of an interrupt counter is important. That allows for implementations of variable HZ and other things the high-res timer folks want without affecting the time keeping code. 
Roughly, I'd like to see the time code for all arches in 2.7 to look like: u64 system_time /* NTP adjusted nanosecs since boot */ u64 wall_time_offset /* offset to system_time for time of day */ u64 offset_base /* last read raw hw value */ ts_read(): returns the raw cycle value from the hardware timesource (TSC/ACPI PM/HPET) ts_delta(now, then): returns the difference between two raw cycle values ts_cyc2ns(cycles): converts a cycle value to ns monotonic_clock(): returns NTP adjusted nanoseconds since boot ie: system_time + NTP_GUNK(ts_cyc2ns(ts_delta(ts_read(),offset_base))) gettimeofday(): returns monotonic_clock() + sys_time_offset settimeofday(): adjusts only sys_time_offset time_interrupt_hook(): updates system_time. called by timer interrupt atleast once every hardware cycle (ie: before the hardware counter overflows), but otherwise unaffected by lost interrupts, etc. ie: then = offset_base now = ts_read() system_time += NTP_GUNK(ts_cyc2ns(ts_delta(now, then))); DO_MORE_NTP_GUNK() And ignoring the magic NTP_GUNK macros, that's all there is to it (Although don't kid your self, the NTP_GUNK is nasty). Of course, with this approach, we actually have to be able to trust the hardware 100%. With the current state of i386 hw having serious problems w/ reliable timesources, this may be difficult. I've got a bigger proposal (with proper credits to Keith Mannthey and George Anzinger for reviews and corrections) that I wrote up awhile back, and I'll likely send it out if this sketch gathers any interest. thanks -john ^ permalink raw reply [flat|nested] 36+ messages in thread
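John's outline can be turned into a small user-space model to see how the pieces fit. This is a sketch under loud assumptions, not proposed kernel code: the NTP adjustment is left out entirely, the hardware counter is faked with a plain variable at a made-up fixed rate (NS_PER_CYCLE), the offset is called wall_time_offset throughout (the outline uses both wall_time_offset and sys_time_offset), and the update of offset_base in the interrupt hook is an addition the outline does not spell out. The get_time_of_day_ns() and set_time_of_day_ns() names are also made up to avoid clashing with libc.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a hardware timesource (TSC/ACPI PM/HPET); tests set it by hand. */
static uint64_t fake_counter;
static const uint64_t NS_PER_CYCLE = 4;   /* hypothetical fixed counter rate */

static uint64_t system_time;       /* nanoseconds since boot (no NTP here) */
static uint64_t wall_time_offset;  /* offset from system_time to time of day */
static uint64_t offset_base;       /* last raw counter value folded in */

static uint64_t ts_read(void)                          { return fake_counter; }
static uint64_t ts_delta(uint64_t now, uint64_t then)  { return now - then; }
static uint64_t ts_cyc2ns(uint64_t cycles)             { return cycles * NS_PER_CYCLE; }

/* Nanoseconds since boot: accumulated time plus cycles since the last update. */
static uint64_t monotonic_clock(void)
{
    return system_time + ts_cyc2ns(ts_delta(ts_read(), offset_base));
}

static uint64_t get_time_of_day_ns(void)
{
    return monotonic_clock() + wall_time_offset;
}

/* Setting the time of day only moves the offset, never system_time. */
static void set_time_of_day_ns(uint64_t now_ns)
{
    wall_time_offset = now_ns - monotonic_clock();
}

/* Called from the timer interrupt; a late or lost call only delays the fold-in. */
static void time_interrupt_hook(void)
{
    uint64_t now = ts_read();

    system_time += ts_cyc2ns(ts_delta(now, offset_base));
    offset_base = now;   /* assumed step: remember what has been accounted for */
}
```

In this model the monotonic clock depends only on the counter, so a delayed call to time_interrupt_hook() changes when the time is folded into system_time but not the value that monotonic_clock() returns.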
* Re: /proc or ps tools bug? 2.6.3, time is off
  2004-05-04 14:59 ` john stultz
@ 2004-05-04 16:50 ` Tim Schmielau
  2004-05-07 0:33 ` George Anzinger
  1 sibling, 0 replies; 36+ messages in thread
From: Tim Schmielau @ 2004-05-04 16:50 UTC
To: john stultz; +Cc: Andrew Morton, george, kaukasoi, linux-kernel, davem

On Tue, 4 May 2004, john stultz wrote:

> On Mon, 2004-05-03 at 23:12, Tim Schmielau wrote:
> >
> > I wonder whether it's conceptually correct to use jiffies for accurate
> > long-time measurements at all. ntpd is there for a reason. Using both
> > corrected, accurate and freely running clocks IMHO is calling for trouble.
> > This might be something to think about for 2.7.
>
> Indeed. Moving away from jiffies as a time counter and more of an
> interrupt counter is important. That allows for implementations of
> variable HZ and other things the high-res timer folks want without
> affecting the time keeping code.
>
> Roughly, I'd like to see the time code for all arches in 2.7 to look
> like:

[simple, well thought-out proposal snipped]

> time_interrupt_hook():
>     updates system_time.

> Of course, with this approach, we actually have to be able to trust the
> hardware 100%. With the current state of i386 hw having serious problems
> w/ reliable timesources, this may be difficult.

Well, with some configurable plausibility checks in time_interrupt_hook()
it shouldn't be worse than what we have now...

> I've got a bigger proposal (with proper credits to Keith Mannthey and
> George Anzinger for reviews and corrections) that I wrote up awhile
> back, and I'll likely send it out if this sketch gathers any interest.

Yes, that sounds interesting. It's just that I won't have any spare time
to spend in the next two weeks.

Tim
* Re: /proc or ps tools bug? 2.6.3, time is off
  2004-05-04 14:59 ` john stultz
  2004-05-04 16:50 ` Tim Schmielau
@ 2004-05-07 0:33 ` George Anzinger
  2004-05-07 1:21 ` john stultz
  1 sibling, 1 reply; 36+ messages in thread
From: George Anzinger @ 2004-05-07 0:33 UTC
To: john stultz; +Cc: Tim Schmielau, Andrew Morton, kaukasoi, linux-kernel, davem

john stultz wrote:
> On Mon, 2004-05-03 at 23:12, Tim Schmielau wrote:
>
>>On Mon, 3 May 2004, john stultz wrote:
>>
>>> This patch polishes up Tim Schmielau's (tim@physik3.uni-rostock.de) fix
>>>for jiffies_to_clock_t() and jiffies_64_to_clock_t(). The issues
>>>observed was w/ /proc output not matching up to wall time due to
>>>accumulated error caused by HZ not being exactly 1000 on i386 systems.
>>>The solution is to correct that error by using the more accurate
>>>TICK_NSEC in our calculation.
>>
>>I wonder whether it's conceptually correct to use jiffies for accurate
>>long-time measurements at all. ntpd is there for a reason. Using both
>>corrected, accurate and freely running clocks IMHO is calling for trouble.
>>This might be something to think about for 2.7.
>
>
> Indeed. Moving away from jiffies as a time counter and more of an
> interrupt counter is important. That allows for implementations of
> variable HZ and other things the high-res timer folks want without
> affecting the time keeping code.
>
> Roughly, I'd like to see the time code for all arches in 2.7 to look
> like:
>
> u64 system_time        /* NTP adjusted nanosecs since boot */
> u64 wall_time_offset   /* offset to system_time for time of day */
> u64 offset_base        /* last read raw hw value */

Hm. In 2.6 we use an NTP adjusted wall time and a wall_to_monotonic offset.
I don't really see the advantage here. Does this change buy us something?
For what it's worth, I introduced the wall_to_monotonic offset just because
it was easier to do (and understand, I think) in the current kernel.

>
> ts_read():
>     returns the raw cycle value from the hardware timesource
>     (TSC/ACPI PM/HPET)
> ts_delta(now, then):
>     returns the difference between two raw cycle values
> ts_cyc2ns(cycles):
>     converts a cycle value to ns
>
> monotonic_clock():
>     returns NTP adjusted nanoseconds since boot
>     ie: system_time +
>         NTP_GUNK(ts_cyc2ns(ts_delta(ts_read(),offset_base)))
> gettimeofday():
>     returns monotonic_clock() + sys_time_offset
> settimeofday():
>     adjusts only sys_time_offset
> time_interrupt_hook():
>     updates system_time. called by timer interrupt atleast once
>     every hardware cycle (ie: before the hardware counter
>     overflows), but otherwise unaffected by lost interrupts, etc.
>     ie:
>         then = offset_base
>         now = ts_read()
>         system_time += NTP_GUNK(ts_cyc2ns(ts_delta(now, then)));
>         DO_MORE_NTP_GUNK()
>
> And ignoring the magic NTP_GUNK macros, that's all there is to it
> (Although don't kid your self, the NTP_GUNK is nasty).

Right, and it needs to be recast to use secs and nanosecs... But you forget
the accounting code, which needs the periodic interrupt to charge time to
whomever.

>
> Of course, with this approach, we actually have to be able to trust the
> hardware 100%. With the current state of i386 hw having serious problems
> w/ reliable timesources, this may be difficult.

Yes, and there is also a problem getting a stable, reliable, and correct
calibration of TSC to PIT even with a constant TSC rate. In the HRT patch I
finally resorted to correcting the TSC last read value on a regular basis.
Without this it drifts (or maybe, more correctly, the calibration was wrong)
enough to mess up the high res timers. I suspect that while we use two
different timers (PIT & TSC or PIT & pm-timer or ...) that don't use the
same source clock we will continue to have such problems. Other archs have
a much easier time of it.

>
> I've got a bigger proposal (with proper credits to Keith Mannthey and
> George Anzinger for reviews and corrections) that I wrote up awhile
> back, and I'll likely send it out if this sketch gathers any interest.
>

--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
* Re: /proc or ps tools bug? 2.6.3, time is off
  2004-05-07 0:33 ` George Anzinger
@ 2004-05-07 1:21 ` john stultz
  2004-05-07 20:41 ` George Anzinger
  0 siblings, 1 reply; 36+ messages in thread
From: john stultz @ 2004-05-07 1:21 UTC
To: ganzinger; +Cc: Tim Schmielau, Andrew Morton, kaukasoi, linux-kernel, davem

On Thu, 2004-05-06 at 17:33, George Anzinger wrote:
> john stultz wrote:
> > Roughly, I'd like to see the time code for all arches in 2.7 to look
> > like:
> >
> > u64 system_time        /* NTP adjusted nanosecs since boot */
> > u64 wall_time_offset   /* offset to system_time for time of day */
> > u64 offset_base        /* last read raw hw value */
>
> Hm. In 2.6 we use an NTP adjusted wall time and a wall_to_monotonic offset. I
> don't really see the advantage here. Does this change buy us something?
> For what its worth, I introduced the wall_to_monotonic offset just because it
> was easier to do (and understand, I think) in the current kernel.

Well, in my opinion it seems much cleaner. Right now any time we adjust
xtime, we have to remember to adjust wall_to_monotonic. I believe we've
had issues where a change was made to just one and not the other.

This is easier and has simpler rules. system_time always increments and
is only modified by the periodic time_interrupt_hook(). Then
wall_time_offset is only changed by do_settimeofday(). In fact, I hope
to make these values static to the time code, so that all in-kernel
users must go through the monotonic_clock() and do_gettimeofday()
interfaces.

To be brutal, I'd like to see xtime killed completely. Jiffies and HZ
too, although I'd be happy with those two being made static to the
interval timer code. There are too many places where folks have tried to
extrapolate a time value from some global accounting variable, and w/ HZ
not quite being exactly 1000 now on i386, all that code is just slightly
wrong. It's spaghetti code now, and we just need to put that mess behind
a few clean understandable interfaces.

But hey, that's me dreaming, I'm sure there will be some odd reason
someone will need to get into the guts of the time code and we'll have
to break the opacity. :)

And I do realize we'll also need a get_timestamp() or something that
will quickly return a low-res timestamp like the current value of
system_time. I don't intend for all those users of xtime or jiffies to
really go out and hit hardware to calculate a nanosecond accurate time
value.

> > And ignoring the magic NTP_GUNK macros, that's all there is to it
> > (Although don't kid your self, the NTP_GUNK is nasty).
>
> Right, and it needs to be recast to use secs and nanosecs...

Yea, yea, you're right. u64 nanosecond values are just so much simpler
to work with, until you hit the NTP code.

> But you forget the
> accounting code which needs the periodic interrupt to charge time to whom ever.

True, although I'd like to avoid doing that in the time subsystem.
Instead the interval timer subsystem would run the accounting code,
which would then call get_timestamp() to calculate the amount of time to
charge a process.

thanks
-john
* Re: /proc or ps tools bug? 2.6.3, time is off
  2004-05-07 1:21 ` john stultz
@ 2004-05-07 20:41 ` George Anzinger
  2004-05-07 21:38 ` john stultz
  0 siblings, 1 reply; 36+ messages in thread
From: George Anzinger @ 2004-05-07 20:41 UTC
To: john stultz
Cc: ganzinger, Tim Schmielau, Andrew Morton, kaukasoi, linux-kernel, davem

john stultz wrote:
> On Thu, 2004-05-06 at 17:33, George Anzinger wrote:
>
>>john stultz wrote:
>>
>>>Roughly, I'd like to see the time code for all arches in 2.7 to look
>>>like:
>>>
>>>u64 system_time        /* NTP adjusted nanosecs since boot */
>>>u64 wall_time_offset   /* offset to system_time for time of day */
>>>u64 offset_base        /* last read raw hw value */
>>
>>Hm. In 2.6 we use an NTP adjusted wall time and a wall_to_monotonic offset. I
>>don't really see the advantage here. Does this change buy us something?
>>For what its worth, I introduced the wall_to_monotonic offset just because it
>>was easier to do (and understand, I think) in the current kernel.
>
>
> Well, in my opinion it seems much cleaner. Right now any time we adjust
> xtime, we have to remember to adjust wall_to_monotonic. I believe we've
> had issues where a change was made to just one and not the other.
>
> This is easier and has simpler rules. system_time always increments and
> is only modified by the periodic time_interrupt_hook(). Then
> wall_time_offset is only changes by do_settimeofday(). In fact, I hope
> to make these values static to the time code, so that all in-kernel
> users must go through the monotonic_clock() and do_gettimeofday()
> interfaces.

All that is fine for the kernel coder and such, but the fact remains that
gettimeofday() is the BIG user and I keep seeing folks trying to make it
faster. Also xtime.tv_sec is used a LOT in the kernel under the name:
get_seconds().

~>

--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
* Re: /proc or ps tools bug? 2.6.3, time is off
  2004-05-07 20:41 ` George Anzinger
@ 2004-05-07 21:38 ` john stultz
  0 siblings, 0 replies; 36+ messages in thread
From: john stultz @ 2004-05-07 21:38 UTC
To: ganzinger; +Cc: Tim Schmielau, Andrew Morton, kaukasoi, lkml, davem

On Fri, 2004-05-07 at 13:41, George Anzinger wrote:
> john stultz wrote:
> > On Thu, 2004-05-06 at 17:33, George Anzinger wrote:
> >
> >>john stultz wrote:
> >>
> >>>Roughly, I'd like to see the time code for all arches in 2.7 to look
> >>>like:
> >>>
> >>>u64 system_time        /* NTP adjusted nanosecs since boot */
> >>>u64 wall_time_offset   /* offset to system_time for time of day */
> >>>u64 offset_base        /* last read raw hw value */
> >>
> >>Hm. In 2.6 we use an NTP adjusted wall time and a wall_to_monotonic offset. I
> >>don't really see the advantage here. Does this change buy us something?
> >>For what its worth, I introduced the wall_to_monotonic offset just because it
> >>was easier to do (and understand, I think) in the current kernel.
> >
> >
> > Well, in my opinion it seems much cleaner. Right now any time we adjust
> > xtime, we have to remember to adjust wall_to_monotonic. I believe we've
> > had issues where a change was made to just one and not the other.
> >
> > This is easier and has simpler rules. system_time always increments and
> > is only modified by the periodic time_interrupt_hook(). Then
> > wall_time_offset is only changes by do_settimeofday(). In fact, I hope
> > to make these values static to the time code, so that all in-kernel
> > users must go through the monotonic_clock() and do_gettimeofday()
> > interfaces.
>
> All that is fine for the kernel coder and such, but the fact remains that
> gettimeofday() is the BIG user and I keep seeing folks trying to make it faster.
> Also xtime.tv_sec is used a LOT in the kernel under the name: get_seconds().

<sigh> You may be right. Having to convert from a u64 nanosec value to a
timeval in sys_gettimeofday() as well as get_seconds() may be a
performance problem. But I'm not completely convinced, as we already
have to play games shifting from timevals to timespecs and back. I'm not
sure the nsec/1000000 will kill us.

Pragmatically I'm willing to bend on that one by using timespecs instead
of u64s. But while I'm in the design phase, thinking of the problem as
juggling u64 nanoseconds simplifies it. Be it a u64 or a timespec, it
really doesn't change the design all that much. With one you get to use
"+" and with the other you use "time_add()".

thanks
-john
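To make the u64-versus-timespec trade-off above concrete, here is a small C sketch. It is only an illustration: the function names (`ns_to_timespec_sketch()`, `timespec_to_ns_sketch()`, `time_add_sketch()`) are hypothetical, it uses the userspace `struct timespec` from `<time.h>` rather than any kernel type, and `time_add_sketch()` is a guess at what the "time_add()" mentioned in the message might look like.

```c
#include <stdint.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Splitting a u64 nanosecond count costs one 64-bit divide and one modulo;
 * this is the conversion whose cost is being debated in the message. */
struct timespec ns_to_timespec_sketch(uint64_t ns)
{
    struct timespec ts;

    ts.tv_sec = (time_t)(ns / (uint64_t)SKETCH_NSEC_PER_SEC);
    ts.tv_nsec = (long)(ns % (uint64_t)SKETCH_NSEC_PER_SEC);
    return ts;
}

/* The reverse direction is a multiply and an add. */
uint64_t timespec_to_ns_sketch(struct timespec ts)
{
    return (uint64_t)ts.tv_sec * (uint64_t)SKETCH_NSEC_PER_SEC
           + (uint64_t)ts.tv_nsec;
}

/* With timespecs, "+" becomes an add with a manual carry out of tv_nsec.
 * Both inputs are assumed normalized (0 <= tv_nsec < 1e9). */
struct timespec time_add_sketch(struct timespec a, struct timespec b)
{
    struct timespec sum;

    sum.tv_sec = a.tv_sec + b.tv_sec;
    sum.tv_nsec = a.tv_nsec + b.tv_nsec;
    if (sum.tv_nsec >= SKETCH_NSEC_PER_SEC) {
        sum.tv_nsec -= SKETCH_NSEC_PER_SEC;
        sum.tv_sec++;
    }
    return sum;
}
```

Adding 1.7 s and 2.6 s either way gives 4.3 s; the two representations differ only in where the work happens, a divide at read time for u64 or a carry at add time for timespec.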
* Re: /proc or ps tools bug? 2.6.3, time is off
  2004-02-26 1:52 ` john stultz
  2004-02-26 23:06 ` George Anzinger
@ 2004-02-26 23:14 ` George Anzinger
  1 sibling, 0 replies; 36+ messages in thread
From: George Anzinger @ 2004-02-26 23:14 UTC
To: john stultz; +Cc: Albert Cahalan, David Ford, linux-kernel mailing list

John,

This is the other place I found "start_time" being used. It is at about
line 340 in kernel/acct.c. The same sort of math should be done here:

    elapsed = get_jiffies_64() - current->start_time;
    ac.ac_etime = encode_comp_t(elapsed < (unsigned long) -1l ?
                           (unsigned long) elapsed : (unsigned long) -1l);
    do_div(elapsed, HZ);

-g

john stultz wrote:
> On Wed, 2004-02-25 at 13:10, George Anzinger wrote:
>
>>Albert Cahalan wrote:
>>
>>>This is NOT sane. Remeber that procps doesn't get to see HZ.
>>>Only USER_HZ is available, as the AT_CLKTCK ELF note.
>>>
>>>I think the way to fix this is to skip or add a tick
>>>every now and then, so that the long-term HZ is exact.
>>>
>>>Another way is to simply choose between pure old-style
>>>tick-based timekeeping and pure new-style cycle-based
>>>(TSC or ACPI) timekeeping. Systems with uncooperative
>>>hardware have to use the old-style time keeping. This
>>>should simply the code greatly.
>>
>>On checking the code and thinking about this, I would suggest that we change
>>start_time in the task struct to be the wall time (or monotonic time if that
>>seems better). I only find two places this is used, in proc and in the
>>accounting code. Both of these could easily be changed. Of course, even
>>leaving it as it is, they could be changed to report more correct values by
>>using the correct conversions to translate the system HZ to USER_HZ.
>
>
> Is this close to what your thinking of?
> I can't reproduce the issue on my systems, so I'll need someone else to
> test this.
>
> thanks
> -john
>
> --- 1.5/include/linux/times.h Sun Nov 9 19:26:08 2003
> +++ edited/include/linux/times.h Wed Feb 25 17:39:11 2004
> @@ -7,7 +7,7 @@
>  #include <asm/param.h>
>
>  #if (HZ % USER_HZ)==0
> -# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ))
> +# define jiffies_to_clock_t(x) (((x*TICK_NSEC*HZ)/NSEC_PER_SEC) / (HZ / USER_HZ))
>  #else
>  # define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x))
>  #endif
>

--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
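As a worked illustration of what the quoted jiffies_to_clock_t() change does, the C sketch below evaluates the old and new expressions side by side in userspace. The constants are assumptions, not values given in the thread: HZ is taken as 1000, USER_HZ as 100, and TICK_NSEC as 999849 ns, a plausible figure for an i386 tick that is slightly shorter than 1 ms. The function names are hypothetical, and 64-bit arithmetic is used throughout so the intermediate product does not overflow for the inputs shown.

```c
#include <stdint.h>

/* Assumed values for illustration only; the thread does not state them. */
#define SKETCH_HZ           1000ULL
#define SKETCH_USER_HZ      100ULL
#define SKETCH_TICK_NSEC    999849ULL       /* assumed real tick length, ns */
#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Old macro: treats every jiffy as exactly 1/HZ seconds. */
uint64_t jiffies_to_clock_t_naive(uint64_t x)
{
    return x / (SKETCH_HZ / SKETCH_USER_HZ);
}

/* Patched macro: first rescales jiffies by the real tick length, then
 * divides down to USER_HZ. The product x * TICK_NSEC * HZ overflows a
 * u64 once x exceeds roughly 1.8e10 jiffies, so this is only a sketch. */
uint64_t jiffies_to_clock_t_scaled(uint64_t x)
{
    uint64_t exact_jiffies =
        (x * SKETCH_TICK_NSEC * SKETCH_HZ) / SKETCH_NSEC_PER_SEC;

    return exact_jiffies / (SKETCH_HZ / SKETCH_USER_HZ);
}
```

Under these assumed constants, 86,400,000 jiffies (nominally one day) convert to 8,640,000 clock ticks with the old expression and 8,638,695 with the scaled one, a gap of 1305 USER_HZ ticks, or about 13 seconds per day. That is the same order of magnitude as the daily error reported elsewhere in the thread, though the exact figure depends on the real TICK_NSEC of the machine.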
* Re: /proc or ps tools bug? 2.6.3, time is off
  2004-02-25 1:58 /proc or ps tools bug? 2.6.3, time is off David Ford
  2004-02-25 1:54 ` Albert Cahalan
@ 2004-02-25 9:14 ` Petri Kaukasoina
  2004-02-25 9:18 ` Petri Kaukasoina
  2004-02-25 21:39 ` David Ford
  1 sibling, 2 replies; 36+ messages in thread
From: Petri Kaukasoina @ 2004-02-25 9:14 UTC
To: linux-kernel mailing list

On Tue, Feb 24, 2004 at 08:58:39PM -0500, David Ford wrote:
> Kernel 2.6.3, procps 3.2.0
>
> Note the change in the timestamp as reported by 'ps' v.s. the time
> reported by 'date'.
>

Hi,

I reported the same problem some time ago. Could you type

grep cpu /proc/stat; cat /proc/uptime

for example, I get

cpu  140708 1489 43735 21209021 292168 4879 4192
cpu0 140708 1489 43735 21209021 292168 4879 4192
216925.15 215037.34

Then add jiffies and divide by uptime:

(140708+1489+43735+21209021+292168+4879+4192)/216925.15 = 100.01695

which is not 100 here as it should be. (On kernel 2.2.* I have it exactly
100). ps uses Hertz=100 but it should be 170 ppm larger which makes an error
of about 15 seconds a day. (Running without ntpd doesn't fix it.)
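The arithmetic in the message above can be reproduced with a few lines of C. This is a sketch for checking the numbers, not part of any tool mentioned in the thread: the function names are hypothetical, and the counters are passed in as an array rather than parsed from /proc/stat.

```c
#include <stddef.h>

/* Sum the per-state counters from the "cpu" line of /proc/stat and divide
 * by the uptime in seconds (first field of /proc/uptime). The result is
 * the tick rate that userspace effectively observes. */
double effective_tick_rate(const unsigned long long *ticks, size_t n,
                           double uptime_s)
{
    unsigned long long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += ticks[i];
    return (double)total / uptime_s;
}

/* Relative error of the measured rate against the nominal one, in ppm. */
double rate_error_ppm(double measured, double nominal)
{
    return (measured - nominal) / nominal * 1e6;
}

/* How many seconds per day a given ppm error amounts to. */
double drift_seconds_per_day(double ppm)
{
    return ppm * 1e-6 * 86400.0;
}
```

Fed the seven counters and the uptime 216925.15 from the message, with a nominal rate of 100, this gives a rate of about 100.017, an error of about 169.5 ppm, and a drift of roughly 14.6 seconds per day, in line with the "170 ppm" and "about 15 seconds a day" quoted there.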
* Re: /proc or ps tools bug? 2.6.3, time is off
  2004-02-25 9:14 ` Petri Kaukasoina
@ 2004-02-25 9:18 ` Petri Kaukasoina
  2004-02-25 21:39 ` David Ford
  1 sibling, 0 replies; 36+ messages in thread
From: Petri Kaukasoina @ 2004-02-25 9:18 UTC
To: linux-kernel mailing list

On Wed, Feb 25, 2004 at 11:14:25AM +0200, I wrote:
> (On kernel 2.2.* I have it exactly 100).

I meant on kernel 2.4.* it's ok. I don't know about 2.2. Kernel 2.6.3 does
show the problem. Sorry about the confusion with version numbers.
* Re: /proc or ps tools bug? 2.6.3, time is off
  2004-02-25 9:14 ` Petri Kaukasoina
  2004-02-25 9:18 ` Petri Kaukasoina
@ 2004-02-25 21:39 ` David Ford
  1 sibling, 0 replies; 36+ messages in thread
From: David Ford @ 2004-02-25 21:39 UTC
To: linux-kernel mailing list

powerix root # grep cpu /proc/stat; cat /proc/uptime
cpu  403880 466666 580559 15017904 475868 10864 3030
cpu0 403880 466666 580559 15017904 475868 10864 3030
186882.63 154999.74

(gdb) p 403880 +466666 +580559 +15017904 +475868 +10864 +3030
$1 = 16958771
(gdb) p 16958771/186882.63
$2 = 90.745571164104447

Hmm, not quite 100.0

david

Petri Kaukasoina wrote:

>Hi,
>
>I reported the same problem some time ago. Could you type
>
>grep cpu /proc/stat; cat /proc/uptime
>
>for example, I get
>
>cpu  140708 1489 43735 21209021 292168 4879 4192
>cpu0 140708 1489 43735 21209021 292168 4879 4192
>216925.15 215037.34
>
>Then add jiffies and divide by uptime:
>
>(140708+1489+43735+21209021+292168+4879+4192)/216925.15 = 100.01695
>
>which is not 100 here as it should be. (On kernel 2.2.* I have it exactly
>100). ps uses Hertz=100 but it should be 170 ppm larger which makes an error
>of about 15 seconds a day. (Running without ntpd doesn't fix it.)
>
>
end of thread, other threads:[~2004-05-07 21:39 UTC | newest]

Thread overview: 36+ messages
2004-02-25 1:58 /proc or ps tools bug? 2.6.3, time is off David Ford
2004-02-25 1:54 ` Albert Cahalan
2004-02-25 5:10 ` David Ford
2004-02-25 3:27 ` Albert Cahalan
2004-02-25 16:28 ` George Anzinger
2004-02-25 16:04 ` Albert Cahalan
2004-02-25 20:45 ` George Anzinger
2004-02-25 19:16 ` Albert Cahalan
2004-02-25 21:10 ` George Anzinger
2004-02-26 1:52 ` john stultz
2004-02-26 23:06 ` George Anzinger
2004-02-26 23:10 ` john stultz
2004-02-27 0:20 ` George Anzinger
2004-04-13 22:38 ` john stultz
2004-04-13 22:59 ` George Anzinger
2004-04-14 12:10 ` Tim Schmielau
2004-04-14 17:03 ` George Anzinger
2004-04-14 18:28 ` john stultz
2004-04-15 10:37 ` Petri Kaukasoina
2004-04-15 11:05 ` Tim Schmielau
2004-04-15 16:14 ` Petri Kaukasoina
2004-05-01 13:51 ` Tim Schmielau
2004-05-02 1:41 ` Andrew Morton
2004-05-02 1:59 ` Tim Schmielau
2004-05-04 2:40 ` john stultz
2004-05-04 6:12 ` Tim Schmielau
2004-05-04 14:59 ` john stultz
2004-05-04 16:50 ` Tim Schmielau
2004-05-07 0:33 ` George Anzinger
2004-05-07 1:21 ` john stultz
2004-05-07 20:41 ` George Anzinger
2004-05-07 21:38 ` john stultz
2004-02-26 23:14 ` George Anzinger
2004-02-25 9:14 ` Petri Kaukasoina
2004-02-25 9:18 ` Petri Kaukasoina
2004-02-25 21:39 ` David Ford