From: john stultz <johnstul@us.ibm.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: LKML <linux-kernel@vger.kernel.org>, Jiri Olsa <jolsa@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Oleg Nesterov <oleg@redhat.com>
Subject: Re: [PATCH 01/11] x86: Fix vtime/file timestamp inconsistencies
Date: Thu, 15 Jul 2010 12:30:07 -0700
Message-ID: <1279222207.2686.55.camel@localhost.localdomain>
In-Reply-To: <20100715134058.B8F1.A69D9226@jp.fujitsu.com>

On Thu, 2010-07-15 at 13:41 +0900, KOSAKI Motohiro wrote:
> > On Thu, 2010-07-15 at 10:51 +0900, KOSAKI Motohiro wrote:
> > > > On Wed, 2010-07-14 at 11:40 +0900, KOSAKI Motohiro wrote:
> > > > > Hi
> > > > > 
> > > > > > Due to vtime calling vgettimeofday(), it's possible that an application
> > > > > > could call  time();create("stuff",O_RDWR);  only to see the file's
> > > > > > creation timestamp to be before the value returned by time.
> > > > > 
> > > > > Just dumb question.
> > > > > 
> > > > > Almost all applications use gettimeofday() instead of time(). That means
> > > > > your fix doesn't help most applications.
> > > > 
> > > > Correct,  filesystem timestamps and gettimeofday can still seem
> > > > inconsistently ordered. But that is expected.
> > > > 
> > > > Because of granularity differences (one interface is only tick
> > > > resolution, the other is clocksource resolution), we can't interleave
> > > > the two interfaces (time and gettimeofday, respectively) and expect to
> > > > get ordered results.
> > > 
> > > hmmm...
> > > Yes, time() vs gettimeofday() makes no sense; nobody wants this. But
> > > I don't understand why we can ignore gettimeofday() vs file timestamps.
> > 
> > 
> > So, just to be clear, this discussion is really around the question of
> > "Why don't filesystems use a clocksource-granular (ie: getnstimeofday())
> > timestamps instead of tick-granular (ie current_kernel_time())
> > timestamps."
> > 
> > However, this is *not* what the patch that started this thread was
> > about. In the patch I'm simply fixing an inconsistency in the vtime
> > interface, where it does not align with what the syscall-time interface
> > provides. 
> > 
> > The issue was noticed via inconsistencies with filesystem timestamps,
> > but the patch does not change anything to do with filesystem timestamp
> > behavior.
> 
> Ah, I see. This patch is unrelated to filesystem timestamps. It fixes the
> inconsistency between the vsyscall and the syscall.
> 
> I agree that it should be fixed. So yes, the other parts of my mail are a bit off-topic.
> 
> 
> > > > This is why the fix I'm proposing is important: Filesystem timestamps
> > > > have always been tick granular, so when vtime() was made clocksource
> > > > granular (by using vgettime internally) we broke the historic
> > > > expectation that the time() interface could be interleaved with
> > > > filesystem operations.
> > > > 
> > > > Side note: For full nanosecond resolution of the tick-granular
> > > > timestamps, check out the clock_gettime(CLOCK_REALTIME_COARSE, ...)
> > > > interface.
> > > > 
> > > > 
> > > > > So, why can't we fix the vgettimeofday() vs create() inconsistency?
> > > > > This is just a question; I don't intend to disagree with you.
> > > > 
> > > > The only way to make gettimeofday and create consistent is to use
> > > > gettimeofday clocksource resolution timestamps for files. This however
> > > > would potentially cause a large performance hit, since every file
> > > > timestamp would require a possibly expensive read of the clocksource.
> > > 
> > > Why is reading the clocksource so slow? The implementation of the current
> > > tsc clocksource ->read method is here.
> > > 
> > > 
> > > 	static cycle_t read_tsc(struct clocksource *cs)
> > > 	{
> > > 	        cycle_t ret = (cycle_t)get_cycles();
> > > 	
> > > 	        return ret >= clocksource_tsc.cycle_last ?
> > > 	                ret : clocksource_tsc.cycle_last;
> > > 	}
> > > 
> > > That means the difference is almost only one rdtsc.
> > 
> > Sure, for hardware that can use the TSC clocksource, it is fairly cheap,
> > however there are numerous systems that cannot use the TSC (or
> > architectures that don't have a fast TSC like counter) and in those
> > cases a read can take more than a microsecond.
> 
> I'm not a timekeeping expert, but my first impression is that if clocksource->read
> needs more than a microsecond, it's really problematic. The ->read of such a clocksource
> should always return 0 instead of honestly reading the h/w counter.

Sadly, quite a lot of x86 hardware cannot use the TSC, so the only
alternatives are the HPET (~0.8us per read) or the ACPI PM timer
(~1.2us per read).

If the clocksource->read() function returned 0 on those systems, then
gettimeofday would return only tick-granular time (again, which is what
CLOCK_REALTIME_COARSE already provides). 
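
For anyone who wants to see the two granularities side by side from
userspace, here is a minimal sketch (not part of the patch). The helper
names ts_to_ns(), clock_res_ns() and coarse_lag_ns() are made up for
illustration; only clock_gettime(), clock_getres() and the two clock ids
are real interfaces, and CLOCK_REALTIME_COARSE is Linux-specific.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* assumed needed for CLOCK_REALTIME_COARSE in strict modes */
#endif
#include <assert.h>
#include <time.h>

/* Convert a timespec to a single nanosecond count. */
static long long ts_to_ns(const struct timespec *ts)
{
	return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Resolution the kernel reports for a clock, in ns, or -1 on error. */
static long long clock_res_ns(clockid_t id)
{
	struct timespec res;

	if (clock_getres(id, &res) != 0)
		return -1;
	return ts_to_ns(&res);
}

/*
 * Read the tick-granular clock, then the clocksource-granular one,
 * and store (fine - coarse) in *lag_ns. Returns 0 on success, -1 on error.
 */
static int coarse_lag_ns(long long *lag_ns)
{
	struct timespec coarse, fine;

	assert(lag_ns != NULL);
	if (clock_gettime(CLOCK_REALTIME_COARSE, &coarse) != 0)
		return -1;
	if (clock_gettime(CLOCK_REALTIME, &fine) != 0)
		return -1;
	*lag_ns = ts_to_ns(&fine) - ts_to_ns(&coarse);
	return 0;
}
```

On a typical system I would expect clock_res_ns(CLOCK_REALTIME_COARSE)
to report roughly one tick period and the lag to stay below that, but
the exact numbers depend on HZ and the kernel version.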

That said, Ingo had an optimization patch to do something quite similar,
giving up resolution for speed. And now that the CLOCK_REALTIME_COARSE
code is there, it might be even easier to implement, but it's not
something we can enable by default, as inter-tick resolution is needed
in many cases.
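
To get a rough feel for the resolution-vs-speed trade-off on a given
box, something like the following userspace sketch could be used. It
only measures the clock_gettime() path (vDSO or syscall), not the
in-kernel filesystem path, and avg_read_cost_ns() is a hypothetical
helper name, so treat the numbers as a hint rather than a benchmark.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* assumed needed for CLOCK_REALTIME_COARSE in strict modes */
#endif
#include <assert.h>
#include <time.h>

/*
 * Average cost in nanoseconds of one clock_gettime(id) call over
 * `iters` back-to-back reads, timed with CLOCK_MONOTONIC.
 * Returns -1 if any clock read fails. `iters` must be positive.
 */
static long long avg_read_cost_ns(clockid_t id, int iters)
{
	struct timespec start, end, scratch;
	long long total;
	int i;

	assert(iters > 0);
	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1;
	for (i = 0; i < iters; i++) {
		if (clock_gettime(id, &scratch) != 0)
			return -1;
	}
	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1;

	total = (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
		+ (end.tv_nsec - start.tv_nsec);
	return total / iters;
}
```

Comparing avg_read_cost_ns(CLOCK_REALTIME, n) against
avg_read_cost_ns(CLOCK_REALTIME_COARSE, n) with the clocksource switched
between tsc, hpet and acpi_pm should show how much of the cost is the
hardware read itself.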

And yes, ideally every system would have a fast TSC like counter that
was accurate and reliable, and this would be less of an issue, but we
have to work with the hardware that is out there.


> > Even with the TSC, the multiplication required to convert to nanoseconds
> > adds extra overhead that isn't seen when using the pre-calculated
> > tick-granular current_kernel_time() value.
> > 
> > It may not seem like much, but with filesystems each small delay adds
> > up. 
> > 
> > I'm not a filesystems guy, and maybe there are some filesystems that
> > really want very fine-grained timestamps. If so they can consider
> > switching from using current_kernel_time() to getnstimeofday(). But due
> > to the likely performance impact, it's not something I'd suggest doing.
> 
> Again, I'm not against you. I only would like to hear what you propose, because
> I'm not sure a coarse-granularity time() vsyscall really makes userland happy,
> since (again) as far as I know, almost no applications use time().

I assume the developers who implemented the filesystems have considered
this trade-off and made a choice, so I honestly don't have much to
propose here. :)

I think if you feel strongly that filesystems should use
clocksource-granular instead of tick-granular timestamps, you might try
to bring it up on the ext4 devel list or even generate a patch and try it
out yourself (I've provided a trivial starting point for you below, but
it's likely a real solution will be a bit more complex).

Good luck!

thanks
-john


diff --git a/kernel/time.c b/kernel/time.c
index 848b1c2..ce10dae 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -227,7 +227,8 @@ SYSCALL_DEFINE1(adjtimex, struct timex __user *, txc_p)
  */
 struct timespec current_fs_time(struct super_block *sb)
 {
-	struct timespec now = current_kernel_time();
+	struct timespec now;
+	getnstimeofday(&now);
 	return timespec_trunc(now, sb->s_time_gran);
 }
 EXPORT_SYMBOL(current_fs_time);
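
If you do experiment with this, a small userspace check along the
following lines might help confirm the ordering between time() and file
timestamps. This is only a sketch: file_predates_time() is a made-up
helper, it uses st_mtime as a stand-in for the "creation" timestamp
(stat() does not report a birth time), and it only has one-second
granularity.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* assumed needed for mkstemp() in strict modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Call time(), create a temporary file from the mkstemp() template
 * `tmpl`, and compare the file's mtime with the earlier time() value.
 * Returns 1 if the file appears older than time() (the inconsistency),
 * 0 if the ordering is as expected, and -1 on error.
 * The temporary file is removed before returning.
 */
static int file_predates_time(char *tmpl)
{
	struct stat st;
	time_t before;
	int fd, rc;

	assert(tmpl != NULL);
	before = time(NULL);
	fd = mkstemp(tmpl);             /* rewrites tmpl with the real name */
	if (fd < 0)
		return -1;
	rc = fstat(fd, &st);
	close(fd);
	unlink(tmpl);
	if (rc != 0)
		return -1;
	return st.st_mtime < before ? 1 : 0;
}
```

Running it in a tight loop across second boundaries is where a mismatch
would be expected to show up, if there is one.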
