git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Date handling.
@ 2005-04-14  8:16 David Woodhouse
  2005-04-14  9:00 ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: David Woodhouse @ 2005-04-14  8:16 UTC (permalink / raw)
  To: git

The date handling is somewhat unreliable. We render dates into textual
representation using the committer's locale (day names, etc), then later
attempt to interpret that in some other locale. And we were just using
localtime without even specifying the timezone so the timestamp was
fairly randomised anyway. In fact, an $AUTHOR_DATE environment variable
was making its way into the database entirely unchecked. 

I see two possible solutions:
	1. Just store seconds-since-GMT-epoch and if we really want, the
	   timezone as auxiliary information.
	2. Store dates in RFC2822 form.

Unless someone convincingly expresses a preference before I get to work
and start playing with it, I'll implement the latter.

-- 
dwmw2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-14  8:16 Date handling David Woodhouse
@ 2005-04-14  9:00 ` Linus Torvalds
  2005-04-14  9:12   ` Linus Torvalds
  2005-04-14  9:31   ` David Woodhouse
  0 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2005-04-14  9:00 UTC (permalink / raw)
  To: David Woodhouse; +Cc: git



On Thu, 14 Apr 2005, David Woodhouse wrote:
> 
> I see two possible solutions:
> 	1. Just store seconds-since-GMT-epoch and if we really want, the
> 	   timezone as auxiliary information.

Yeah, I think this is the right thing to do. I can change "commit" to do 
it.

I used to think that the date was purely a "enforced comment" (like the
committer info is, as far as git is concerned), which is why I used the 
simple textual representation. But yes, when I wrote that "rev-tree" thing 
I did curse that and consider just changing it.

It's still just technically a "hint", since time isn't synchronized in any 
way (and in a distributed system, time _cannot_ be synchronized). But 
it's a useful hint, so ..

> 	2. Store dates in RFC2822 form.
> 
> Unless someone convincingly expresses a preference before I get to work
> and start playing with it, I'll implement the latter.

I do like text output, but if it is painful, the "unix seconds" format is 
certainly a hell of a lot simpler. And quite frankly, if we change it, we 
might as well just change it all the way. So I'd almost prefer (1).

But "He who does the work gets to choose the implementation". And I do
agree that this is a bad format decision, and that we should change it. It 
shouldn't even be that painful. Only "rev-tree" cares, and even rev-tree 
doesn't care _that_ deeply.

		Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-14  9:00 ` Linus Torvalds
@ 2005-04-14  9:12   ` Linus Torvalds
  2005-04-14 17:38     ` David Woodhouse
  2005-04-14  9:31   ` David Woodhouse
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2005-04-14  9:12 UTC (permalink / raw)
  To: David Woodhouse; +Cc: git



On Thu, 14 Apr 2005, Linus Torvalds wrote:
> 
> Yeah, I think this is the right thing to do. I can change "commit" to do 
> it.

I take that back. I'd be much happier with you doing and testing it, 
because now I'm crashing.

		Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-14  9:00 ` Linus Torvalds
  2005-04-14  9:12   ` Linus Torvalds
@ 2005-04-14  9:31   ` David Woodhouse
  1 sibling, 0 replies; 18+ messages in thread
From: David Woodhouse @ 2005-04-14  9:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On Thu, 2005-04-14 at 02:00 -0700, Linus Torvalds wrote:
> I do like text output, but if it is painful, the "unix seconds" format is 
> certainly a hell of a lot simpler. And quite frankly, if we change it, we 
> might as well just change it all the way. So I'd almost prefer (1).

Text _output_ is easy to generate; we don't need to store text in the
database for that. So I've changed my mind -- I prefer (1) too.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-14  9:12   ` Linus Torvalds
@ 2005-04-14 17:38     ` David Woodhouse
  2005-04-14 19:19       ` tony.luck
  2005-04-24  3:04       ` Jan Harkes
  0 siblings, 2 replies; 18+ messages in thread
From: David Woodhouse @ 2005-04-14 17:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On Thu, 2005-04-14 at 02:12 -0700, Linus Torvalds wrote:
> I take that back. I'd be much happier with you doing and testing it, 
> because now I'm crashing.

OK. commit-tree now eats RFC2822 dates as AUTHOR_DATE because that's
what you're going to want to feed it. We store seconds since UTC epoch,
we add the author's or committer's timezone as auxiliary data so that
dates can be pretty-printed in the original timezone later if anyone
cares. I left the date parsing in rev-tree.c for backward compatibility
but it can be dropped when we change to base64 :)

Yes, glibc sucks and strptime is a pile of crap. We have to parse it
ourselves.

Index: commit-tree.c
--- 1756b578489f93999ded68ae347bef7d6063101c/commit-tree.c  (mode:100664 sha1:12196c79f31d004dff0df1f50dda67d8204f5568)
+++ 82ba574c85e9a2e4652419c88244e9dd1bfa8baa/commit-tree.c  (mode:100644 sha1:35cb09402c9868499bcaf6de42afbad9fdfebe05)
@@ -7,6 +7,9 @@
 
 #include <pwd.h>
 #include <time.h>
+#include <string.h>
+#include <ctype.h>
+#include <time.h>
 
 #define BLOCKING (1ul << 14)
 #define ORIG_OFFSET (40)
@@ -95,6 +98,148 @@
 	}
 }
 
+static const char *month_names[] = {
+        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
+        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
+};
+
+static const char *weekday_names[] = {
+        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
+};
+
+
+static char *skipfws(char *str)
+{
+	while (isspace(*str))
+		str++;
+	return str;
+}
+
+	
+/* Gr. strptime is crap for this; it doesn't have a way to require RFC2822
+   (i.e. English) day/month names, and it doesn't work correctly with %z. */
+static void parse_rfc2822_date(char *date, char *result, int maxlen)
+{
+	struct tm tm;
+	char *p;
+	int i, offset;
+	time_t then;
+
+	memset(&tm, 0, sizeof(tm));
+
+	/* Skip day-name */
+	p = skipfws(date);
+	if (!isdigit(*p)) {
+		for (i=0; i<7; i++) {
+			if (!strncmp(p,weekday_names[i],3) && p[3] == ',') {
+				p = skipfws(p+4);
+				goto day;
+			}
+		}
+		return;
+	}					
+
+	/* day */
+ day:
+	tm.tm_mday = strtoul(p, &p, 10);
+
+	if (tm.tm_mday < 1 || tm.tm_mday > 31)
+		return;
+
+	if (!isspace(*p))
+		return;
+
+	p = skipfws(p);
+
+	/* month */
+
+	for (i=0; i<12; i++) {
+		if (!strncmp(p, month_names[i], 3) && isspace(p[3])) {
+			tm.tm_mon = i;
+			p = skipfws(p+strlen(month_names[i]));
+			goto year;
+		}
+	}
+	return; /* Error -- bad month */
+
+	/* year */
+ year:	
+	tm.tm_year = strtoul(p, &p, 10);
+
+	if (!tm.tm_year && !isspace(*p))
+		return;
+
+	if (tm.tm_year > 1900)
+		tm.tm_year -= 1900;
+		
+	p=skipfws(p);
+
+	/* hour */
+	if (!isdigit(*p))
+		return;
+	tm.tm_hour = strtoul(p, &p, 10);
+	
+	if (!tm.tm_hour > 23)
+		return;
+
+	if (*p != ':')
+		return; /* Error -- bad time */
+	p++;
+
+	/* minute */
+	if (!isdigit(*p))
+		return;
+	tm.tm_min = strtoul(p, &p, 10);
+	
+	if (!tm.tm_min > 59)
+		return;
+
+	if (isspace(*p))
+		goto zone;
+
+	if (*p != ':')
+		return; /* Error -- bad time */
+	p++;
+
+	/* second */
+	if (!isdigit(*p))
+		return;
+	tm.tm_sec = strtoul(p, &p, 10);
+	
+	if (!tm.tm_sec > 59)
+		return;
+
+	if (!isspace(*p))
+		return;
+
+ zone:
+	p = skipfws(p);
+
+	if (*p == '-')
+		offset = -60;
+	else if (*p == '+')
+		offset = 60;
+	else
+	       return;
+
+	if (!isdigit(p[1]) || !isdigit(p[2]) || !isdigit(p[3]) || !isdigit(p[4]))
+		return;
+
+	i = strtoul(p+1, NULL, 10);
+	offset *= ((i % 100) + ((i / 100) * 60));
+
+	if (*(skipfws(p + 5)))
+		return;
+
+	then = mktime(&tm); /* mktime appears to ignore the GMT offset, stupidly */
+	if (then == -1)
+		return;
+
+	then -= offset;
+
+	snprintf(result, maxlen, "%lu %5.5s", then, p);
+}
+
 /*
  * Having more than two parents may be strange, but hey, there's
  * no conceptual reason why the file format couldn't accept multi-way
@@ -114,10 +259,12 @@
 	unsigned char commit_sha1[20];
 	char *gecos, *realgecos;
 	char *email, realemail[1000];
-	char *date, *realdate;
+	char date[20], realdate[20];
+	char *audate;
 	char comment[1000];
 	struct passwd *pw;
 	time_t now;
+	struct tm *tm;
 	char *buffer;
 	unsigned int size;
 
@@ -142,15 +289,19 @@
 	realemail[len] = '@';
 	gethostname(realemail+len+1, sizeof(realemail)-len-1);
 	time(&now);
-	realdate = ctime(&now);
+	tm = localtime(&now);
+
+	strftime(realdate, sizeof(realdate), "%s %z", tm);
+	strcpy(date, realdate);
 
 	gecos = getenv("AUTHOR_NAME") ? : realgecos;
 	email = getenv("AUTHOR_EMAIL") ? : realemail;
-	date = getenv("AUTHOR_DATE") ? : realdate;
+	audate = getenv("AUTHOR_DATE");
+	if (audate)
+		parse_rfc2822_date(audate, date, sizeof(date));
 
 	remove_special(gecos); remove_special(realgecos);
 	remove_special(email); remove_special(realemail);
-	remove_special(date); remove_special(realdate);
 
 	init_buffer(&buffer, &size);
 	add_buffer(&buffer, &size, "tree %s\n", sha1_to_hex(tree_sha1));
Index: rev-tree.c
--- 1756b578489f93999ded68ae347bef7d6063101c/rev-tree.c  (mode:100664 sha1:7bf9e9a92f528485360f374239809714ce7a19f5)
+++ 82ba574c85e9a2e4652419c88244e9dd1bfa8baa/rev-tree.c  (mode:100644 sha1:9443f4560beedcf81f2f5e7e0664d169cb5c8527)
@@ -1,4 +1,5 @@
 #define _XOPEN_SOURCE /* glibc2 needs this */
+#define _BSD_SOURCE /* for tm.tm_gmtoff */
 #include <time.h>
 #include <ctype.h>
 
@@ -21,6 +22,7 @@
 	char buffer[100];
 	struct tm tm;
 	const char *formats[] = {
+		"%s",
 		"%c",
 		"%a %b %d %T %y",
 		NULL
@@ -30,7 +32,7 @@
 	p = buffer;
 	while (isspace(c = *buf))
 		buf++;
-	while ((c = *buf++) != '\n')
+	while ((c = *buf++) != '\n' && c)
 		*p++ = c;
 	*p++ = 0;
 	buf = buffer;
@@ -50,6 +52,8 @@
 
 static unsigned long parse_commit_date(const char *buf)
 {
+	unsigned long time;
+
 	if (memcmp(buf, "author", 6))
 		return 0;
 	while (*buf++ != '\n')
@@ -58,7 +62,11 @@
 		return 0;
 	while (*buf++ != '>')
 		/* nada */;
-	return parse_time(buf);
+
+	time = strtoul(buf, NULL, 10);
+	if (!time)
+		time = parse_time(buf);
+	return time;
 }
 
 static int parse_commit(unsigned char *sha1)


-- 
dwmw2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-14 17:38     ` David Woodhouse
@ 2005-04-14 19:19       ` tony.luck
  2005-04-14 19:23         ` David Woodhouse
  2005-04-24  3:04       ` Jan Harkes
  1 sibling, 1 reply; 18+ messages in thread
From: tony.luck @ 2005-04-14 19:19 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, git

> OK. commit-tree now eats RFC2822 dates as AUTHOR_DATE because that's
> what you're going to want to feed it. We store seconds since UTC epoch,
> we add the author's or committer's timezone as auxiliary data so that
> dates can be pretty-printed in the original timezone later if anyone
> cares.

With a UTC date, why would anyone care in which timezone the commit was
made?  Any pretty printing would most likely be prettiest if it is done
relative to the timezone of the person looking at the commit record, not
the person who created the record.

If we do need the timezone, then I think we also need the latitude of the
committer too, so that we know whether to interpret "July" as summer or
winter :-)

-Tony

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-14 19:19       ` tony.luck
@ 2005-04-14 19:23         ` David Woodhouse
  0 siblings, 0 replies; 18+ messages in thread
From: David Woodhouse @ 2005-04-14 19:23 UTC (permalink / raw)
  To: tony.luck; +Cc: Linus Torvalds, git

On Thu, 2005-04-14 at 12:19 -0700, tony.luck@intel.com wrote:
> With a UTC date, why would anyone care in which timezone the commit was
> made?  Any pretty printing would most likely be prettiest if it is done
> relative to the timezone of the person looking at the commit record, not
> the person who created the record.

I'd prefer not to lose the information. If someone has committed a
change at 2am, I like to know that it was 2am for _them_. It helps me
decide where to look first for the cause of problems. :)

It also helps disambiguate certain comments, especially those involving
words or phrases such as "yesterday" or "this afternoon".

-- 
dwmw2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Date handling.
@ 2005-04-14 19:42 Luck, Tony
  2005-04-14 20:54 ` David Woodhouse
  2005-04-15  5:02 ` Paul Jackson
  0 siblings, 2 replies; 18+ messages in thread
From: Luck, Tony @ 2005-04-14 19:42 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, git

>I'd prefer not to lose the information. If someone has committed a
>change at 2am, I like to know that it was 2am for _them_. It helps me
>decide where to look first for the cause of problems. :)

I'd think the 8:00am-before-the-first-coffee checkins would be the
most worrying :-)

>It also helps disambiguate certain comments, especially those involving
>words or phrases such as "yesterday" or "this afternoon".

This is a very good point ... but this still has problems with the
"git is a filesystem, not a SCM" mantra.  Timezone comments don't
belong in the git inode.

-Tony

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Date handling.
  2005-04-14 19:42 Luck, Tony
@ 2005-04-14 20:54 ` David Woodhouse
  2005-04-14 21:01   ` H. Peter Anvin
  2005-04-15  5:02 ` Paul Jackson
  1 sibling, 1 reply; 18+ messages in thread
From: David Woodhouse @ 2005-04-14 20:54 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Linus Torvalds, git

On Thu, 2005-04-14 at 12:42 -0700, Luck, Tony wrote:
> This is a very good point ... but this still has problems with the
> "git is a filesystem, not a SCM" mantra.  Timezone comments don't
> belong in the git inode.

Yeah, but really I'd want to see other serious users of it before I'd
accept that the timezone information _really_ needs to be stored
separately. After all, the committer and author information really
wouldn't be considered part of the _filesystem_ either.

-- 
dwmw2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-14 20:54 ` David Woodhouse
@ 2005-04-14 21:01   ` H. Peter Anvin
  2005-04-14 21:48     ` David Woodhouse
  0 siblings, 1 reply; 18+ messages in thread
From: H. Peter Anvin @ 2005-04-14 21:01 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Luck, Tony, Linus Torvalds, git

David Woodhouse wrote:
> On Thu, 2005-04-14 at 12:42 -0700, Luck, Tony wrote:
> 
>>This is a very good point ... but this still has problems with the
>>"git is a filesystem, not a SCM" mantra.  Timezone comments don't
>>belong in the git inode.
> 
> Yeah, but really I'd want to see other serious users of it before I'd
> accept that the timezone information _really_ needs to be stored
> separately. After all, the committer and author information really
> wouldn't be considered part of the _filesystem_ either.
> 

Both of these are metadata; they may not be directly relevant to the 
filesystem, but are attributes relevant to the client thereof; 
effectively an xattr.  It's not really any different than the fact that 
RFC 2822-style messages frequently contain headers rarely used by either 
MTAs or MUAs; they're metadata provided along the standard format for 
metadata in that system.  In fact, the ability for RFC (2)822 to 
accommodate this type of data has shown to be a major strength of the 
system, as opposed to the uncountably many attempts at binary email formats.

	-hpa

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-14 21:01   ` H. Peter Anvin
@ 2005-04-14 21:48     ` David Woodhouse
  0 siblings, 0 replies; 18+ messages in thread
From: David Woodhouse @ 2005-04-14 21:48 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Luck, Tony, Linus Torvalds, git

On Thu, 2005-04-14 at 14:01 -0700, H. Peter Anvin wrote:
> Both of these are metadata; they may not be directly relevant to the 
> filesystem, but are attributes relevant to the client thereof; 
> effectively an xattr.

Right. That's perfectly acceptable -- and that's the reason why I think
it's also fine to keep the timezone and the rename information in there
too. If we were being _really_ anal about auxiliary information being
separate, we'd stick it in a separate blob object and merely refer to it
from the commit object. I don't think there's really any call to take it
that far, though.

-- 
dwmw2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-14 19:42 Luck, Tony
  2005-04-14 20:54 ` David Woodhouse
@ 2005-04-15  5:02 ` Paul Jackson
  1 sibling, 0 replies; 18+ messages in thread
From: Paul Jackson @ 2005-04-15  5:02 UTC (permalink / raw)
  To: Luck, Tony; +Cc: dwmw2, torvalds, git

> I'd think the 8:00am-before-the-first-coffee checkins would be the
> most worrying :-)

For me, it was the Friday evening after beer bust checkin.

But my employer can't afford those anymore, so I'm safe.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-14 17:38     ` David Woodhouse
  2005-04-14 19:19       ` tony.luck
@ 2005-04-24  3:04       ` Jan Harkes
  2005-04-24  3:33         ` James Purser
  2005-04-24  6:38         ` David Woodhouse
  1 sibling, 2 replies; 18+ messages in thread
From: Jan Harkes @ 2005-04-24  3:04 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, git

On Thu, Apr 14, 2005 at 06:38:36PM +0100, David Woodhouse wrote:
> +/* Gr. strptime is crap for this; it doesn't have a way to require RFC2822
> +   (i.e. English) day/month names, and it doesn't work correctly with %z. */
> +static void parse_rfc2822_date(char *date, char *result, int maxlen)
> +{
...
> +	then = mktime(&tm); /* mktime appears to ignore the GMT offset, stupidly */

I noticed that some commit timestamps seemed to be off, looking into it
a bit more it seems like mktime is influenced by the setting of the
local TZ environment. However in parse_rfc2822_date we are trying to
interpret a time in the timezone of the original author not in the
timezone of the committer.

Here is a short test program that I believe shows the problem.

The question is, do we want to just calculate the time_t offset
ourselves without using mktime, or force the TZ environment to UTC.

Jan


/* cc -o mktime mktime.c ; ./mktime
 *
 * I get the following output,
 *   current 18000
 *   TZ=EST 18000
 *   TZ=UTC 0
 *   TZ=CET -3600
 */

#include <time.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    struct tm tm = { 0, };
    time_t zero;

    /* 1970-01-01 00:00:00 UTC, should map to 'time_t 0' */
    tm.tm_mday = 1;
    tm.tm_year = 70;
			    zero = mktime(&tm); printf("current %d\n", zero);
    setenv("TZ", "EST", 1); zero = mktime(&tm); printf("TZ=EST %d\n", zero);
    setenv("TZ", "UTC", 1); zero = mktime(&tm); printf("TZ=UTC %d\n", zero);
    setenv("TZ", "CET", 1); zero = mktime(&tm); printf("TZ=CET %d\n", zero);
}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-24  3:04       ` Jan Harkes
@ 2005-04-24  3:33         ` James Purser
  2005-04-24  6:38         ` David Woodhouse
  1 sibling, 0 replies; 18+ messages in thread
From: James Purser @ 2005-04-24  3:33 UTC (permalink / raw)
  To: Jan Harkes; +Cc: David Woodhouse, Linus Torvalds, git

Wouldn't it be easier to force GMT or UTC as the base timezone for the
application. This would remove any confusion between different
timezones.
-- 
James Purser
http://ksit.dynalias.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-24  3:04       ` Jan Harkes
  2005-04-24  3:33         ` James Purser
@ 2005-04-24  6:38         ` David Woodhouse
  2005-04-24  6:43           ` Russ Allbery
  2005-04-25  1:22           ` Jan Harkes
  1 sibling, 2 replies; 18+ messages in thread
From: David Woodhouse @ 2005-04-24  6:38 UTC (permalink / raw)
  To: Jan Harkes; +Cc: Linus Torvalds, git

On Sat, 2005-04-23 at 23:04 -0400, Jan Harkes wrote:
> I noticed that some commit timestamps seemed to be off, looking into it
> a bit more it seems like mktime is influenced by the setting of the
> local TZ environment.

Ewww. I missed that in the documentation. I suppose I should have worked
it out having empirically determined that it ignores the tm_gmtoff
field.

> The question is, do we want to just calculate the time_t offset
> ourselves without using mktime, or force the TZ environment to UTC.

I don't think we want to be in the business of counting leap seconds; we
need to let the system do it. I don't much like setting TZ to UTC though
-- how about we use your test case to find the offset and subtract that?

Does this work?

Index: commit-tree.c
===================================================================
--- 31e9af73983d640090508b06784ef7db4816c957/commit-tree.c  (mode:100644 sha1:c0b07f89286c3f6cceae8122b4c3142c8efaf8e1)
+++ uncommitted/commit-tree.c  (mode:100664)
@@ -138,10 +138,14 @@
 	struct tm tm;
 	char *p;
 	int i, offset;
-	time_t then;
+	time_t then, localofs;
 
 	memset(&tm, 0, sizeof(tm));
 
+	tm.tm_mday = 1;
+	tm.tm_year = 70;
+	localofs = mktime(&tm);
+
 	/* Skip day-name */
 	p = skipfws(date);
 	if (!isdigit(*p)) {
@@ -246,7 +250,9 @@
 	if (*(skipfws(p + 5)))
 		return;
 
-	then = mktime(&tm); /* mktime appears to ignore the GMT offset, stupidly */
+	/* No way to convert to a time_t and honour tm_gmtoff; we have to
+	   do the evil trick by subtracting the local offset */
+	then = mktime(&tm) - localofs;
 	if (then == -1)
 		return;
 


-- 
dwmw2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-24  6:38         ` David Woodhouse
@ 2005-04-24  6:43           ` Russ Allbery
  2005-04-25  1:22           ` Jan Harkes
  1 sibling, 0 replies; 18+ messages in thread
From: Russ Allbery @ 2005-04-24  6:43 UTC (permalink / raw)
  To: git

David Woodhouse <dwmw2@infradead.org> writes:

> I don't think we want to be in the business of counting leap seconds; we
> need to let the system do it. I don't much like setting TZ to UTC though
> -- how about we use your test case to find the offset and subtract that?

> Does this work?

Nope, daylight savings time breaks this, since you may or may not be in
the same time zone on January 1st as you are at the current time.

However, you don't need to count leap seconds when you implement your own
mktime, since mktime doesn't have to take leap seconds into account.  Unix
timestamps, unless you're using TAI, don't include leap seconds.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-24  6:38         ` David Woodhouse
  2005-04-24  6:43           ` Russ Allbery
@ 2005-04-25  1:22           ` Jan Harkes
  2005-04-25  1:32             ` Russ Allbery
  1 sibling, 1 reply; 18+ messages in thread
From: Jan Harkes @ 2005-04-25  1:22 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, git

On Sun, Apr 24, 2005 at 04:38:49PM +1000, David Woodhouse wrote:
> On Sat, 2005-04-23 at 23:04 -0400, Jan Harkes wrote:
> > I noticed that some commit timestamps seemed to be off, looking into it
> > a bit more it seems like mktime is influenced by the setting of the
> > local TZ environment.
> 
> Ewww. I missed that in the documentation. I suppose I should have worked
> it out having empirically determined that it ignores the tm_gmtoff
> field.
> 
> > The question is, do we want to just calculate the time_t offset
> > ourselves without using mktime, or force the TZ environment to UTC.
> 
> I don't think we want to be in the business of counting leap seconds; we
> need to let the system do it. I don't much like setting TZ to UTC though
> -- how about we use your test case to find the offset and subtract that?
> 
> Does this work?

As Russ mentioned, that probably doesn't work with daylight savings
time. However I did some testing and it looks like the following lines
around mktime make it work as we would expect.

    tm.tm_isdst = -1;
    then = mktime(&tm);
    then += tm.tm_gmtoff;

Attached is the program I used to test it, it seems pretty much unfazed
by changes to the TZ environment variable. Although I tested around a
daylight savings time switch, I'm still not 100% sure if it doesn't mess
up in some corner case.

Jan


#include <time.h>
#include <stdlib.h>
#include <stdio.h>

time_t mkutctime(struct tm *tm, int offset)
{
    time_t time;

    /* we don't know whether our timezone happens to be dst or not, let libc
     * figure that one out. */
    tm->tm_isdst = -1;

    /* interpret struct tm in the local timezone */
    time = mktime(tm);
    if (time == -1) return -1;

    /* libc lets us know how many seconds our local time differs from UTC
     * this is a non-standard BSD extension, which is probably not as
     * portable, but it seems to work. */
    time += tm->tm_gmtoff;

    /* However as the passed in struct tm was not UTC but in some other
     * timezone, we still have subtract the offset that came with the
     * RFC2822 date */
    time -= offset;

    return time;
}

int main(int argc, char **argv)
{
    struct tm tm = { 0, };
    time_t time;

    tm.tm_year = 70;
    tm.tm_mday = 1;
    time = mkutctime(&tm, 0);
    printf("1970-01-01 00:00:00 UTC = 0 (%d)\n", time);

    tm.tm_year = 105;
    tm.tm_mon = 2;
    tm.tm_mday = 17;
    tm.tm_hour = 20;
    tm.tm_min = 58;
    tm.tm_sec = 31;
    time = mkutctime(&tm, -5 * 3600);
    printf("2005-03-17 20:58:31 EST = 1111111111 (%d)\n", time);

    tm.tm_mon = 3;
    tm.tm_mday = 3;
    tm.tm_hour = 1;
    tm.tm_min = 59;
    tm.tm_sec = 59;
    time = mkutctime(&tm, -5 * 3600);
    printf("2005-04-03 01:59:59 EST = 1112511599 (%d)\n", time);

    tm.tm_hour = 3;
    tm.tm_min = 0;
    tm.tm_sec = 0;
    time = mkutctime(&tm, -4 * 3600);
    printf("2005-04-03 03:00:00 EDT = 1112511600 (%d)\n", time);
}


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Date handling.
  2005-04-25  1:22           ` Jan Harkes
@ 2005-04-25  1:32             ` Russ Allbery
  0 siblings, 0 replies; 18+ messages in thread
From: Russ Allbery @ 2005-04-25  1:32 UTC (permalink / raw)
  To: git

Jan Harkes <jaharkes@cs.cmu.edu> writes:

> As Russ mentioned, that probably doesn't work with daylight savings
> time. However I did some testing and it looks like the following lines
> around mktime make it work as we would expect.

>     tm.tm_isdst = -1;
>     then = mktime(&tm);
>     then += tm.tm_gmtoff;

> Attached is the program I used to test it, it seems pretty much unfazed
> by changes to the TZ environment variable. Although I tested around a
> daylight savings time switch, I'm still not 100% sure if it doesn't mess
> up in some corner case.

I don't know what sort of portability you're striving for, but many
platforms don't have tm.tm_gmtoff.  But reimplementing mktime from scratch
isn't particularly hard so long as you don't need some of the "extra"
features of mktime (canonicalizing a struct tm or accepting out of range
values and doing the "right thing").

I came in a little late to this discussion, but I gather that the overall
goal here is parsing RFC 2822 dates.  You're all certainly welcome to take
the code that I wrote for INN to do this if you wish, although it parses
the full RFC 2822 syntax and therefore may accept things you consider
insane (comments, newlines, etc.)  Or you're welcome to cherry-pick bits
and pieces out of it (like mktime_utc).  This code has a fairly extensive
test suite and has also been tested against the old INN parsedate function
on ~2M Usenet articles.

All of this code is my own work, and as far as I'm concerned it's in the
public domain or as close of an approximation that one can get to that in
your local legal environment.

The code is largish and needs some Autoconf support, so I won't just send
it to the list unless someone wants it, but let me know if you do.  You
can also get it by downloading INN from:

    <ftp://ftp.isc.org/isc/inn/snapshots/>

(getting the latest CURRENT snapshot) and looking in lib/date.c.  You
don't need the parsedate_nntp stuff, and you probably don't care about
parsedate_rfc2822_lax, which accepts common violations of RFC 2822 syntax
found in Usenet messages.  The test suite is in tests/lib/date-t.c.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2005-04-25  1:28 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-14  8:16 Date handling David Woodhouse
2005-04-14  9:00 ` Linus Torvalds
2005-04-14  9:12   ` Linus Torvalds
2005-04-14 17:38     ` David Woodhouse
2005-04-14 19:19       ` tony.luck
2005-04-14 19:23         ` David Woodhouse
2005-04-24  3:04       ` Jan Harkes
2005-04-24  3:33         ` James Purser
2005-04-24  6:38         ` David Woodhouse
2005-04-24  6:43           ` Russ Allbery
2005-04-25  1:22           ` Jan Harkes
2005-04-25  1:32             ` Russ Allbery
2005-04-14  9:31   ` David Woodhouse
  -- strict thread matches above, loose matches on Subject: below --
2005-04-14 19:42 Luck, Tony
2005-04-14 20:54 ` David Woodhouse
2005-04-14 21:01   ` H. Peter Anvin
2005-04-14 21:48     ` David Woodhouse
2005-04-15  5:02 ` Paul Jackson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).