* [BUG] minor: wrong handling of GIT_AUTHOR_DATE
@ 2008-08-16 20:53 Hermann Gausterer
2008-08-16 23:03 ` Linus Torvalds
0 siblings, 1 reply; 10+ messages in thread
From: Hermann Gausterer @ 2008-08-16 20:53 UTC (permalink / raw)
To: Git Mailinglist; +Cc: Hermann Gausterer
[-- Attachment #1: Type: text/plain, Size: 960 bytes --]
hi
i found a minor bug in the handling of the
environment variable GIT_AUTHOR_DATE
i used this variable to import an old project.
i used "stat" to get the timestamp of a file
and set the git history to this date with
this command:
GIT_AUTHOR_DATE=`stat -c '%y' "$FILE"`
old files (created with an older kernel)
produced this output.
2008-05-28 14:21:35.000000000 +0200
but new files return nanosecond resolution
timestamps.
2008-06-04 17:25:54.917476713 +0200
of course this resolution is NOT needed
for git, but git DOES NOT ignore this time-
stamps. it changes the date to something
completly wrong :-/
steps to reproduce:
$ git init
$ touch test
$ stat -c %y test
2008-08-16 22:25:45.491701924 +0200
$ export GIT_AUTHOR_DATE=`stat -c %y test`
$ git add test
$ git commit -a
$ git log
commit 56f92b8f6efc7bdaa5abdf03a8c5dbf79dd1fdff
Author: Hermann Gausterer <git-bugreport@mrq1.org>
Date: Thu Aug 1 01:52:04 1985 +0200
test
$
mfg hermann
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [BUG] minor: wrong handling of GIT_AUTHOR_DATE 2008-08-16 20:53 [BUG] minor: wrong handling of GIT_AUTHOR_DATE Hermann Gausterer @ 2008-08-16 23:03 ` Linus Torvalds 2008-08-16 23:17 ` Linus Torvalds 0 siblings, 1 reply; 10+ messages in thread From: Linus Torvalds @ 2008-08-16 23:03 UTC (permalink / raw) To: Hermann Gausterer, Junio C Hamano; +Cc: Git Mailinglist On Sat, 16 Aug 2008, Hermann Gausterer wrote: > > i used "stat" to get the timestamp of a file > and set the git history to this date with > this command: > > GIT_AUTHOR_DATE=`stat -c '%y' "$FILE"` > > old files (created with an older kernel) > produced this output. > > 2008-05-28 14:21:35.000000000 +0200 > > but new files return nanosecond resolution > timestamps. > > 2008-06-04 17:25:54.917476713 +0200 > > of course this resolution is NOT needed > for git, but git DOES NOT ignore this time- > stamps. it changes the date to something > completly wrong :-/ Git uses a fairly odd date parsing library, and it turns out that 917476713 +0200 is actually a perfectly valid date in the git format, because one thing git allows is the "seconds since epoch" one. So doing [torvalds@nehalem git]$ ./test-date "917476713 +0200" 917476713 +0200 -> 917476713 +0200 -> Wed Jan 27 14:38:33 1999 917476713 +0200 -> Wed Jan 27 14:38:33 1999 and it turns out that git will totally ignore any other format date when i sees this standard format (yes, that is literally the format that git uses internally). So because git date parsing doesn't even really understand fractional seconds, and thus doesn't parse it, it will take the fraction, and if it was larger than 100000000, it will assume it's a seconds-since-epoch date. Unlucky. Anyway, something like this should fix it. Junio: we might also make the code that actually parses the seconds-per-epoch thing only trigger if we haven't already seen a date (ie it might check for "tm->tm_year < 0" etc before accepting that seconds format). Linus --- Subject: Ignore fractional seconds in date parsing From: Linus Torvalds <torvads@linux-foundation.org> .. otherwise a nanosecond resolution fractional second might be interpreted as a seconds-since-epoch date format string and overwrite the date we so carefully just parsed. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Noticed-by: Hermann Gausterer <git-mailinglist@mrq1.org> --- date.c | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/date.c b/date.c index 35a5257..5e502da 100644 --- a/date.c +++ b/date.c @@ -363,6 +363,11 @@ static int match_multi_number(unsigned long num, char c, const char *date, char tm->tm_hour = num; tm->tm_min = num2; tm->tm_sec = num3; + + /* Ignore any possible fractional seconds */ + if (*end == '.') + (void) strtol(end+1, &end, 10); + break; } return 0; ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [BUG] minor: wrong handling of GIT_AUTHOR_DATE 2008-08-16 23:03 ` Linus Torvalds @ 2008-08-16 23:17 ` Linus Torvalds 2008-08-17 2:46 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Linus Torvalds @ 2008-08-16 23:17 UTC (permalink / raw) To: Hermann Gausterer, Junio C Hamano; +Cc: Git Mailinglist On Sat, 16 Aug 2008, Linus Torvalds wrote: > > Junio: we might also make the code that actually parses the > seconds-per-epoch thing only trigger if we haven't already seen a date (ie > it might check for "tm->tm_year < 0" etc before accepting that seconds > format). Here's a slightly expanded version of the previous patch, which will ignore those big integer if it has already seen any human-readable date format (either any time except 0:00:00 or any normal date). It includes the fractional second parsing code from the previous patch too, since that's an independent thing and makes sense regardless. Junio, your call. But this one gets the date right for strings that just randomly have some big number in them, ie [torvalds@nehalem git]$ ./test-date "17:25:54 917476713 2008-06-04 -0700" 17:25:54 917476713 2008-06-04 -0700 -> 1212625554 -0700 -> Wed Jun 4 17:25:54 2008 17:25:54 917476713 2008-06-04 -0700 -> Wed Jun 4 17:25:54 2008 because it will now see that "nodate()" is not true. I think it's a good idea to only accept the epoch format when there hasn't been any other time format visible. Linus --- date.c | 16 +++++++++++++++- 1 files changed, 15 insertions(+), 1 deletions(-) diff --git a/date.c b/date.c index 35a5257..e11e78e 100644 --- a/date.c +++ b/date.c @@ -363,6 +363,11 @@ static int match_multi_number(unsigned long num, char c, const char *date, char tm->tm_hour = num; tm->tm_min = num2; tm->tm_sec = num3; + + /* Ignore any possible fractional seconds */ + if (*end == '.') + (void) strtol(end+1, &end, 10); + break; } return 0; @@ -402,6 +407,15 @@ static int match_multi_number(unsigned long num, char c, const char *date, char return end - date; } +/* Have we filled in any part of the time/date yet? */ +static inline int nodate(struct tm *tm) +{ + return tm->tm_year < 0 && + tm->tm_mon < 0 && + tm->tm_mday < 0 && + !(tm->tm_hour | tm->tm_min | tm->tm_sec); +} + /* * We've seen a digit. Time? Year? Date? */ @@ -418,7 +432,7 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt * more than 8 digits. This is because we don't want to rule out * numbers like 20070606 as a YYYYMMDD date. */ - if (num >= 100000000) { + if (num >= 100000000 && nodate(tm)) { time_t time = num; if (gmtime_r(&time, tm)) { *tm_gmt = 1; ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [BUG] minor: wrong handling of GIT_AUTHOR_DATE 2008-08-16 23:17 ` Linus Torvalds @ 2008-08-17 2:46 ` Junio C Hamano 2008-08-17 3:13 ` Junio C Hamano 2008-08-17 3:50 ` Linus Torvalds 0 siblings, 2 replies; 10+ messages in thread From: Junio C Hamano @ 2008-08-17 2:46 UTC (permalink / raw) To: Linus Torvalds; +Cc: Hermann Gausterer, Git Mailinglist Linus Torvalds <torvalds@linux-foundation.org> writes: > Junio, your call. But this one gets the date right for strings that just > randomly have some big number in them, ie > > [torvalds@nehalem git]$ ./test-date "17:25:54 917476713 2008-06-04 -0700" > 17:25:54 917476713 2008-06-04 -0700 -> 1212625554 -0700 -> Wed Jun 4 17:25:54 2008 > 17:25:54 917476713 2008-06-04 -0700 -> Wed Jun 4 17:25:54 2008 Being able to parse this is a very low priority. You've taught people here and on the kernel list that the "date" can use any non-digit-non-word as a word separator, and "git log --since 2.days" is something you often do. People who followed that advice would have gotten used to this already, e.g. $ git reflog delete master@{07.04.2005.15:15:00.-0700} should not be broken. I think your first hunk needs to distinguish between "very-long-precision posint" (in which case we ignore because it is likely to be nanoseconds fraction) and others. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG] minor: wrong handling of GIT_AUTHOR_DATE 2008-08-17 2:46 ` Junio C Hamano @ 2008-08-17 3:13 ` Junio C Hamano 2008-08-17 4:25 ` Linus Torvalds 2008-08-17 3:50 ` Linus Torvalds 1 sibling, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2008-08-17 3:13 UTC (permalink / raw) To: Linus Torvalds; +Cc: Hermann Gausterer, Git Mailinglist Junio C Hamano <gitster@pobox.com> writes: > You've taught people here and on the kernel list that the "date" can use > any non-digit-non-word as a word separator, and "git log --since 2.days" > is something you often do. > > People who followed that advice would have gotten used to this already, e.g. > > $ git reflog delete master@{07.04.2005.15:15:00.-0700} > > should not be broken. > > I think your first hunk needs to distinguish between "very-long-precision > posint" (in which case we ignore because it is likely to be nanoseconds > fraction) and others. Perhaps like this. -- >8 -- From: Linus Torvalds <torvalds@linux-foundation.org> Date: Sat, 16 Aug 2008 16:17:50 -0700 Subject: [PATCH] date parsing: do not mistake fractional nanosecond that follow HH:MM:SS Some program output nanosecond fractional after the usual HH:MM:SS format. If the fraction is large enough, it can be interpreted as the seconds since epoch, and can overwrite the already parsed date/time. We also make sure we use the seconds since epoch interpretation only when we have not seen any other date/time data in the input yet. Note that we cannot unconditionally drop anything that follows '.'; people have been taught that we allow '.' as a word separator and have got used to formats like "--since 2.days" and "2008.08.16.01:23:45.-0700" to work. Noticed-by: Hermann Gausterer Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- date.c | 16 +++++++++++++++- 1 files changed, 15 insertions(+), 1 deletions(-) diff --git a/date.c b/date.c index 35a5257..b2c5a8b 100644 --- a/date.c +++ b/date.c @@ -363,6 +363,11 @@ static int match_multi_number(unsigned long num, char c, const char *date, char tm->tm_hour = num; tm->tm_min = num2; tm->tm_sec = num3; + if (*end == '.') { + num = strspn(end+1, "0123456789"); + if (9 <= num) + end += num + 1; + } break; } return 0; @@ -402,6 +407,15 @@ static int match_multi_number(unsigned long num, char c, const char *date, char return end - date; } +/* Have we filled in any part of the time/date yet? */ +static inline int nodate(struct tm *tm) +{ + return tm->tm_year < 0 && + tm->tm_mon < 0 && + tm->tm_mday < 0 && + !(tm->tm_hour | tm->tm_min | tm->tm_sec); +} + /* * We've seen a digit. Time? Year? Date? */ @@ -418,7 +432,7 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt * more than 8 digits. This is because we don't want to rule out * numbers like 20070606 as a YYYYMMDD date. */ - if (num >= 100000000) { + if (num >= 100000000 && nodate(tm)) { time_t time = num; if (gmtime_r(&time, tm)) { *tm_gmt = 1; -- 1.6.0.rc3.17.gc14c8 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [BUG] minor: wrong handling of GIT_AUTHOR_DATE 2008-08-17 3:13 ` Junio C Hamano @ 2008-08-17 4:25 ` Linus Torvalds 2008-08-17 4:37 ` Linus Torvalds 2008-08-17 5:27 ` Junio C Hamano 0 siblings, 2 replies; 10+ messages in thread From: Linus Torvalds @ 2008-08-17 4:25 UTC (permalink / raw) To: Junio C Hamano; +Cc: Hermann Gausterer, Git Mailinglist On Sat, 16 Aug 2008, Junio C Hamano wrote: > > Perhaps like this. So see in the previous email why I don't think "ignore nanoseconds" is really any better than "igore all fractions". That said: > @@ -363,6 +363,11 @@ static int match_multi_number(unsigned long num, char c, const char *date, char > tm->tm_hour = num; > tm->tm_min = num2; > tm->tm_sec = num3; > + if (*end == '.') { > + num = strspn(end+1, "0123456789"); > + if (9 <= num) > + end += num + 1; Apart from the "compare with 9", your patch is _much_ better than mine. Using "strtoul()" was a horrible horrible thing to do, since it will match not just '+' and '-' but also spaces etc. So my patch was definitely crap, and yours is better, but I don't much like that expectations of 9+ digits. After all, if we only worry about 9+ digits of a big number, then the "nodate()" logic already takes care of much of it. So here's a much better version, I think. The rules are: - valid days of month/mday are always single or double digits. - valid years are either two or four digits No, we don't support the year 600 _anyway_, since our encoding is based on the UNIX epoch, and the day we worry about the year 10,000 is far away and we can raise the limit to five digits when we get closer. - Other numbers (eg "600 days ago") can have any number of digits, but they cannot start with a zero. Again, the only exception is for two-digit numbers, since that is fairly common for dates ("Dec 01" is not unheard of) So that means that any milli- or micro-second would be thrown out just because the number of digits shows that it cannot be an interesting date. That would make the patch look something like this... [ Those four deleted lines I removed just because the cases had already been handled, eg the ">1900" case was already handled when we checked for a four-digit year, and the >70 case was handled when we checked for exactly two digits ] Hmm? Linus --- date.c | 26 ++++++++++++++++++++------ 1 files changed, 20 insertions(+), 6 deletions(-) diff --git a/date.c b/date.c index 35a5257..950b88f 100644 --- a/date.c +++ b/date.c @@ -402,6 +402,15 @@ static int match_multi_number(unsigned long num, char c, const char *date, char return end - date; } +/* Have we filled in any part of the time/date yet? */ +static inline int nodate(struct tm *tm) +{ + return tm->tm_year < 0 && + tm->tm_mon < 0 && + tm->tm_mday < 0 && + !(tm->tm_hour | tm->tm_min | tm->tm_sec); +} + /* * We've seen a digit. Time? Year? Date? */ @@ -418,7 +427,7 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt * more than 8 digits. This is because we don't want to rule out * numbers like 20070606 as a YYYYMMDD date. */ - if (num >= 100000000) { + if (num >= 100000000 && nodate(tm)) { time_t time = num; if (gmtime_r(&time, tm)) { *tm_gmt = 1; @@ -463,6 +472,13 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt } /* + * Ignore lots of numerals. We took care of 4-digit years above. + * Days or months must be one or two digits. + */ + if (n > 2) + return n; + + /* * NOTE! We will give precedence to day-of-month over month or * year numbers in the 1-12 range. So 05 is always "mday 5", * unless we already have a mday.. @@ -488,10 +504,6 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt if (num > 0 && num < 32) { tm->tm_mday = num; - } else if (num > 1900) { - tm->tm_year = num - 1900; - } else if (num > 70) { - tm->tm_year = num; } else if (num > 0 && num < 13) { tm->tm_mon = num-1; } @@ -823,7 +835,9 @@ static const char *approxidate_digit(const char *date, struct tm *tm, int *num) } } - *num = number; + /* Accept zero-padding only for small numbers ("Dec 02", never "Dec 0002") */ + if (date[0] != '0' || end - date <= 2) + *num = number; return end; } ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [BUG] minor: wrong handling of GIT_AUTHOR_DATE 2008-08-17 4:25 ` Linus Torvalds @ 2008-08-17 4:37 ` Linus Torvalds 2008-08-17 5:27 ` Junio C Hamano 1 sibling, 0 replies; 10+ messages in thread From: Linus Torvalds @ 2008-08-17 4:37 UTC (permalink / raw) To: Junio C Hamano; +Cc: Hermann Gausterer, Git Mailinglist On Sat, 16 Aug 2008, Linus Torvalds wrote: > > The rules are: > > - valid days of month/mday are always single or double digits. > > - valid years are either two or four digits > > No, we don't support the year 600 _anyway_, since our encoding is based > on the UNIX epoch, and the day we worry about the year 10,000 is far > away and we can raise the limit to five digits when we get closer. > > - Other numbers (eg "600 days ago") can have any number of digits, but > they cannot start with a zero. Again, the only exception is for > two-digit numbers, since that is fairly common for dates ("Dec 01" is > not unheard of) > > So that means that any milli- or micro-second would be thrown out just > because the number of digits shows that it cannot be an interesting date. I should explain that nonsensical statement a bit. A milli- or micro-second can obviously be a perfectly fine number according to the rules above, as long as it doesn't start with a '0'. So if we have 12:34:56.123 then that '123' gets parsed as a number, and we remember it. But because it's bigger than 31, we'll never use it as such _unless_ there is something after it to trigger that use. So you can say "12:34:56.123.days.ago", and because of the "days", that 123 will actually be meaninful now. But the problem with "12.34.56.001" was that we used to remember the "001" as a number, and because we could see no other use for it we then assumed that it meant the day of the month. Of course, we *should* do that only if we have seen a month-name too, but we don't currently track that, so.. Adding that as a further sanity test would be good, but it would require us to have some extra "flags" field. Maybe we should have a #define SEEN_MONTH 1 #define SEEN_YEAR 2 #define SEEN_DAY 4 #define SEEN_TIME 8 ... struct extended_tm { unsigned long seen; unsigned long number; struct tm tm; } and pass *that* around instead of passing "struct tm *" and "unsigned long *num" around. That would be good. Then we could do if (tm->number && tm->number < 32 && (tm->seen & SEEN_MONTH) && !(tm->seen & SEEN_DAY)) tm->tm.tm_mday = tm->number; there instead, which would protect us from other numbers just being seen as days instead. Anybody? I'm not going to bother. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG] minor: wrong handling of GIT_AUTHOR_DATE 2008-08-17 4:25 ` Linus Torvalds 2008-08-17 4:37 ` Linus Torvalds @ 2008-08-17 5:27 ` Junio C Hamano 2008-08-17 16:07 ` Linus Torvalds 1 sibling, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2008-08-17 5:27 UTC (permalink / raw) To: Linus Torvalds; +Cc: Hermann Gausterer, Git Mailinglist Linus Torvalds <torvalds@linux-foundation.org> writes: > [ Those four deleted lines I removed just because the cases had already > been handled, eg the ">1900" case was already handled when we checked > for a four-digit year, and the >70 case was handled when we checked for > exactly two digits ] > > Hmm? > @@ -488,10 +504,6 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt > > if (num > 0 && num < 32) { > tm->tm_mday = num; > - } else if (num > 1900) { > - tm->tm_year = num - 1900; > - } else if (num > 70) { > - tm->tm_year = num; > } else if (num > 0 && num < 13) { > tm->tm_mon = num-1; > } The comment above this part says we always favor mday over mon, but I wonder why this sequence is not like: if (tm->tm_mday is not set && num > 0 && num < 32) tm->tm_mday = num; else if (tm->tm_mon is not set && num > 0 && num < 13) tm->tm_mon = num - 1; Is this because we do not initialize tm fields to "unknown" in the beginning? I admit I haven't bothered to look at this part of the code for a looong time. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG] minor: wrong handling of GIT_AUTHOR_DATE 2008-08-17 5:27 ` Junio C Hamano @ 2008-08-17 16:07 ` Linus Torvalds 0 siblings, 0 replies; 10+ messages in thread From: Linus Torvalds @ 2008-08-17 16:07 UTC (permalink / raw) To: Junio C Hamano; +Cc: Hermann Gausterer, Git Mailinglist On Sat, 16 Aug 2008, Junio C Hamano wrote: > > Is this because we do not initialize tm fields to "unknown" in the > beginning? I admit I haven't bothered to look at this part of the code > for a looong time. Indeed. The _exact_ date handling initializes the date to -1 (and the time to 0), but the approxidate thing defaults to "now" and then modifies that. Which is why we'd need to have a separate flags field for "I have initialized this field". Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG] minor: wrong handling of GIT_AUTHOR_DATE 2008-08-17 2:46 ` Junio C Hamano 2008-08-17 3:13 ` Junio C Hamano @ 2008-08-17 3:50 ` Linus Torvalds 1 sibling, 0 replies; 10+ messages in thread From: Linus Torvalds @ 2008-08-17 3:50 UTC (permalink / raw) To: Junio C Hamano; +Cc: Hermann Gausterer, Git Mailinglist On Sat, 16 Aug 2008, Junio C Hamano wrote: > > People who followed that advice would have gotten used to this already, e.g. > > $ git reflog delete master@{07.04.2005.15:15:00.-0700} > > should not be broken. Hmm. Fair enough. In that case, just the "nodate()" approach is probably fine on its own. HOWEVER: > I think your first hunk needs to distinguish between "very-long-precision > posint" (in which case we ignore because it is likely to be nanoseconds > fraction) and others. Well, that ignores nanosecond resolution seconds, but not microseconds, for example. Now, microseconds normally don't matter (because they won't trigger the 'seconds-since-epoch' case), but they _can_ trigger some other cases. For example, let's assume that we have microseconds in the date specifier. Then try this one: ./test-date "12:12:12.000001" Notice what happens? Oops. With my patch, you get 12:12:12.0000001 -> Sat Aug 16 12:12:12 2008 and with your, you get 12:12:12.000001 -> Fri Aug 1 12:12:12 2008 and yeah, it's odd, but I can explain it. But you are definitely right about the case of doing "15:15:00.-0700" and yes, my patch was crap too. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2008-08-17 16:09 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-08-16 20:53 [BUG] minor: wrong handling of GIT_AUTHOR_DATE Hermann Gausterer 2008-08-16 23:03 ` Linus Torvalds 2008-08-16 23:17 ` Linus Torvalds 2008-08-17 2:46 ` Junio C Hamano 2008-08-17 3:13 ` Junio C Hamano 2008-08-17 4:25 ` Linus Torvalds 2008-08-17 4:37 ` Linus Torvalds 2008-08-17 5:27 ` Junio C Hamano 2008-08-17 16:07 ` Linus Torvalds 2008-08-17 3:50 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox